SPARK-IQ-3

Explain below transformations respective to Dataframe?
Narrow Dependency Transformation
Wide Dependency Transformation
Narrow Dependency TransformationWide Dependency Transformation
Narrow Dependency Transformation : A transformation performed independently on a single partition to produce valid results.Wide Dependency Transformation:A transformation that requires data from other partitions to produce correct results.GrouBy,Join,OrderBy are example of Wide Dependency Transformation.
Df3 = Df2.filter(Df2.Gender == ‘Male’)Df3 = Df2.groupBy(country)
How to read data from column?

from pyspark.sql.functions import *
airlinesDF.select(column(”Origin”), col(’Dest”), airlinesDF.Distance,"year" ).show(1O)

  • we can use column,col object ,double quotes within column.
  • Consider we have one table,in that there are 3 columns.Day,month,year. Create one table with one column having DATE as in format YYYY-MM-DD.
using expr function

a1r1nesDF.setect(”0r1g1n, Dest’, “Distance”, expr(”to_date(concat(Year,Month,DayofMonth),’yyyyMMdd) as FlightDate”)) .show(1)I
origin dest distance flightdate
1	san	sf0	4471	1987-10-14


a1r1nesDF.setect(”0r1g1n, Dest’, “Distance”, to_date(concat("Year","Month","DayofMonth"),’yyyyMMdd').alias("FlightDate”)) .show(1)I
origin dest distance flightdate
1	san	sf0	4471	1987-10-14

Convert year in proper format like 81 -> 1981, 11->2011
+—–+—+—–+—-+———–+
| name|day|month|year| id|
+—–+—+—–+—-+———–+
|vikas| 23| 5| 81| 0|
| Ravi| 28| 1|2002| 8589934592|
|vikas| 23| 5| 81| 8589934593|
| John| 12| 12| 6|17179869184|
| kamal| 7| 8| 63|17179869185|
+—–+—+—–+—-+———–+



df3 = df1.withColumn("year", expr("""
         case when year < 21 then cast(year as int) + 2000
         when year < 100 then cast(year as int) + 1900
         else year
         end"""))
df3.show()



+-----+---+-----+----+-----------+
| name|day|month|year|         id|
+-----+---+-----+----+-----------+
|vikas| 23|    5|1981|          0|
| Ravi| 28|    1|2002| 8589934592|
|vikas| 23|    5|1981| 8589934593|
| John| 12|   12|2006|17179869184|
| kamal|  7|    8|1963|17179869185|
+-----+---+-----+----+-----------+



OR

df4 = df1.withColumn("year", expr("""
         case when year < 21 then year + 2000
         when year < 100 then year + 1900
         else year
         end""").cast(IntegerType()))
df4.show()


+-----+---+-----+----+-----------+
| name|day|month|year|         id|
+-----+---+-----+----+-----------+
|vikas| 23|    5|1981|          0|
| Ravi| 28|    1|2002| 8589934592|
|vikas| 23|    5|1981| 8589934593|
| John| 12|   12|2006|17179869184|
| kamal|  7|    8|1963|17179869185|
+-----+---+-----+----+-----------+

OR

df5 = df1.withColumn("day", col("day").cast(IntegerType())) \
         .withColumn("month", col("month").cast(IntegerType())) \
         .withColumn("year", col("year").cast(IntegerType())) 

df6 = df5.withColumn("year", expr("""
          case when year < 21 then year + 2000
          when year < 100 then year + 1900
          else year
          end"""))
df6.show()

+-----+---+-----+----+-----------+
| name|day|month|year|         id|
+-----+---+-----+----+-----------+
|vikas| 23|    5|1981|          0|
| Ravi| 28|    1|2002| 8589934592|
|vikas| 23|    5|1981| 8589934593|
| John| 12|   12|2006|17179869184|
| kamal|  7|    8|1963|17179869185|
+-----+---+-----+----+-----------+

OR

df7 = df5.withColumn("year", \
                    when(col("year") < 21, col("year") + 2000) \
                    .when(col("year") < 100, col("year") + 1900) \
                    .otherwise(col("year")))
df7.show()

+-----+---+-----+----+-----------+
| name|day|month|year|         id|
+-----+---+-----+----+-----------+
|vikas| 23|    5|1981|          0|
| Ravi| 28|    1|2002| 8589934592|
|vikas| 23|    5|1981| 8589934593|
| John| 12|   12|2006|17179869184|
| kamal|  7|    8|1963|17179869185|
+-----+---+-----+----+-----------+
Add DOB column at the end of table.

df8 = df7.withColumn(“dob”, expr(“to_date(concat(day,’/’,month,’/’,year), ‘d/M/y’)”))
df8.show()

+—–+—+—–+—-+———–+———-+
| name|day|month|year| id| dob|
+—–+—+—–+—-+———–+———-+
|vikas| 23| 5|1981| 0|1981-05-23|
| Ravi| 28| 1|2002| 8589934592|2002-01-28|
|vikas| 23| 5|1981| 8589934593|1981-05-23|
| John| 12| 12|2006|17179869184|2006-12-12|
| kamal| 7| 8|1963|17179869185|1963-08-07|
+—–+—+—–+—-+———–+———-+