How to pivot/transpose the rows of a column into individual columns in Spark Scala without using the pivot method

Question

Please check the image below for a reference to my use case.

Answer 1

Use the groupBy, pivot, and agg functions. See the code below; I have added inline comments.

scala> df.show(false)
+----------+------+----+
|tdate     |ttype |tamt|
+----------+------+----+
|2020-10-15|draft |5000|
|2020-10-18|cheque|7000|
+----------+------+----+

scala> df
  .groupBy($"tdate") // Group rows by the tdate column.
  .pivot("ttype", Seq("cheque", "draft")) // Pivot on ttype; "cheque" and "draft" become the new column names.
  .agg(first("tamt")) // Take the first tamt value in each group.
  .show(false)

+----------+------+-----+
|tdate     |cheque|draft|
+----------+------+-----+
|2020-10-18|7000  |null |
|2020-10-15|null  |5000 |
+----------+------+-----+

Answer 2

You can get the same result without using pivot by adding the columns manually, if you know all the names of the new columns:

import org.apache.spark.sql.functions.{col, when}

dataframe
  .withColumn("cheque", when(col("ttype") === "cheque", col("tamt")))
  .withColumn("draft", when(col("ttype") === "draft", col("tamt")))
  .drop("tamt", "ttype")

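Because this solution uses only narrow transformations (withColumn and drop), it does not trigger a shuffle, so it should be faster than pivot. It keeps one output row per input row, which matches the pivot output only when each tdate appears once, as in the sample data.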

This can be generalized if you don't know the column names in advance. However, collecting the distinct ttype values runs an extra Spark job, so in this case you should benchmark to check whether pivot is more performant:

import org.apache.spark.sql.functions.{col, when}

val newColumnNames = dataframe.select("ttype").distinct.collect().map(_.getString(0))

newColumnNames
  .foldLeft(dataframe)((df, columnName) => {
    df.withColumn(columnName, when(col("ttype") === columnName, col("tamt")))
  })
  .drop("tamt", "ttype")
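
If a tdate can appear on several rows, the conditional columns can be folded into a groupBy and agg so that each date ends up on a single row, still without calling pivot. The following is only a minimal sketch of that idea, not part of the original answers: it assumes the column names from the sample data, builds a local SparkSession for illustration (in spark-shell, `spark` already exists), and adds one hypothetical extra row so that a date repeats. Unlike the withColumn version, the groupBy does trigger a shuffle.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, first, when}

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("manual-pivot").getOrCreate()
import spark.implicits._

// Sample data from Answer 1, plus one hypothetical row so that 2020-10-18 repeats.
val dataframe = Seq(
  ("2020-10-15", "draft", 5000),
  ("2020-10-18", "cheque", 7000),
  ("2020-10-18", "draft", 1200)
).toDF("tdate", "ttype", "tamt")

// The new column names are assumed to be known in advance.
val types = Seq("cheque", "draft")

// One conditional aggregate per target column: when(...) is null unless ttype
// matches, and first(..., true) ignores nulls, so it picks the matching tamt.
val aggregates = types.map { t =>
  first(when(col("ttype") === t, col("tamt")), true).as(t)
}

val result = dataframe
  .groupBy("tdate")
  .agg(aggregates.head, aggregates.tail: _*)

result.orderBy("tdate").show(false)
```

With this input, 2020-10-15 should show null for cheque and 5000 for draft, and 2020-10-18 should show 7000 and 1200. If a date can hold several rows of the same ttype, first picks an arbitrary one, so an aggregate such as sum or max may be a better fit there.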

2 answers

solution1
0 2020-11-20 13:04:50

solution2
0 ACCEPTED 2020-11-22 09:33:48
