
Spark Scala DataFrame: How to convert columns into rows?

Is there an efficient way to transpose columns into rows for a big DataFrame in Spark Scala?

val inputDF = Seq(("100", "A", "10", "B", null),
                  ("101", "A", "20", "B", "30")
              ).toDF("ID", "Type1", "Value1", "Type2", "Value2")

I want to transpose it into a Dataframe as below.

val outDF = Seq(("100", "A", "10"),
                ("100", "B", null),
                ("101", "A", "20"),
                ("101", "B", "30")
             ).toDF("ID", "TypeID", "Value")

The DataFrame is big; it holds around 1 GB of data. I am using Spark 2.4.x. Any suggestions on doing this efficiently? Thanks a lot!

You can do a union:

val outputDF = inputDF.select("ID", "Type1", "Value1")
                      .union(inputDF.select("ID", "Type2", "Value2"))  // unionAll is deprecated since Spark 2.0
                      .toDF("ID", "Type", "Value")  // rename columns
