Unpivot dataframe in Pyspark with new column

Question

I would like to unpivot a dataframe that looks like this:

Col1 Col2 Val1 Val2
abc  def  12   75
ghi  jkl  67   86
...  ...  ..   ..

into something that will look like this:

Col1 Col2 NewCol Val
abc  def  KEY1   12
abc  def  KEY2   75
ghi  jkl  KEY1   67
ghi  jkl  KEY2   86
...  ...  ....   ..

I am quite new to Python, but as far as I know there is no unpivot function in PySpark. Any idea how I can achieve this? Thanks a lot!

Answer 1

Given the Dataframe you provided, one could use:

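from pyspark.sql import functions as F

# Build a {key: value} map per row, then explode it into one row per entry.
df.select(
  F.col("Col1"),
  F.col("Col2"),
  F.explode(
    F.map_from_arrays(
      F.array(F.lit("KEY1"), F.lit("KEY2")),  # labels for the new column
      F.array(F.col("Val1"), F.col("Val2"))   # values, in the same order
    )
  ).alias("NewCol", "Val")  # exploding a map yields two columns (key, value)
)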

map_from_arrays pairs keys and values by position, so this works as long as the two arrays list them in the same order.
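To see why the order matters, here is a minimal plain-Python sketch (no Spark required) of what the map-then-explode step does to each row. The helper name unpivot_rows and its parameters are hypothetical and only for illustration; it is not part of PySpark.

```python
# Plain-Python illustration of the map_from_arrays + explode idea.
# unpivot_rows is a hypothetical helper, not a PySpark function.

def unpivot_rows(rows, id_cols, value_cols, keys):
    """Emit one output row per (input row, value column) pair."""
    if len(keys) != len(value_cols):
        raise ValueError("keys and value_cols must have the same length")
    out = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        # zip pairs each key with a value column by position,
        # which is the same pairing rule the Spark version relies on
        for key, col in zip(keys, value_cols):
            out.append({**ids, "NewCol": key, "Val": row[col]})
    return out


rows = [
    {"Col1": "abc", "Col2": "def", "Val1": 12, "Val2": 75},
    {"Col1": "ghi", "Col2": "jkl", "Val1": 67, "Val2": 86},
]
result = unpivot_rows(rows, ["Col1", "Col2"], ["Val1", "Val2"], ["KEY1", "KEY2"])
for r in result:
    print(r)
```

Swapping the order of either list, but not the other, would attach KEY1 to the Val2 values, which is the mistake the ordering advice guards against.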

solution1
1 ACCEPTED 2022-01-14 12:28:11