
Pivot Spark Dataframe Columns to Rows with Wildcard column Names in PySpark

I am trying to pivot a Spark DataFrame whose columns hold a foreign key to another table. All such column names start with FK_<column_name>. There can be one or more of these columns.

I want to pivot all columns whose names start with FK_ into rows in a single column so I can join with the other table. I don't need the column names in another column, but if the pivot operation produces them anyway, that is fine too.

Example table I have

id    name     dept    FK_column1    FK_column2    FK_Column3
1     Alpha    ABC     101           102           103
2     Bravo    CDE     104           105           106

Output I am looking for

id     name     dept    foreign_keys
1     Alpha     ABC     101
1     Alpha     ABC     102
1     Alpha     ABC     103
2     Bravo     CDE     104
2     Bravo     CDE     105
2     Bravo     CDE     106

You can get the list of columns whose names start with FK_ and build a stack expression to unpivot the DataFrame:

# columns to unpivot: every column whose name starts with "FK_"
fk_cols = [c for c in df.columns if c.startswith("FK_")]

# stack(n, 'label1', col1, 'label2', col2, ...) emits one row per (label, value) pair
stack_expr = f"stack({len(fk_cols)}," + ','.join(
    [f"'{c.replace('FK_', '')}',{c}" for c in fk_cols]
) + ") as (FK, foreign_keys)"

df.selectExpr("id", "name", "dept", stack_expr).show()

#+---+-----+----+-------+------------+
#| id| name|dept|     FK|foreign_keys|
#+---+-----+----+-------+------------+
#|  1|Alpha| ABC|column1|         101|
#|  1|Alpha| ABC|column2|         102|
#|  1|Alpha| ABC|Column3|         103|
#|  2|Bravo| CDE|column1|         104|
#|  2|Bravo| CDE|column2|         105|
#|  2|Bravo| CDE|Column3|         106|
#+---+-----+----+-------+------------+
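For reference, the stack expression built above is just a plain SQL string, so you can inspect it without a Spark session. A minimal sketch using the three FK_ column names from the example table (no Spark required):

```python
# Hypothetical stand-in for df.columns filtered to the FK_ prefix
fk_cols = ["FK_column1", "FK_column2", "FK_Column3"]

# Same construction as in the answer: stack(n, 'label', column, ...)
stack_expr = f"stack({len(fk_cols)}," + ",".join(
    f"'{c.replace('FK_', '')}',{c}" for c in fk_cols
) + ") as (FK, foreign_keys)"

print(stack_expr)
# stack(3,'column1',FK_column1,'column2',FK_column2,'Column3',FK_Column3) as (FK, foreign_keys)
```

Seeing the generated string makes it clear why the first argument is `len(fk_cols)`: `stack` needs to know how many rows to produce per input row, followed by one label/column pair for each of them.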
