Pivot Spark Dataframe Columns to Rows with Wildcard Column Names in PySpark
I am trying to pivot a Spark DataFrame with columns that hold a foreign key to another table. All such column names start with FK_, and there can be one or more of them.
I want to pivot all the columns whose names start with FK_ into rows in a single column so I can join with the other table. I don't need the original column names in a separate column, but if the pivot operation produces one, that is fine as well.
Example table I have (a snippet to reproduce it is included after the examples):
id  name   dept  FK_column1  FK_column2  FK_Column3
1   Alpha  ABC   101         102         103
2   Bravo  CDE   104         105         106
Output I am looking for:
id  name   dept  foreign_keys
1   Alpha  ABC   101
1   Alpha  ABC   102
1   Alpha  ABC   103
2   Bravo  CDE   104
2   Bravo  CDE   105
2   Bravo  CDE   106
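For reference, a minimal sketch that rebuilds the example table (the SparkSession setup and the variable name df are assumptions, not part of the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample data matching the example table above
df = spark.createDataFrame(
    [(1, "Alpha", "ABC", 101, 102, 103),
     (2, "Bravo", "CDE", 104, 105, 106)],
    ["id", "name", "dept", "FK_column1", "FK_column2", "FK_Column3"],
)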
You can get the list of columns whose names start with FK_ and build a stack expression to unpivot the dataframe:
# collect every column whose name starts with FK_
fk_cols = [c for c in df.columns if c.startswith("FK_")]

# build a stack() expression of the form
# stack(n, 'label1', col1, 'label2', col2, ...) as (FK, foreign_keys)
stack_expr = f"stack({len(fk_cols)}," + ','.join(
    [f"'{c.replace('FK_', '')}',{c}" for c in fk_cols]
) + ") as (FK, foreign_keys)"

df.selectExpr("id", "name", "dept", stack_expr).show()
#+---+-----+----+-------+------------+
#| id| name|dept| FK|foreign_keys|
#+---+-----+----+-------+------------+
#| 1|Alpha| ABC|column1| 101|
#| 1|Alpha| ABC|column2| 102|
#| 1|Alpha| ABC|Column3| 103|
#| 2|Bravo| CDE|column1| 104|
#| 2|Bravo| CDE|column2| 105|
#| 2|Bravo| CDE|Column3| 106|
#+---+-----+----+-------+------------+
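If you don't need the FK label column, simply drop it with .drop("FK"). On Spark 3.4+ you can also skip building the SQL string and use the built-in DataFrame.unpivot; a sketch under that version assumption:

# Spark 3.4+: DataFrame.unpivot does the same as the stack expression
result = df.unpivot(
    ["id", "name", "dept"],  # identifier columns to keep as-is
    fk_cols,                 # columns to melt into rows
    "FK",                    # name of the label column
    "foreign_keys",          # name of the value column
).drop("FK")                 # drop the label column since it isn't needed
result.show()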