How to dynamically create a struct column from a list of column names?
I have a dataframe with hundreds of columns:
root
|-- column1
|-- column2
|-- column3
|-- column4
|-- column5
I have a list of the column names:
struct_list = ['column4','column3','column2']
Expected Schema:
root
|-- column1
|-- column2
|-- column3
|-- column4
|-- column5
|-- prev_val
|    |-- column4
|    |-- column3
|    |-- column2
Currently I am hardcoding the values like:
df = df.withColumn("prev_val", f.struct(f.col("column4"), f.col("column3"), f.col("column2")))
Is there a way we can dynamically pass the values from the list?
You can use a list comprehension:
import pyspark.sql.functions as f
struct_list = ['column4','column3','column2']
df2 = df.withColumn(
"prev_val",
f.struct(*[f.col(c) for c in struct_list])
)
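The mechanism at work here is Python's * argument unpacking, which expands the list into separate positional arguments before f.struct sees them. A minimal plain-Python sketch of that behavior, using a hypothetical stand-in function rather than the real pyspark f.struct:

```python
def struct(*cols):
    # hypothetical stand-in for f.struct: just records the
    # positional arguments it receives, as a tuple
    return tuple(cols)

struct_list = ['column4', 'column3', 'column2']

# the * operator unpacks the list into separate positional arguments,
# so this call is equivalent to struct('column4', 'column3', 'column2')
result = struct(*[c for c in struct_list])
print(result)  # ('column4', 'column3', 'column2')
```

This is why the list comprehension above works: f.struct accepts a variable number of column arguments, and * supplies them from the list one by one.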
And actually you don't even need f.col. You can just pass the column names directly:
df2 = df.withColumn(
"prev_val",
f.struct(*struct_list)
)