Adding Multiple Empty Columns in PySpark DataFrame
Can anyone suggest how I can add multiple empty columns to a PySpark DataFrame? Currently I am doing something like this, but it's not working:
    def add_columns(dataframe, column_list):
        for col in column_list:
            self = dataframe.withColumn(str(col), lit(None).cast(StringType()))
        return dataframe
In the output schema after the add_columns function is applied, I get the new column as <generator object <genexpr> at 0x7f41189d7e10>: string (nullable = true)
Your code snippet is working for me; just make this small change inside:
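    from pyspark.sql import functions as f
    from pyspark.sql.types import StringType

    def add_columns(dataframe, column_list):
        # withColumn returns a new DataFrame, so keep the result of each call
        self = dataframe.withColumn(str(column_list[0]), f.lit(None).cast(StringType()))
        for col in column_list[1:]:
            self = self.withColumn(str(col), f.lit(None).cast(StringType()))
        return self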
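I return "self" instead of "dataframe" because withColumn does not modify a DataFrame in place; it returns a new one. Carrying the result through the loop and returning it leaves the input "dataframe" unchanged, so the columns are not added to it every time the function is run.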