
Adding Multiple Empty Columns in PySpark DataFrame

Can anyone suggest how I can add multiple empty columns to a PySpark DataFrame? Currently I am doing something like this, but it is not working:

def add_columns(dataframe, column_list):
    for col in column_list:
        self = dataframe.withColumn(str(col), lit(None).cast(StringType()))
    return dataframe

In the output schema after the add_columns function is applied, the new column shows up as <generator object <genexpr> at 0x7f41189d7e10>: string (nullable = true)
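The question does not show how column_list was built, so the following is only a guess at where such a column name could come from. It is a plain-Python sketch, with the hypothetical names `names`, `wrapped` and `expanded`, showing that str() applied to a generator object yields exactly this kind of text, which would then be used as the column name:

```python
names = ["a", "b"]

# Hypothetical mistake: the generator expression is wrapped in a list
# as a single element instead of being expanded into the list.
wrapped = [(n.upper() for n in names)]

# What was probably intended: a list of the names themselves.
expanded = [n.upper() for n in names]

# str() of a generator gives its repr, not the values it would produce.
print(str(wrapped[0]))  # text of the form "<generator object <genexpr> at 0x...>"
print(expanded)
```

If that is what happened, passing a real list of strings (like `expanded`) to add_columns avoids the odd column name.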

Your code snippet works for me once you make this small change inside the function:

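from pyspark.sql import functions as f
from pyspark.sql.types import StringType

def add_columns(dataframe, column_list):
    # withColumn returns a new DataFrame, so keep the result of the first call
    self = dataframe.withColumn(str(column_list[0]), f.lit(None).cast(StringType()))
    # chain each remaining column onto the previous result
    for col in column_list[1:]:
        self = self.withColumn(str(col), f.lit(None).cast(StringType()))
    return self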

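I return "self" instead of "dataframe" because withColumn does not modify a DataFrame in place; it returns a new one. The original "dataframe" never receives the added columns, so returning it gives back the input unchanged, while "self" holds the result with every new column.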

