
Concatenate a variable list of columns into one new column of a PySpark DataFrame

I am using PySpark and have a DataFrame df_001 with N columns, including 'rec', 'id' and 'NAME'.

Suppose I want to add a new column 'unq_id' that concatenates 'rec' and 'id'. The following works perfectly:

    from pyspark.sql import functions as sf  # alias assumed from the sf.* calls
    df_f_final = df_001.withColumn('unq_id', sf.concat(sf.col('rec'), sf.lit('||'), sf.col('id')))

But I need the list of columns to concatenate to be dynamic. How can I do that? For example, define a list such as LL = ['rec', 'id', 'NAME'] or LL = ['rec', 'NAME'] and use it to build df_f_final by concatenating the columns named in LL.

I think it is easy, but it is driving me crazy.

Thank you for your help.

Check this out and let me know if it helps.

    #InputDF
    # +------+------+
    # |rec_id|  name|
    # +------+------+
    # |    a1| ricky|
    # |    b1|sachin|
    # +------+------+

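    from pyspark.sql import functions as F

    LL = ['rec_id', 'name']

    # Append "||" after every column value, then concatenate all the pieces.
    df1 = df.withColumn("unq_id_value", F.concat(*[F.concat(F.col(c), F.lit("||")) for c in LL]))

    # Strip the trailing "||" (2 characters) left after the last column.
    df2 = df1.withColumn("unq_id_value", F.expr("substring(unq_id_value, 1, length(unq_id_value) - 2)"))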

    df2.show()

    # +------+------+------------+
    # |rec_id|  name|unq_id_value|
    # +------+------+------------+
    # |    a1| ricky|   a1||ricky|
    # |    b1|sachin|  b1||sachin|
    # +------+------+------------+

Thank you, Loka, for your answer. I finally found a solution that is similar to yours. This is what I did, and it works:

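    from pyspark.sql import functions as sf
    from pyspark.sql.types import StringType
    cols = ['col1', sf.lit('||'), 'col2', sf.lit('||'), 'col3']
    # The UDF receives the array as a Python list of strings and joins it.
    unq_id = sf.udf(lambda cols: "".join(cols), StringType())
    df.withColumn('unqid', unq_id(sf.array(cols))).show()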
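
The `cols` list above is written out by hand, while the question asked for it to be driven by a list such as `LL`. One way to build the interleaved sequence is sketched below in plain Python. The helper name `interleave` is hypothetical and does not come from PySpark or from the thread.

```python
from itertools import chain


def interleave(items, sep):
    """Return items with sep inserted between consecutive elements."""
    # Emit (sep, item) for every item, flatten, then drop the leading sep.
    flattened = chain.from_iterable((sep, item) for item in items)
    return list(flattened)[1:]


LL = ["rec", "id", "NAME"]
print(interleave(LL, "||"))
# ['rec', '||', 'id', '||', 'NAME']
```

In PySpark the same helper could be fed column objects, for example `sf.concat(*interleave([sf.col(c) for c in LL], sf.lit('||')))`, which would avoid the Python UDF altogether.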
