
How do I convert an array (list) column to a struct in PySpark?

I need to create a JSON string from the non-null columns of each row. I created a new column, col3, as an array holding the names of the non-null columns for that row. How do I create a struct/JSON string using col3?

+---------+--------+-----------------+
|post code|addr    |col3             |
+---------+--------+-----------------+
|5678     |null    |[post code]      |
|7877     |San jose|[post code, addr]|
+---------+--------+-----------------+

Expected output:

+-----------------------------------------+
|col4                                     |
+-----------------------------------------+
|{"post code": "5678"}                    |
|{"post code": "7877", "addr": "San jose"}|
+-----------------------------------------+

Since a column is not iterable, I built a string by replacing each element of the col3 array with col(name) and applied struct to it, but that returned all the array elements as a single value.

As an alternative, I created col4 using:

F.to_json(F.struct(F.when(F.col("col1").isNotNull(), F.col("col1")).alias("col1"),
                   F.when(F.col("col2").isNotNull(), F.col("col2")).alias("col2")))

However, I wanted to know whether this can be done using only col3.

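from pyspark.sql import functions as F
# to_json omits null fields, so a struct over all candidate columns is enough
df1 = df.withColumn("New", F.to_json(F.struct(F.col('post code'), F.col('addr'))))
df1.show(2, False)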

+---------+--------+--------------------------------------+
|post code|addr    |New                                   |
+---------+--------+--------------------------------------+
|5678     |null    |{"post code":"5678"}                  |
|7877     |San jose|{"post code":"7877","addr":"San jose"}|
+---------+--------+--------------------------------------+
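
The snippet above relies on to_json skipping nulls and never reads col3. If the JSON really has to be driven by col3, one option is a small UDF that receives the name array plus a struct of the candidate columns and keeps only the listed fields. This is a sketch, not a tested recipe for your data: `non_null_json` and `add_col4` are hypothetical helper names, the columns are assumed to be strings, and a Python UDF will be slower than the built-in to_json approach.

```python
import json


def non_null_json(names, row):
    """Return a JSON string with only the fields listed in `names` (hypothetical helper)."""
    # `names` is the col3 array for one row; `row` maps column name -> value.
    picked = {n: row[n] for n in (names or []) if row.get(n) is not None}
    return json.dumps(picked)


def add_col4(df, names_col="col3"):
    """Add col4 to a Spark DataFrame, driven by the array column `names_col`."""
    # Imported lazily so the pure-Python helper above works without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Every column except the name array is a candidate for the JSON output.
    candidates = [c for c in df.columns if c != names_col]
    to_json_udf = F.udf(
        lambda names, row: non_null_json(names, row.asDict()), StringType()
    )
    return df.withColumn("col4", to_json_udf(F.col(names_col), F.struct(*candidates)))
```

With the sample data from the question, `add_col4(df).show(2, False)` should produce the col4 values shown in the expected output, since the helper preserves the order of names in col3.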
