How do I convert an array column of column names to a struct in PySpark?
I need to create a JSON string containing only the non-null columns. I created a new column, col3, as a list (array) holding the names of the non-null columns for each row. How do I create a struct/JSON string using the column col3?

| post code | addr | col3 |
|---|---|---|
| 5678 | null | [post code] |
| 7877 | San jose | [post code, addr] |

Expected output:

| col4 |
|---|
| {"post code": "5678"} |
| {"post code": "7877", "addr": "San jose"} |

Since the column is not iterable, I built a string by replacing each element of the col3 array with col(<name>) and applied struct to it, but that returned all the array elements as a single value.
As an alternative, I created col4 using:

    F.to_json(F.struct(F.when(F.col("col1").isNotNull(), F.col("col1")).alias("col1"), F.when(F.col("col2").isNotNull(), F.col("col2")).alias("col2")))

However, I would like to know whether this can be done using only col3.

    from pyspark.sql import functions as F

    df1 = df.withColumn("New", F.to_json(F.struct(F.col("post code"), F.col("addr"))))
    df1.show(2, False)

    +---------+--------+--------------------------------------+
    |post code|addr    |New                                   |
    +---------+--------+--------------------------------------+
    |5678     |null    |{"post code":"5678"}                  |
    |7877     |San jose|{"post code":"7877","addr":"San jose"}|
    +---------+--------+--------------------------------------+
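Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.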