Spark DF column to string JSON

I have a DF like this:

+------------+-------------------------------------------------------------+
|pk_attr_name|pk_struct                                                    |
+------------+-------------------------------------------------------------+
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
+------------+-------------------------------------------------------------+

I want to build a single JSON string from the pk_struct column. Expected output:

pk_struct_str = '[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"},{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]'

I tried:

pk_df.select(F.to_json(F.struct("pk_struct")).alias("json")).show(truncate=False)

but it did not give me the result I want. For reference, the schema is:

pk_df.printSchema()
root
 |-- pk_attr_name: string (nullable = true)
 |-- pk_struct: string (nullable = true)

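You can use the collect_list or collect_set function to get this result (collect_set also drops duplicate values). Both are aggregate functions, so they have to be used inside an aggregation. That is why I created a dummy column, grouped by that column's value, and used collect_list in the aggregation: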

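from pyspark.sql.functions import collect_list, lit
df.show(2, False)
# Add a constant column so that every row falls into the same group
df1 = df.withColumn("dummy", lit("XXX"))
# collect_list is an aggregate: gather all pk_struct values into one array
df2 = df1.groupBy("dummy").agg(collect_list(df1.pk_struct))
df2.show(2, False)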

    
+------------+-------------------------------------------------------------+
|pk_attr_name|pk_struct                                                    |
+------------+-------------------------------------------------------------+
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
+------------+-------------------------------------------------------------+

+-----+-----------------------------------------------------------------------------------------------------------------------------+
|dummy|collect_list(pk_struct)                                                                                                      |
+-----+-----------------------------------------------------------------------------------------------------------------------------+
|XXX  |[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"}, {"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]|
+-----+-----------------------------------------------------------------------------------------------------------------------------+

