Spark DF column to string JSON
I have a DF like this:
+------------+-------------------------------------------------------------+
|pk_attr_name|pk_struct |
+------------+-------------------------------------------------------------+
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
+------------+-------------------------------------------------------------+
I want to build a single JSON string from the pk_struct column. Expected output:
pk_struct_str = '[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"},{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]'
I tried:
pk_df.select(F.to_json(F.struct("pk_struct")).alias("json")).show(truncate=False)
But that did not give me the result I wanted.
pk_df.printSchema()
root
|-- pk_attr_name: string (nullable = true)
|-- pk_struct: string (nullable = true)
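You can use the collect_list or collect_set function to get this result. Both are aggregate functions, so they have to be used inside an aggregation. I therefore added a dummy column with a constant value, grouped by that column, and used collect_list in the aggregation. Note that collect_set drops duplicate values, so prefer collect_list if every row must be kept.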
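from pyspark.sql.functions import lit, collect_list
df.show(2, False)
# Add a constant column so that every row falls into the same group
df1 = df.withColumn("dummy", lit("XXX"))
# Gather all pk_struct values of that single group into one array
df2 = df1.groupBy("dummy").agg(collect_list(df1.pk_struct))
df2.show(2, False)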
+------------+-------------------------------------------------------------+
|pk_attr_name|pk_struct |
+------------+-------------------------------------------------------------+
|CLNT_GRP_CD |{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"} |
|IDI_CONTRACT|{"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}|
+------------+-------------------------------------------------------------+
+-----+-----------------------------------------------------------------------------------------------------------------------------+
|dummy|collect_list(pk_struct) |
+-----+-----------------------------------------------------------------------------------------------------------------------------+
|XXX |[{"pk_seq":1,"pk_attr_id":20209,"pk_attr_name":"CLNT_GRP_CD"}, {"pk_seq":2,"pk_attr_id":45483,"pk_attr_name":"IDI_CONTRACT"}]|
+-----+-----------------------------------------------------------------------------------------------------------------------------+
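The result above is still a one-row DataFrame holding an array, not the Python string pk_struct_str from the question. If the DataFrame is small, one option is to collect the pk_struct values to the driver and build the string there. This is only a minimal sketch under a few assumptions: pk_df is the DataFrame from the question and fits in driver memory, every pk_struct value is valid JSON containing a pk_seq key, and build_json_array and pk_struct_to_str are hypothetical helper names, not Spark functions. The objects are sorted by pk_seq because Spark does not guarantee the row order of a collected result after a shuffle.

```python
import json


def build_json_array(json_rows):
    """Combine already-serialized JSON objects into one JSON array string."""
    # Parse each string so the objects can be validated and ordered by pk_seq.
    objs = sorted((json.loads(s) for s in json_rows), key=lambda o: o["pk_seq"])
    # Compact separators avoid the spaces json.dumps inserts by default,
    # matching the spacing of the expected output in the question.
    return json.dumps(objs, separators=(",", ":"))


def pk_struct_to_str(pk_df):
    """Collect the pk_struct column to the driver and return one JSON string.

    Assumes pk_df is a Spark DataFrame small enough to collect.
    """
    rows = [r["pk_struct"] for r in pk_df.select("pk_struct").collect()]
    return build_json_array(rows)
```

Usage would be pk_struct_str = pk_struct_to_str(pk_df). For a large DataFrame, collecting to the driver may not be appropriate, and the aggregation shown above is the safer starting point.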