How to flatten nested JSON in a list using PySpark

Question

I have multiple nested JSON documents in a list: list = [nested_json1, nested_json2, nested_json3]

I need to take each nested_json, remove the comma at its end, process it, and load all of them into a single DataFrame. How can I do this using PySpark?

I'm able to flatten each nested_json individually, but not when they are in a list.

Answer 1

Suppose you have a table with a name column and a nested array column, subjects.

Then you can try the explode function:

from pyspark.sql.functions import explode
# One output row per element of the subjects array; truncate=False shows full cell contents
df.select(df.name, explode(df.subjects)).show(truncate=False)

This gives one row per element of subjects.

After this, you can explode the new table again to get the fully flattened format.
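To tie this back to the list in the question, here is a minimal sketch, not a definitive implementation. It assumes each list element is a JSON string that may end with a stray comma, and that each record has the hypothetical shape {"name": ..., "subjects": [[...], [...]]}. The helper names clean_json_strings and load_and_flatten are made up for illustration; spark is assumed to be an existing SparkSession.

```python
import json


def clean_json_strings(raw_items):
    """Strip surrounding whitespace and one trailing comma from each JSON string."""
    cleaned = []
    for item in raw_items:
        text = item.strip()
        if text.endswith(","):
            text = text[:-1].rstrip()
        json.loads(text)  # fail early if the string is still not valid JSON
        cleaned.append(text)
    return cleaned


def load_and_flatten(spark, raw_items):
    """Load all JSON strings into one DataFrame and explode the nested arrays.

    Assumes (hypothetically) a `name` field and a `subjects` field that is
    an array of arrays; adjust the column names to the real schema.
    """
    # Imported here so the cleaning helper can be used without Spark installed.
    from pyspark.sql.functions import col, explode

    # spark.read.json accepts an RDD of JSON strings and infers the schema.
    rdd = spark.sparkContext.parallelize(clean_json_strings(raw_items))
    df = spark.read.json(rdd)

    # First explode: array of arrays -> one row per inner array.
    inner = df.select(col("name"), explode(col("subjects")).alias("subject_group"))
    # Second explode: inner array -> one row per element.
    return inner.select(col("name"), explode(col("subject_group")).alias("subject"))
```

With a running session, a call such as load_and_flatten(spark, [nested_json1, nested_json2, nested_json3]).show(truncate=False) would print the flattened rows. Reading all strings through one RDD keeps everything in a single DataFrame instead of flattening each document separately and unioning the results.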

Answer 1 (2022-09-26 12:30:41)