
pyspark dataframe to valid json

I'm trying to convert a dataframe to a valid JSON format; however, I have not succeeded yet.

If I do it like this:

fullDataset.repartition(1).write.json(f'{mount_point}/eds_ckan', mode='overwrite', ignoreNullFields=False)

I only get row-based JSON (one object per line, i.e. JSON Lines) like this:

{"col1":"2021-10-09T12:00:00.000Z","col2":336,"col3":0.0}
{"col1":"2021-10-16T20:00:00.000Z","col2":779,"col3":6965.396}
{"col1":"2021-10-17T12:00:00.000Z","col2":350,"col3":0.0}

Does anyone know how to convert it to valid JSON which is not row based?
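One way to turn the row-based output above into a single valid JSON document is to parse each line and wrap the objects in a JSON array. A minimal sketch, using the sample rows from the question as an inline string (in practice you would read the file Spark wrote):

```python
import json

# The row-based (JSON Lines) output Spark produces: one JSON object per line.
jsonl = '''{"col1":"2021-10-09T12:00:00.000Z","col2":336,"col3":0.0}
{"col1":"2021-10-16T20:00:00.000Z","col2":779,"col3":6965.396}
{"col1":"2021-10-17T12:00:00.000Z","col2":350,"col3":0.0}'''

# Parse each line, then serialize the whole list as one JSON array,
# which is valid standalone JSON.
records = [json.loads(line) for line in jsonl.splitlines()]
valid_json = json.dumps(records)
```

This keeps the per-row objects unchanged and only changes the container format, so it can be applied to the file produced by `write.json` without touching the Spark job.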

Below is a sample example of converting a dataframe to valid JSON.

Try using collect and then json.dump:

import json

# Collect the rows to the driver and convert each pyspark Row to a plain dict
collected_df = [row.asDict() for row in df_final.collect()]

# Dump the whole list as a single JSON array
with open(data_output_file + 'createjson.json', 'w') as outfile:
    json.dump(collected_df, outfile)
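A self-contained sketch of this collect-then-dump pattern, using invented sample dicts in place of the collected pyspark Rows (a `Row` becomes such a dict via `row.asDict()`):

```python
import json
import os
import tempfile

# Stand-in for the result of collecting a dataframe on the driver;
# the values mirror the sample rows from the question.
collected = [
    {"col1": "2021-10-09T12:00:00.000Z", "col2": 336, "col3": 0.0},
    {"col1": "2021-10-16T20:00:00.000Z", "col2": 779, "col3": 6965.396},
]

# Dump the list as one JSON array, producing a single valid JSON document.
path = os.path.join(tempfile.gettempdir(), "createjson.json")
with open(path, "w") as outfile:
    json.dump(collected, outfile)
```

Note that `collect()` pulls the entire dataframe onto the driver, so this approach only suits datasets small enough to fit in driver memory.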

Here are a few links to related discussions you can go through for complete information.

Dataframe to valid JSON

Valid JSON in spark
