
pyspark dataframe to valid json

I'm trying to convert a dataframe to a valid JSON format; however, I have not succeeded yet.

If I do it like this:

fullDataset.repartition(1).write.json(f'{mount_point}/eds_ckan', mode='overwrite', ignoreNullFields=False)

I only get row-based JSON (one object per line, i.e. JSON Lines) like this:

{"col1":"2021-10-09T12:00:00.000Z","col2":336,"col3":0.0}
{"col1":"2021-10-16T20:00:00.000Z","col2":779,"col3":6965.396}
{"col1":"2021-10-17T12:00:00.000Z","col2":350,"col3":0.0}

Does anyone know how to convert it to valid JSON which is not row based?
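One way to turn the row-based output above into a single valid JSON document is to parse each line and wrap the objects in a JSON array. A minimal sketch, using the sample rows from the question as an inline string (in practice you would read the file Spark wrote):

```python
import json

# The row-based (JSON Lines) output Spark produces: one JSON object per line.
jsonl = '''{"col1":"2021-10-09T12:00:00.000Z","col2":336,"col3":0.0}
{"col1":"2021-10-16T20:00:00.000Z","col2":779,"col3":6965.396}
{"col1":"2021-10-17T12:00:00.000Z","col2":350,"col3":0.0}'''

# Parse each line, then serialize the whole list as one JSON array,
# which is valid standalone JSON.
records = [json.loads(line) for line in jsonl.splitlines()]
valid_json = json.dumps(records)
```

This keeps the per-row objects unchanged and only changes the container format, so it can be applied to the file produced by `write.json` without touching the Spark job.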

Below is a sample example of converting a dataframe to valid JSON.

Try using collect and then json.dump:

import json

# Collect the rows to the driver and convert each pyspark Row to a plain dict
collected_df = [row.asDict() for row in df_final.collect()]

# Dump the whole list as a single JSON array
with open(data_output_file + 'createjson.json', 'w') as outfile:
    json.dump(collected_df, outfile)
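A self-contained sketch of this collect-then-dump pattern, using invented sample dicts in place of the collected pyspark Rows (a `Row` becomes such a dict via `row.asDict()`):

```python
import json
import os
import tempfile

# Stand-in for the result of collecting a dataframe on the driver;
# the values mirror the sample rows from the question.
collected = [
    {"col1": "2021-10-09T12:00:00.000Z", "col2": 336, "col3": 0.0},
    {"col1": "2021-10-16T20:00:00.000Z", "col2": 779, "col3": 6965.396},
]

# Dump the list as one JSON array, producing a single valid JSON document.
path = os.path.join(tempfile.gettempdir(), "createjson.json")
with open(path, "w") as outfile:
    json.dump(collected, outfile)
```

Note that `collect()` pulls the entire dataframe onto the driver, so this approach only suits datasets small enough to fit in driver memory.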

Here are a few links to related discussions you can go through for complete information.

Dataframe to valid JSON

Valid JSON in spark
