Making a large JSON file quickly available in Python
I have a file of about 5 GB containing multiple JSON documents (one per line) on which I do some exploratory data analysis. The problem is that every time I load the file it takes about a minute with this code:
import json

with open(json_fn, 'r') as f:  # multiple JSONs in one file (one per line)
    for line in f:
        data = json.loads(line)
Is there a more efficient way to store these data so they load faster in Python? I was thinking about pickle (as a binary format, it is usually faster), but it seems to be even slower. Any recommendations for avoiding the one-minute wait every time?
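For reference, the one-object-per-line layout described above can be reproduced with a small in-memory sample (the contents here are hypothetical stand-ins for the real 5 GB file):

```python
import io
import json

# Hypothetical stand-in for the 5 GB file: one JSON document per line.
f = io.StringIO('{"a": 1}\n{"a": 2}\n')

# Parse each line independently, exactly as in the question's loop.
rows = [json.loads(line) for line in f]
print(rows)  # → [{'a': 1}, {'a': 2}]
```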
You can use ijson for this purpose. ijson allows reading the file lazily as a stream.
import ijson

with open(FILE_PATH, 'rb') as f:
    # multiple_values=True accepts several top-level JSON documents
    # in one stream, matching the one-object-per-line layout.
    for prefix, event, value in ijson.parse(f, multiple_values=True):
        print(value)
NOTE: With the help of the backends mentioned in this post you can increase the performance quite a lot.