Making a large JSON file quickly available in Python
I have a file of about 5 GB containing multiple JSON documents (one per line) on which I do some exploratory data analysis. The problem is that every time I load the file it takes about a minute with this code:
import json

with open(json_fn, 'r') as f:  # multiple JSONs in one file (one per line)
    for line in f:
        data = json.loads(line)
Is there a more efficient way to store these data so they load faster in Python? I was thinking about pickle (as a binary format, it is usually faster), but it seems to be even slower. Any recommendations for avoiding the one-minute wait every time?
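For reference, the one-object-per-line layout described above can be reproduced with a small in-memory sample (the contents here are hypothetical stand-ins for the real 5 GB file):

```python
import io
import json

# Hypothetical stand-in for the 5 GB file: one JSON document per line.
f = io.StringIO('{"a": 1}\n{"a": 2}\n')

# Parse each line independently, exactly as in the question's loop.
rows = [json.loads(line) for line in f]
print(rows)  # → [{'a': 1}, {'a': 2}]
```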
You can use ijson for this purpose. ijson allows reading the file lazily as a stream.
import ijson

with open(FILE_PATH, 'rb') as f:
    # multiple_values=True accepts several top-level JSON documents
    # in one stream, matching the one-object-per-line layout.
    for prefix, event, value in ijson.parse(f, multiple_values=True):
        print(value)
NOTE: With the help of the backends mentioned in this post you can increase the performance quite a lot.