
Making a large json file quickly available in python

I have a file of about 5 GB containing multiple JSON documents (one per line) on which I do some exploratory data analysis. The problem is that each time I load the file, it takes about a minute with this code:

import json

with open(json_fn, 'r') as f:   # multiple JSON documents in one file (one per line)
    for line in f:
        data = json.loads(line)

Is there a more efficient way to store this data so it loads faster in Python? I was thinking about pickle (as it is a binary format, which is usually faster), but it seems to be even slower. Any recommendations for what I could use to avoid waiting a minute every time?
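
A minimal sketch of the pickle-caching approach mentioned above (this assumes the parsed records fit in memory; records.pkl is a hypothetical cache path):

import json
import pickle

# Parse every line once and cache the result as a single pickle file.
records = []
with open(json_fn, 'r') as f:
    for line in f:
        records.append(json.loads(line))

with open('records.pkl', 'wb') as out:          # hypothetical cache path
    pickle.dump(records, out, protocol=pickle.HIGHEST_PROTOCOL)

# On later runs, loading the cache replaces re-parsing the JSON:
with open('records.pkl', 'rb') as cached:
    records = pickle.load(cached)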

You can use ijson for this purpose. ijson allows reading the file lazily as a stream.

import ijson

with open(FILE_PATH, 'rb') as f:   # ijson expects a (binary) file-like stream
    json_data = ijson.parse(f)
    for prefix, event, value in json_data:
        print(value)
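
Since the file in question holds one JSON document per line rather than a single document, note that ijson's parse and items functions also accept a multiple_values=True flag (ijson 3.x) for streams containing several top-level values. A minimal sketch:

import ijson

# Stream records from a file with multiple top-level JSON values
# (one per line); multiple_values=True tells ijson to keep reading
# past the end of the first document.
with open(FILE_PATH, 'rb') as f:
    for record in ijson.items(f, '', multiple_values=True):
        print(record)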

Refer to the ijson documentation for details.

NOTE: With the help of the backends mentioned in this post, you can increase the performance quite a lot.
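
A minimal sketch of switching to one of those faster backends (this assumes the C-accelerated yajl2_c backend is installed; otherwise it falls back to the default pure-Python backend):

# Prefer the C-accelerated backend when available.
try:
    import ijson.backends.yajl2_c as ijson   # needs the yajl C extension
except ImportError:
    import ijson                             # pure-Python fallback

with open(FILE_PATH, 'rb') as f:
    for prefix, event, value in ijson.parse(f):
        print(value)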
