How to read a JSON file in Python as a stream, in chunks, with a specific format
I have a huge JSON file (~8 GB) and I want to read it as a stream, in chunks of 1000 examples at a time. I searched a lot and tried several packages, but none of them really did the job.
The format of my file is as follows:
{
    "Elem1": [
        {
            "orgs": []
        },
        {
            "people": []
        }
    ],
    "Elem2": [
        {
            "orgs": []
        },
        {
            "people": []
        }
    ],
    ...
}
As you can see, each element is saved as a pair (a two-item JSON array) of dicts with the same recurring keys. Is there a way to read/load/process the file above in chunks of elements, i.e. chunk_1 = [ Elem1, Elem2, ... ], into RAM and get the values for the keys out of them? Any ideas how to do that? I would appreciate your help.
Best regards, Chris
As Serge said, you will need a custom parser to do the job. Something like the following:
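# filename and DESIRED_COUNT must be defined before this runs
stack = []           # currently open brackets
json_string = ""
count = 0            # completed top-level elements
in_string = False    # True while inside a JSON string literal
escaped = False      # True right after a backslash inside a string
with open(filename, encoding="utf-8") as f:
    while True:
        c = f.read(1)
        if not c:                          # end of file
            break
        json_string += c
        if in_string:                      # brackets inside strings do not count
            if escaped:
                escaped = False
            elif c == '\\':
                escaped = True
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == '{' or c == '[':
            stack.append(c)
        elif c == '}' or c == ']':
            stack.pop()
            if len(stack) == 1:            # a top-level value just closed
                count += 1
                if count == DESIRED_COUNT:
                    json_string += '}'     # close the root object
                    break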
The final json_string will contain JSON holding DESIRED_COUNT top-level elements, which can then be parsed with json.loads.
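To get every chunk rather than only the first one, the same bracket-counting idea can be wrapped in a generator. The following is a minimal sketch, not a definitive implementation: `iter_chunks` and `chunk_size` are names made up for illustration, and the code assumes the root is a single JSON object whose top-level values are all arrays or objects, as in the sample format in the question. The sample string at the bottom is invented test data.

```python
import io
import json


def iter_chunks(fp, chunk_size):
    """Yield lists of (key, value) pairs taken from the top level of one JSON object.

    Assumes every top-level value is an array or object (hypothetical helper).
    """
    depth = 0           # current bracket nesting depth
    in_string = False   # True while inside a JSON string literal
    escaped = False     # True right after a backslash inside a string
    member = []         # characters of the current top-level '"key": value'
    chunk = []
    while True:
        c = fp.read(1)
        if not c:       # end of input
            break
        if in_string:
            member.append(c)
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
            member.append(c)
        elif c in "{[":
            depth += 1
            if depth > 1:            # skip the root object's own brace
                member.append(c)
        elif c in "}]":
            depth -= 1
            if depth >= 1:
                member.append(c)
            if depth == 1:           # a top-level value just closed
                parsed = json.loads("{" + "".join(member) + "}")
                chunk.extend(parsed.items())
                member = []
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
        elif depth >= 1:
            if depth == 1 and c == ",":
                continue             # separator between top-level members
            member.append(c)
    if chunk:                        # last, possibly shorter, chunk
        yield chunk


# Small in-memory demo with made-up data in the question's format.
sample = (
    '{"Elem1": [{"orgs": ["a"]}, {"people": ["x"]}],'
    ' "Elem2": [{"orgs": []}, {"people": ["y"]}],'
    ' "Elem3": [{"orgs": ["b"]}, {"people": []}]}'
)
for chunk in iter_chunks(io.StringIO(sample), 2):
    for name, (orgs_part, people_part) in chunk:
        print(name, orgs_part["orgs"], people_part["people"])
```

For the real file, the same generator could be fed an open file handle instead of the `io.StringIO` object, with `chunk_size` set to 1000. Reading one character at a time is simple but slow in pure Python, so on an 8 GB file a dedicated streaming parser is likely to be considerably faster.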