
How to parse a single line json file containing multiple objects

I need to read some JSON data for processing. I have a single-line file that contains multiple JSON objects. How can I parse it?

I want the output to be a file with one line per object.

I have tried a brute-force method that calls json.loads repeatedly to check whether each candidate substring is valid JSON, but I get different results every time I run the program:

import json

with open('sample.json') as inp:
    s = inp.read()

jsons = []

start, end = s.find('{'), s.find('}')
while True:
    try:
        jsons.append(json.loads(s[start:end + 1]))
        print(jsons)
    except ValueError:
        end = end + 1 + s[end + 1:].find('}')
    else:
        s = s[end + 1:]
        if not s:
            break
        start, end = s.find('{'), s.find('}')

for x in jsons:
    writeToFilee(x)

The JSON format can be seen here: https://pastebin.com/DgbyjAG9

Why not just use the pos attribute of the JSONDecodeError to tell you where to delimit things?

Something like:

import json

def json_load_all(buf):
    while True:
        try:
            yield json.loads(buf)
        except json.JSONDecodeError as err:
            yield json.loads(buf[:err.pos])
            buf = buf[err.pos:]
        else:
            break

This works with your demo data as:

with open('data.json') as fd:
    arr = list(json_load_all(fd.read()))

This gives me exactly two elements, but I presume you have more?

To complete this using the standard library, writing out would look something like:

with open('data.json') as inp, open('out.json', 'w') as out:
    for obj in json_load_all(inp.read()):
        json.dump(obj, out)
        print(file=out)

Otherwise, the jsonlines package is good for dealing with this data format.

The code below worked for me:
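
A variation on the answer above, offered as a minimal sketch rather than the answerer's own method: the standard library's json.JSONDecoder.raw_decode returns both the parsed value and the index where it stopped, so each object is parsed only once. The function name iter_json_objects and the sample string are made up for illustration.

```python
import json

JSON_WHITESPACE = ' \t\n\r'

def iter_json_objects(buf):
    """Yield each top-level JSON value found back-to-back in buf."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(buf)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it by hand
        while pos < end and buf[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == end:
            break
        # raw_decode returns (value, index just past the value)
        obj, pos = decoder.raw_decode(buf, pos)
        yield obj

# Hypothetical sample: three objects on one line, one value containing "}{"
sample = '{"a": 1}{"b": {"c": [1, 2]}} {"d": "}{"}'
for obj in iter_json_objects(sample):
    print(json.dumps(obj))
```

Invalid input still raises json.JSONDecodeError, so a malformed file fails loudly instead of being silently truncated.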

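import json

# input_file_path: path to the single-line file holding the concatenated objects
with open(input_file_path) as f_in:
    file_data = f_in.read()
    # put a comma between adjacent objects, then wrap everything in a JSON array
    file_data = file_data.replace("}{", "},{")
    file_data = "[" + file_data + "]"
    data = json.loads(file_data)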
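
To get from that list to the one-object-per-line output the question asks for, something like the following sketch could be used. The sample string is made up, and io.StringIO stands in for a real output file only so the example is self-contained.

```python
import io
import json

# Made-up sample: three objects concatenated on a single line
raw = '{"id": 1}{"id": 2}{"id": 3}'

# Same idea as above: insert commas and wrap in [ ] to get one JSON array
data = json.loads("[" + raw.replace("}{", "},{") + "]")

# Write one compact JSON document per line
out = io.StringIO()
for obj in data:
    out.write(json.dumps(obj) + "\n")

print(out.getvalue(), end="")
```

With a real file, open('out.json', 'w') would take the place of io.StringIO().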

Following @Chris A's comment, I've prepared this snippet, which should work just fine:

import json
import re

with open('my_jsons.file') as file:
    json_string = file.read()

# mark every object boundary with a separator, then split on it
json_objects = re.sub(r'}\s*{', '}|!|{', json_string).split('|!|')
# replace |!| with whatever suits you best

for json_object in json_objects:
    print(json.loads(json_object))

This example, however, will become worthless as soon as the string '}{' appears in some value inside your JSON, so I strongly recommend using @Sam Mason's solution.
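
A small sketch of that failure mode, using a made-up input in which one string value contains the characters }{. The helper all_parse is hypothetical and exists only for this demonstration.

```python
import json
import re

# Hypothetical input: the first object's value contains "}{"
raw = '{"note": "a}{b"}{"id": 2}'

# The separator trick also splits inside the string value
pieces = re.sub(r'}\s*{', '}|!|{', raw).split('|!|')
print(len(pieces))  # 3 pieces, although there are only 2 objects

def all_parse(chunks):
    """Return True only if every chunk is valid JSON on its own."""
    try:
        for chunk in chunks:
            json.loads(chunk)
    except json.JSONDecodeError:
        return False
    return True

print(all_parse(pieces))

# A real JSON parser knows it is inside a string, so it finds the true boundary
decoder = json.JSONDecoder()
first, end = decoder.raw_decode(raw)
second, _ = decoder.raw_decode(raw, end)
print(first, second)
```

The text-level split cannot tell a boundary between objects from the same characters inside a string, whereas a parser-based approach can.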
