How to parse a single-line JSON file containing multiple objects
I need to read some JSON data for processing. I have a single-line file that contains multiple JSON objects. How can I parse it?
I want the output to be a file with one line per object.
I have tried a brute-force method that calls json.loads repeatedly on growing substrings to check whether the JSON is valid, but I get different results every time I run the program:
import json

with open('sample.json') as inp:
    s = inp.read()

jsons = []

start, end = s.find('{'), s.find('}')
while True:
    try:
        jsons.append(json.loads(s[start:end + 1]))
        print(jsons)
    except ValueError:
        end = end + 1 + s[end + 1:].find('}')
    else:
        s = s[end + 1:]
        if not s:
            break
        start, end = s.find('{'), s.find('}')

for x in jsons:
    writeToFilee(x)
The JSON format can be seen here: https://pastebin.com/DgbyjAG9
Why not just use the pos attribute of the JSONDecodeError to tell you where to delimit things? Something like:
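import json

def json_load_all(buf):
    while True:
        try:
            # succeeds only when buf holds exactly one JSON document
            yield json.loads(buf)
        except json.JSONDecodeError as err:
            # err.pos marks where the first document ends and extra data begins
            yield json.loads(buf[:err.pos])
            buf = buf[err.pos:]
        else:
            break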
It works with your demo data as:
with open('data.json') as fd:
    arr = list(json_load_all(fd.read()))
This gives me exactly two elements, but I presume you have more?
To complete this using the standard library, writing the output would look something like:
with open('data.json') as inp, open('out.json', 'w') as out:
    for obj in json_load_all(inp.read()):
        json.dump(obj, out)
        print(file=out)  # newline after each object
Otherwise, the jsonlines package is good for dealing with this data format.
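If you would rather not drive the loop with exceptions, the standard library's json.JSONDecoder.raw_decode is another option: it parses one value starting at a given index and returns the value together with the index where it stopped. The following is only a sketch of that alternative, not the approach from the answer above; the function name iter_json_objects and the sample string are made up for illustration.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found in text, in order.

    iter_json_objects is a hypothetical helper name, not part of the json module.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Made-up sample: three objects on one line, one with "}{" inside a string
sample = '{"a": 1}{"b": {"c": "}{"}} {"d": [1, 2]}'
for obj in iter_json_objects(sample):
    print(json.dumps(obj))
```

Because the decoder itself decides where each object ends, braces inside string values are not a problem, and the input is scanned once rather than being re-parsed after every error.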
The code below worked for me:
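import json

# input_file_path: path to the single-line file
with open(input_file_path) as f_in:
    file_data = f_in.read()

# turn the concatenated objects into one JSON array
file_data = file_data.replace("}{", "},{")
file_data = "[" + file_data + "]"
data = json.loads(file_data)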
Following @Chris A's comment, I've prepared this snippet, which should work just fine:
import json
import re

with open('my_jsons.file') as file:
    json_string = file.read()

json_objects = re.sub(r'}\s*{', '}|!|{', json_string).split('|!|')
# replace |!| with whatever suits you best

for json_object in json_objects:
    print(json.loads(json_object))
This example, however, will break as soon as the string '}{' appears in some value inside your JSON, so I strongly recommend using @Sam Mason's solution.
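To make that failure mode concrete, here is a small sketch with made-up input in which one string value contains '}{'. The helper name split_parse and the sample data are assumptions for illustration only.

```python
import json
import re

def split_parse(text):
    """Parse concatenated objects by splitting on '}{' (hypothetical helper)."""
    pieces = re.sub(r'}\s*{', '}|!|{', text).split('|!|')
    return [json.loads(piece) for piece in pieces]

# Made-up input: the second object holds a string that contains "}{"
data = '{"id": 1}{"note": "braces }{ inside a string"}'

# The split cannot tell a boundary from a string value, so it cuts the
# second object in two and yields three pieces instead of two.
pieces = re.sub(r'}\s*{', '}|!|{', data).split('|!|')
print(len(pieces))

try:
    split_parse(data)
except json.JSONDecodeError as err:
    print('split approach failed:', err.msg)
```

With well-behaved input such as '{"id": 1} {"id": 2}' the same helper parses both objects, which is why the problem is easy to miss until such a value shows up.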