How to read a multiline JSON-like file with multiple JSON fragments separated by just a newline?

I have a JSON file with multiple JSON objects (each object may itself span multiple lines). Example:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Note that this is indeed not valid JSON as a whole file (so the regular "read JSON in Python" code fails, as expected), but each individual "fragment" is complete and valid JSON. It sounds like the file was produced by some logging tool that simply appends the next block as text to the file.

As expected, the regular way of reading it, which I tried with the snippet below, fails:

import json

with open('run_log.json', 'r') as file:
    d = json.load(file)
    print(d)

This produces the expected error about invalid JSON:

JSONDecodeError: Extra data: line 3 column 1 (char 89)
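To see where that position comes from, here is a small sketch (not from the original question) that feeds the two example fragments to json.loads from an in-memory string; decode_error_location is a hypothetical helper name. The decoder parses the first complete value, skips the whitespace after it, and raises "Extra data" at the first character of the next fragment:

```python
import json
from typing import Optional, Tuple

# The two fragments from the question, joined by newlines as in the log file.
text = (
    '{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}\n'
    '{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}\n'
)


def decode_error_location(doc: str) -> Optional[Tuple[str, int, int, int]]:
    """Hypothetical helper: return (msg, line, column, char offset) if json.loads rejects doc, else None."""
    try:
        json.loads(doc)
    except json.JSONDecodeError as err:
        return err.msg, err.lineno, err.colno, err.pos
    return None


print(decode_error_location(text))  # ('Extra data', 2, 1, 71)
```

For this two-line input the error points at line 2, column 1 (char 71), the start of the second fragment; the exact numbers depend on how the real file is laid out.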

How can I solve this, possibly using the json module? Ideally, I want to read the JSON file and get the runs list for only a particular date (e.g. 2022-11-30), but just being able to read all entries would be enough.
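If fragments really can span several lines, reading line by line is not enough. One standard-library option is json.JSONDecoder.raw_decode, which parses a single JSON value starting at a given index and returns it together with the index where it stopped. A minimal sketch, assuming the whole file fits in memory and fragments are separated only by whitespace; iter_json_fragments is a hypothetical name:

```python
import json
from typing import Any, Iterator


def iter_json_fragments(text: str) -> Iterator[Any]:
    """Hypothetical helper: yield each top-level JSON value from concatenated fragments.

    Fragments may span several lines and are assumed to be separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        # Parse one value starting at pos; raw_decode returns (value, end index).
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# The first fragment deliberately spans two lines.
sample = '''{"date": "2022-11-29",
 "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}
'''

for fragment in iter_json_fragments(sample):
    print(fragment["date"], fragment["runs"])
```

To use it on the real file, pass its contents, e.g. iter_json_fragments(f.read()) inside a with open('run_log.json', encoding='utf8') as f: block.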

This is NDJSON, not JSON.

It's a valid file format that is often confused with JSON.

There is, of course, a Python library for this: the third-party ndjson package (pip install ndjson).

import ndjson

with open('run_log.json','r') as file:
    d = ndjson.load(file)
    for elem in d:
        print(type(elem), elem)

Output:

<class 'dict'> {'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
<class 'dict'> {'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}

Each line is valid JSON (see the JSON Lines format), and this makes a nice format for a logger: new JSON lines can be appended to the file without the read/modify/write of the whole file that a single JSON document would require.
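The append-only property is easy to see in a small sketch that uses only the standard json module (append_record and read_records are hypothetical helper names): each write opens the file in append mode and adds one json.dumps(...) line, so existing lines are never read or rewritten.

```python
import json
import os
import tempfile


def append_record(path: str, record: dict) -> None:
    """Hypothetical helper: append one record as a single JSON line."""
    # 'a' mode writes at the end of the file; earlier lines are left untouched.
    with open(path, "a", encoding="utf8") as f:
        # json.dumps without indent escapes newlines inside strings, so each record stays on one line.
        f.write(json.dumps(record) + "\n")


def read_records(path: str) -> list:
    """Hypothetical helper: parse every non-blank line back into a Python object."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory so no real log file is touched.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "run_log.json")
    append_record(log_path, {"date": "2022-11-29", "runs": [{"23597": 821260}]})
    append_record(log_path, {"date": "2022-11-30", "runs": [{"23597": 821269}]})
    records = read_records(log_path)

print(records)
```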

You can use json.loads() to parse it one line at a time.

Given run_log.json:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Use:

import json

with open('run_log.json', encoding='utf8') as file:
    for line in file:
        data = json.loads(line)
        print(data)

Output:

{'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
{'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}
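Building on this, the question's actual goal (the runs list for one date) can be met by checking each parsed line. A minimal sketch, assuming one object per line and that the first entry with the requested date is the one wanted; runs_for_date is a hypothetical helper, and the demo writes the example data to a temporary file:

```python
import json
import os
import tempfile
from typing import Any, List, Optional


def runs_for_date(path: str, date: str) -> Optional[List[Any]]:
    """Hypothetical helper: return the "runs" list of the first entry whose "date" matches, else None.

    Assumes one JSON object per line, as in run_log.json.
    """
    with open(path, encoding="utf8") as file:
        for line in file:
            if not line.strip():  # tolerate blank lines, e.g. at the end of the file
                continue
            entry = json.loads(line)
            if entry.get("date") == date:
                return entry.get("runs")  # stop reading as soon as the date is found
    return None


# Demo: write the example log to a temporary file and query it.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "run_log.json")
    with open(log_path, "w", encoding="utf8") as f:
        f.write('{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}\n')
        f.write('{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}\n')
    runs = runs_for_date(log_path, "2022-11-30")
    missing = runs_for_date(log_path, "2022-12-01")

print(runs)     # [{'23597': 821269}, {'23617': 8213534}]
print(missing)  # None
```

Returning on the first match stops reading early; if a date can appear more than once, collect all matching runs lists instead.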
