
How to read a multiline JSON-like file with multiple JSON fragments separated only by a newline?

I have a JSON file containing multiple JSON objects (each object may span multiple lines). Example:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Note that the file as a whole is not valid JSON (so the regular "read JSON in Python" code fails, as expected), but each individual "fragment" is complete, valid JSON. It looks like the file was produced by a logging tool that simply appends each new block as text to the file.

As expected, the regular way of reading it, which I tried with the snippet below, fails:

import json

with open('run_log.json', 'r') as file:
    d = json.load(file)  # expects exactly one JSON document in the file
    print(d)

It produces the expected error about invalid JSON:

JSONDecodeError: Extra data: line 3 column 1 (char 89)

How can I solve this, ideally using the json module? I want to read the file and get the runs list for one particular date (e.g. 2022-11-30), but just being able to read all entries would be enough.

This is NDJSON (newline-delimited JSON), not JSON.

It's a valid file format in its own right and is often confused with JSON.

Python has a library for it: ndjson, a third-party package (pip install ndjson).

import ndjson

with open('run_log.json','r') as file:
    d = ndjson.load(file)
    for elem in d:
        print(type(elem), elem)

Output:

<class 'dict'> {'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
<class 'dict'> {'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}
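
Line-based readers assume one JSON object per line. If some entries span several lines, as the question allows, one possible approach is to decode values one after another with json.JSONDecoder.raw_decode from the standard library. This is only a minimal sketch: iter_json_values and the sample text are made up for illustration and are not part of the original file or of any library.

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value found in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Hypothetical sample: the first entry is split across two lines.
sample = '''{"date": "2022-11-29",
 "runs": [{"23597": 821260}]}
{"date": "2022-11-30", "runs": [{"23617": 8213534}]}
'''

for entry in iter_json_values(sample):
    print(entry["date"], entry["runs"])
```

For a real file, the text could come from file.read(). That loads the whole file into memory, which is probably fine for a modest log but may not suit very large ones.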

Each line is valid JSON (see the JSON Lines format). That makes it a convenient format for logging: new lines can be appended to the file without the read/modify/write of the whole file that a single JSON document would require.
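
To illustrate the append-only property, here is a rough sketch of how a logger might write such a file. The helper append_entry and the temporary file name are assumptions made for the example; the tool that produced run_log.json is unknown.

```python
import json
import os
import tempfile

def append_entry(path, entry):
    """Append one entry to the log as a single JSON line."""
    # Mode "a" writes at the end of the file; existing lines are not rewritten.
    with open(path, "a", encoding="utf8") as file:
        # json.dumps without indent produces a single line of text.
        file.write(json.dumps(entry) + "\n")

# Hypothetical log file in a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "example_log.jsonl")
append_entry(log_path, {"date": "2022-11-29", "runs": [{"23597": 821260}]})
append_entry(log_path, {"date": "2022-11-30", "runs": [{"23597": 821269}]})

with open(log_path, encoding="utf8") as file:
    lines = file.read().splitlines()
print(len(lines))
```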

You can use json.loads() to parse it a line at a time.

Given run_log.json:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Use:

import json

with open('run_log.json', encoding='utf8') as file:
    for line in file:
        data = json.loads(line)
        print(data)

Output:

{'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
{'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}
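
The question also asks for the runs list of one particular date. Building on the loop above, a possible sketch is shown here; runs_for_date is a name invented for this example, and it assumes one object per line with "date" and "runs" keys, as in the sample data.

```python
import json

def runs_for_date(lines, date):
    """Return the "runs" list of the first entry whose "date" matches, else None."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        if entry.get("date") == date:
            return entry.get("runs")
    return None

# The two lines from the question, used here in place of the file.
sample = [
    '{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}',
    '{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}',
]

print(runs_for_date(sample, "2022-11-30"))
# prints [{'23597': 821269}, {'23617': 8213534}]
```

Because an open text file iterates line by line, the file object from open('run_log.json', encoding='utf8') can be passed directly as the lines argument.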
