Python: json.load on a large JSON file raises MemoryError

I'm trying to load a large JSON file (300 MB) so I can parse it into Excel. I just started running into a MemoryError when I call json.load(file). Similar questions have been posted, but none of them answer my specific question. I want to be able to return all the data from the JSON file in one block, as my code does now. What is the best way to do that? The code and JSON structure are below:

The code looks like this.

import json


def parse_from_file(filename):
    """Load the given (already verified) JSON file and return its contents.

    Args:
        filename (string): full branch location, used to locate the JSON file (plus '_metrics.json')
    Returns:
        data: whatever data is stored in the JSON file
    """

    print("STARTING PARSE FROM FILE")
    with open(filename) as json_file:
        # json.load parses the whole file into Python objects in memory at once.
        # No explicit close() is needed: the with block closes the file.
        return json.load(json_file)

The structure looks like this.

[
    {
        "analysis_type": "test_one",
        "date": 1505900472.25, 
        "_id": "my_id_1.1.1",
        "content": {
            .
            .
            .
        }
    },
    {
        "analysis_type": "test_two",
        "date": 1605939478.91,
        "_id": "my_id_1.1.2",
        "content": {
            .
            .
            .
        }
    },

    .
    .
    .
]

The information inside "content" is not consistent, but it follows one of 3 distinct templates, and which one can be predicted from analysis_type.

If all the libraries you have tested give you memory problems, my approach would be to split the file into one file per object in the array.

If the file has newlines and indentation as shown in the OP, I would read it line by line, discarding the lines that are just [ or ], and write the collected lines to a new file every time you reach the } that closes a top-level object (where you also need to remove the trailing comma). Then try to load every file and print a message after each one finishes, so you can see where it fails, if it does.
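A minimal sketch of the line-by-line split described above, assuming the file is pretty-printed like the sample in the question: each top-level object starts on a line that is only "{" and ends on a line that is only "}" or "}," at the same indentation as its opening brace. Matching the indentation is what tells the object's closing brace apart from the nested } lines inside "content". The function names split_by_lines and check_parts and the part_N.json file names are hypothetical.

```python
import json
import os


def split_by_lines(src_path, out_dir):
    """Write each top-level object of a pretty-printed JSON array to its own file.

    Assumption: every top-level object opens on a line containing only "{" and
    closes on a line containing only "}" or "}," with the same indentation.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    buf = []
    indent = None  # indentation of the current object's opening "{"
    with open(src_path) as src:
        for line in src:
            stripped = line.strip()
            line_indent = line[: len(line) - len(line.lstrip())]
            if indent is None:
                # Between objects: skip "[", "]" and blank lines until a "{" line.
                if stripped == "{":
                    indent = line_indent
                    buf = [line]
                continue
            buf.append(line)
            if stripped in ("}", "},") and line_indent == indent:
                # Remove the comma that separated this element from the next one.
                buf[-1] = line.rstrip().rstrip(",") + "\n"
                path = os.path.join(out_dir, f"part_{len(paths)}.json")
                with open(path, "w") as out:
                    out.writelines(buf)
                paths.append(path)
                indent = None
    return paths


def check_parts(paths):
    """Load every part file and report where loading fails, if it does."""
    for path in paths:
        try:
            with open(path) as f:
                json.load(f)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            print(f"FAILED {path}: {exc}")
        else:
            print(f"OK {path}")
```

Usage would be something like paths = split_by_lines("big.json", "parts") followed by check_parts(paths), where both paths are placeholders. Only one object's lines are held in memory at a time.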

If the file has no newlines or is not consistently indented, you would need to read it character by character, keeping two counters: increment one when you find [ and the other when you find {, and decrement them when you find ] or } respectively. Also take into account that you may need to ignore any curly or square bracket that appears inside a string, though that may not be necessary for your data.
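A sketch of the character-by-character approach, assuming every element of the outer array is an object or an array (as in the question). iter_array_items is a hypothetical name. It reads the file in chunks, keeps the two counters described above, and skips brackets inside string literals (tracking backslash escapes so an escaped quote does not end the string), so at most one element is held in memory at a time.

```python
import json


def iter_array_items(f, chunk_size=1 << 16):
    """Yield the top-level elements of a JSON array one at a time.

    Assumption: each element of the outer array is an object or an array.
    Only the characters of the element currently being read are buffered.
    """
    squares = 0        # currently open "[" brackets
    curlies = 0        # currently open "{" braces
    in_string = False  # inside a string literal within an element?
    escaped = False    # previous character was a backslash inside a string?
    buf = []           # characters of the element being collected
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if in_string:
                buf.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch in "[{":
                if ch == "[":
                    squares += 1
                else:
                    curlies += 1
                if squares + curlies >= 2:  # depth 1 is the outer array itself
                    buf.append(ch)
            elif ch in "]}":
                if squares + curlies >= 2:
                    buf.append(ch)
                if ch == "]":
                    squares -= 1
                else:
                    curlies -= 1
                if squares + curlies == 1:  # back at the outer array: element done
                    yield json.loads("".join(buf))
                    buf = []
            elif squares + curlies >= 2:
                buf.append(ch)
                if ch == '"':
                    in_string = True
            # Anything else (whitespace, commas between elements) is skipped.
```

Usage: inside with open(filename) as f:, loop over for item in iter_array_items(f): and handle one object per iteration. Only the sum of the two counters (the nesting depth) is needed to find element boundaries; keeping them separate simply mirrors the description above.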

I did it this way; hope it helps. You may need to skip the first line, "[", and remove the "," at the end of a line if it reads "},".

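import ujson  # third-party (pip install ujson); json.loads from the stdlib also works here

with open(file) as f:
    for line in f:
        if line.strip() in ("[", "]", ""):
            continue  # skip the outer array brackets and blank lines
        while True:
            try:
                # Drop the comma that separates array elements ("}," -> "}")
                jfile = ujson.loads(line.rstrip().rstrip(","))
                break
            except ValueError:
                # Not yet a complete JSON value: append the next line and retry.
                # Each retry re-parses the whole object, so huge objects are slow.
                line += next(f)
        # do something with jfile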
