
Python: iterating through a large JSON file

So I've recently been trying to learn Django by creating a website that lets users build Magic: The Gathering decks. This question is really about plain Python, though, not the Django framework itself.

Anyway, I'm trying to parse a huge JSON file that contains around 200 MTG sets. Each set has multiple cards, and each card has multiple types, as you can see in the image below, so it's a fairly complex data structure.

[Image: enter image description here]

The way I'm currently parsing all the data is with nested for loops like this:

def InsertFormats():
    json_object = setJson()
    # Walk every set, then every card, then every legality entry.
    for set_data in json_object.values():
        for card in set_data['cards']:
            # Not every card has a 'legalities' list, so default to empty.
            for legality in card.get('legalities', []):
                # get_or_create returns (object, created); neither is needed here.
                CardFormat.objects.get_or_create(cardFormat=legality['format'])
                LegalTypes.objects.get_or_create(legalType=legality['legality'])

The problem is that it will randomly time out with this error:

Process finished with exit code -1073741819

which I can only assume is due to the number of iterations this function makes. I have multiple functions like this that insert data from the JSON object into my database.

Is there any other way to iterate through a large JSON object without going through so many for loops just to reach the data I need, or at least a way that won't crash?

It's a memory allocation error. Python dicts aren't good at allocating memory for large datasets.
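For reference, the exit code from the question can be decoded by reading it as an unsigned 32-bit value. This is only a small sketch of the arithmetic: the result, 0xC0000005, is the code Windows reports for an access violation, which points to a memory-related crash rather than an actual timeout.

```python
# Exit code copied from the question.
exit_code = -1073741819

# Reinterpret the signed value as an unsigned 32-bit integer.
unsigned_code = exit_code & 0xFFFFFFFF

# Windows status codes are usually written in hexadecimal.
status_hex = hex(unsigned_code)
print(status_hex)
```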

You can try other container datatypes, such as namedtuples (lighter than dicts):

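from collections import namedtuple
import json

# 'yourfile' is a placeholder for the path to your JSON file.
with open(yourfile) as f:
    json_object = json.load(
        f,
        # rename=True replaces keys that are not valid identifiers
        # (e.g. starting with a digit) with positional names like _0.
        object_hook=lambda x: namedtuple('JsonObject', x.keys(), rename=True)(*x.values())
    )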
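To get a rough feel for the difference, here is a minimal sketch that parses the same small JSON string both ways and compares the shallow size of the results. The sample string and the helper names (`_tuple_type`, `to_namedtuple`) are made up for illustration. Caching the namedtuple class per key set is my own assumption, added so that objects with identical keys share one class instead of creating a new class for every object.

```python
import json
import sys
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _tuple_type(keys):
    # One namedtuple class per distinct tuple of keys (hypothetical helper).
    # rename=True swaps invalid field names for positional ones (_0, _1, ...).
    return namedtuple('JsonObject', keys, rename=True)


def to_namedtuple(obj):
    # Keys and values of a dict iterate in the same order.
    return _tuple_type(tuple(obj))(*obj.values())


# Made-up sample shaped like one 'legalities' entry from the question.
sample = '{"format": "Standard", "legality": "Legal"}'

as_dict = json.loads(sample)
as_tuple = json.loads(sample, object_hook=to_namedtuple)
other = json.loads('{"format": "Modern", "legality": "Banned"}',
                   object_hook=to_namedtuple)

print(as_tuple.format, as_tuple.legality)
# Shallow sizes only: the strings themselves are not counted.
print(sys.getsizeof(as_dict), sys.getsizeof(as_tuple))
```

Keep in mind that values are then read as attributes (`card.legalities`) rather than with `card['legalities']`, so the loops from the question would need adjusting.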

Or tries: http://bugs.python.org/issue9520
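Separately from the container type, the number of database calls can be cut down by collecting the distinct values first and only then writing them. The following is just a sketch under assumptions: `collect_legalities` is a hypothetical helper, the sample data is invented, and the structure (sets containing `cards`, cards containing `legalities`) is taken from the loops in the question. It does not reduce the memory needed to load the file itself.

```python
def collect_legalities(json_object):
    """Return the distinct formats and legality types found in the data."""
    formats, legal_types = set(), set()
    for set_data in json_object.values():
        for card in set_data.get('cards', []):
            # Cards without a 'legalities' key are skipped.
            for legality in card.get('legalities', []):
                formats.add(legality['format'])
                legal_types.add(legality['legality'])
    return formats, legal_types


# Invented sample data with the same shape as in the question.
sample = {
    'SET1': {'cards': [
        {'legalities': [{'format': 'Standard', 'legality': 'Legal'},
                        {'format': 'Modern', 'legality': 'Legal'}]},
        {'name': 'card without legalities'},
    ]},
    'SET2': {'cards': [
        {'legalities': [{'format': 'Modern', 'legality': 'Banned'}]},
    ]},
}

formats, legal_types = collect_legalities(sample)
print(sorted(formats), sorted(legal_types))

# With the models from the question, the writes would then be roughly:
#   for name in formats:
#       CardFormat.objects.get_or_create(cardFormat=name)
#   for name in legal_types:
#       LegalTypes.objects.get_or_create(legalType=name)
```

This way `get_or_create` runs once per distinct value instead of once per card per format.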
