Python: MemoryError with json.load on a large JSON file

Question

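I'm trying to load a large JSON file (300 MB) so that I can parse it into Excel. I've just started getting a MemoryError when I call json.load(file). Similar questions have been posted, but none of them answers my specific question. I'd like to be able to return all of the data in the JSON file in one chunk, as my code does now. What is the best way to do this? The code and the JSON structure are below: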

The code looks like this:

import json


def parse_from_file(filename):
    """Load the JSON file that was given and verified, and return the data
    it contains so it can actually be read.
    Args:
        filename (string): full branch location, used to grab the json file plus '_metrics.json'
    Returns:
        data: whatever data is loaded from the json file
    """

    print("STARTING PARSE FROM FILE")
    with open(filename) as json_file:
        # json.load reads the whole file and builds every object in memory at once
        d = json.load(json_file)
    # No explicit close() needed: leaving the with block closes the file
    return d

The structure looks like this:

[
    {
        "analysis_type": "test_one",
        "date": 1505900472.25, 
        "_id": "my_id_1.1.1",
        "content": {
            .
            .
            .
        }
    },
    {
        "analysis_type": "test_two",
        "date": 1605939478.91,
        "_id": "my_id_1.1.2",
        "content": {
            .
            .
            .
        }
    },

    .
    .
    .
]

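The information inside "content" is not consistent, but it always follows one of 3 distinct possible templates, and which one can be predicted from analysis_type.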

Answer 1

If every library you have tried gives you memory problems, my approach would be to split the file into one file per object in the array.

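If the file contains newlines and indentation (as shown in the question), I would read it line by line: discard the lines that are just [ or ], write the remaining lines to a new file, and start the next file each time you reach a closing }, line, removing its trailing comma. Then try to read each file and print a message when you finish reading each one, to see where it fails (if it fails).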
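A minimal sketch of the line-by-line split described above, assuming the file is pretty-printed like the sample in the question: every top-level object opens on a line containing only "{" and closes on a line containing only "}" or "}," at the same indentation. Matching on indentation, rather than on every "},", is an added assumption that keeps nested objects inside "content" from ending a part early. The names split_array_file, check_parts and the part_NNNNNN.json file names are hypothetical.

```python
import json
import os


def split_array_file(path, out_dir):
    """Write each top-level object of a pretty-printed JSON array to its own file.

    Assumes each top-level object opens with a line holding only "{" and closes
    with a line holding only "}" or "}," at the same indentation.
    Returns the list of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    buf = []        # lines of the object currently being collected
    indent = None   # indentation of the current object's opening "{"
    with open(path) as src:
        for line in src:
            stripped = line.strip()
            if indent is None:
                # Outside any object: skip "[", "]" and blank lines
                if stripped == "{":
                    indent = len(line) - len(line.lstrip())
                    buf = [line]
                continue
            buf.append(line)
            # A closing brace at the opening indentation ends the top-level object
            if stripped in ("}", "},") and len(line) - len(line.lstrip()) == indent:
                buf[-1] = line.replace("},", "}", 1)  # drop the separating comma
                out = os.path.join(out_dir, f"part_{len(written):06d}.json")
                with open(out, "w") as dst:
                    dst.writelines(buf)
                written.append(out)
                indent = None
    return written


def check_parts(files):
    """Load each part on its own and report progress, to locate any failure."""
    for name in files:
        with open(name) as fh:
            json.load(fh)  # raises here if this part is malformed
        print("read OK:", name)
```

Each part is then small enough to load and convert on its own, so only one object has to be in memory at a time.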

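If the file has no newlines or is not properly indented, you need to read it char by char while keeping counters: increment them when you find [ or { respectively, and decrement them when you find ] or } respectively. Also keep in mind that you may need to ignore any braces or brackets inside strings, though that may not be necessary.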
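A minimal sketch of the character-by-character approach from the paragraph above, assuming the file is UTF-8 and well-formed, and that its top level is an array whose elements are objects (or arrays), as in the question. It keeps one combined depth counter instead of separate counts for "[" and "{", which is enough for well-formed input, and it tracks whether it is inside a string so that values such as "a}b" do not change the depth. Instead of writing each element to a new file, this version yields the parsed element so it can be processed and discarded immediately. The names iter_array_objects and chunk_size are hypothetical.

```python
import json


def iter_array_objects(path, chunk_size=1 << 16):
    """Yield each element of a top-level JSON array, one element at a time.

    Reads fixed-size chunks, so it does not depend on newlines. Assumes the
    file is well-formed and each array element is an object or an array.
    """
    depth = 0          # combined nesting depth of "[" and "{"
    in_string = False  # currently inside a JSON string literal?
    escaped = False    # previous character was a backslash inside a string?
    buf = []           # characters of the element being collected
    with open(path, encoding="utf-8") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            for ch in chunk:
                if depth >= 2:
                    buf.append(ch)  # inside a top-level element: keep everything
                if in_string:
                    # Brackets and braces inside strings must not change the depth
                    if escaped:
                        escaped = False
                    elif ch == "\\":
                        escaped = True
                    elif ch == '"':
                        in_string = False
                elif ch == '"':
                    in_string = True
                elif ch in "[{":
                    depth += 1
                    if depth == 2:
                        buf = [ch]  # a new top-level element starts here
                elif ch in "]}":
                    depth -= 1
                    if depth == 1:
                        # The top-level element just closed: parse only this piece
                        yield json.loads("".join(buf))
                        buf = []
```

Usage would look like "for obj in iter_array_objects(filename): ...", where only the current element and one chunk are held in memory.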

Answer 2

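This is how I do it; I hope it helps. You may need to skip the first line, "[", and remove the "," at the end of a line if there is one (when the line is "},").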

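import ujson  # third-party; the standard json module's loads() behaves the same way here

with open(file) as f:
    for line in f:
        # Skip blank lines and the "[" / "]" that wrap the top-level array
        if line.strip() in ("", "[", "]"):
            continue
        while True:
            try:
                # Drop the "," that separates array elements before parsing
                jfile = ujson.loads(line.strip().rstrip(","))
                break
            except ValueError:
                # Not yet a complete JSON value: append the next line and retry
                line += next(f)
        # do something with jfile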


Answer 1: score 0, posted 2018-01-12 15:29:12

Answer 2: score 0, accepted, posted 2018-01-12 15:43:35
