Split a nested JSON file into two JSON files based on ID?

Question

I have a nested JSON file that is loaded as a Python dictionary called movies_data, as follows:

import json
with open('project_folder/data_movie_absa.json') as infile:
  movies_data = json.load(infile)

It has the following structure:

{ "review_1": {"tokens": ["Best", "show", "ever", "!"], 
               "movie_user_4": {"aspects": ["O", "B_A", "O", "O"], "sentiments": ["B_S", "O", "O", "O"]},  
               "movie_user_6": {"aspects": ["O", "B_A", "O", "O"], "sentiments": ["B_S", "O", "O", "O"]}}, 

  "review_2": {"tokens": ["Its", "a", "great", "show"], 
               "movie_user_1": {"aspects": ["O", "O", "O", "B_A"], "sentiments": ["O", "O", "B_S", "O"]}, 
               "movie_user_6": {"aspects": ["O", "O", "O", "B_A"], "sentiments": ["O", "O", "B_S", "O"]}},

  "review_3": {"tokens": ["I", "love", "this", "actor", "!"],  
               "movie_user_17": {"aspects": ["O", "O", "O", "B_A", "O"], "sentiments": ["O", "B_S", "O", "O", "O"]}, 
               "movie_user_23": {"aspects": ["O", "O", "O", "B_A", "O"], "sentiments": ["O", "B_S", "O", "O", "O"]}},

  "review_4": {"tokens": ["Bad", "movie"], 
               "movie_user_1": {"aspects": ["O", "B_A"], "sentiments": ["B_S", "O"]}, 
               "movie_user_6": {"aspects": ["O", "B_A"], "sentiments": ["B_S", "O"]}}

...
}

It has 3324 key-value pairs (i.e., keys up to review_3224). I want to split this file into two JSON files (train_movies.json and test_movies.json) based on a specific list of keys:

test_IDS = ['review_2', 'review_4']

with open("train_movies.json", "w", encoding="utf-8-sig") as outfile_train, open("test_movies.json", "w", encoding="utf-8-sig") as outfile_test:
  for review_id, review in movies_data.items():
    if review_id in test_IDS:
      outfile = outfile_test
      outfile.write('{"%s": "%s"}' % (review_id, movies_data[review_id]))
      
    else:
      outfile = outfile_train
      outfile.write('{"%s": "%s"}' % (review_id, movies_data[review_id]))
  outfile.close()

For test_movies.json I get the following structure:

{"review_2": "{'tokens': ['Its', 'a', 'great', 'show'], 
            'movie_user_4': {'aspects': ['O', 'O', 'O', 'B_A'], 'sentiments': ['O', 'O', 'B_S', 'O']}, 
            'movie_user_6': {'aspects': ['O', 'O', 'O', 'B_A'], 'sentiments': ['O', 'O', 'B_S', 'O']}}"}

{"review_4": "{'tokens': ['Bad', 'movie'], 
               'movie_user_1': {'aspects': ['O', 'B_A'], 'sentiments': ['B_S', 'O']},
               'movie_user_6': {'aspects': ['O', 'B_A'], 'sentiments': ['B_S', 'O']}}"}

Unfortunately, this structure has several problems, such as inconsistent quotes (" vs. '), no commas between reviews, etc. As a result, reading test_movies.json back as a JSON file fails:

with open('project_folder/test_movies.json') as infile:
  testing_data = json.load(infile)

Error message:


JSONDecodeError                           Traceback (most recent call last)
<ipython-input-10-3548a718f421> in <module>()
      1 with open('/content/gdrive/My Drive/project_folder/test_movies.json') as infile:
----> 2   testing_data = json.load(infile)

1 frames
/usr/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    342         if s.startswith('\ufeff'):
    343             raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
--> 344                                   s, 0)
    345     else:
    346         if not isinstance(s, (bytes, bytearray)):

JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)

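The desired output should have a proper JSON structure, like the original movies_data, so that Python can read it correctly as a dict.

Could you help me correct my Python code?

Thank you in advance!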

Answer 1

Problem

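The output string written to the file needs to be created with json.dumps (or written directly to the file with json.dump).
Using Python string formatting, i.e. '{"%s": "%s"}' % (review_id, movies_data[review_id]), inserts the dict's Python repr rather than JSON, which produces the problems you describe.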
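
As a rough illustration of the point above, the sketch below compares the two ways of serializing one entry. The `review` dict is a shortened stand-in for an entry of movies_data, not data taken from the original file, and the variable names are only for illustration.

```python
import json

# Shortened stand-in for one entry of movies_data (illustrative only).
review = {"tokens": ["Bad", "movie"],
          "movie_user_1": {"aspects": ["O", "B_A"], "sentiments": ["B_S", "O"]}}

# %s calls str() on the dict, giving Python repr syntax (single quotes);
# the surrounding "%s" then wraps that repr in double quotes as one string.
formatted = '{"%s": "%s"}' % ("review_4", review)

# json.dumps emits real JSON with nested objects.
dumped = json.dumps({"review_4": review})

# The formatted text parses here, but the review comes back as text, not a dict.
as_text = json.loads(formatted)["review_4"]
as_dict = json.loads(dumped)["review_4"]
print(type(as_text).__name__)   # str
print(type(as_dict).__name__)   # dict

# Writing several such objects back to back is not one JSON document.
try:
    json.loads(formatted + "\n" + formatted)
    concatenated_ok = True
except json.JSONDecodeError:
    concatenated_ok = False
print(concatenated_ok)          # False
```

This is why the file in the question contains quoted Python reprs and has no commas between reviews: each write call emits its own separate top-level object.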

Code

import json

train, test = {}, {}   # Dictionaries for storing training and test data
for review_id, review in movies_data.items():
    if review_id in test_IDS:
        test[review_id] = review
    else:
        train[review_id] = review

# Output test data as a single JSON object
with open("test_movies.json", "w") as outfile_test:
    json.dump(test, outfile_test)

# Output training data as a single JSON object
with open("train_movies.json", "w") as outfile_train:
    json.dump(train, outfile_train)
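
With 3324 reviews, the same split could also be wrapped in a small helper that converts the ID list to a set first, so each membership check is a hash lookup rather than a list scan. This is only a sketch: `split_by_ids` is a hypothetical name, and the data below is a reduced stand-in for movies_data. It builds the same train and test dicts as the loop above.

```python
import json

def split_by_ids(data, test_ids):
    """Return (train, test) dicts; keys listed in test_ids go to test."""
    test_ids = set(test_ids)  # set membership is a hash lookup
    train = {k: v for k, v in data.items() if k not in test_ids}
    test = {k: v for k, v in data.items() if k in test_ids}
    return train, test

# Reduced stand-in for movies_data (illustrative only).
movies_data = {
    "review_1": {"tokens": ["Best", "show", "ever", "!"]},
    "review_2": {"tokens": ["Its", "a", "great", "show"]},
    "review_3": {"tokens": ["I", "love", "this", "actor", "!"]},
    "review_4": {"tokens": ["Bad", "movie"]},
}

train, test = split_by_ids(movies_data, ["review_2", "review_4"])
print(sorted(train))  # ['review_1', 'review_3']
print(sorted(test))   # ['review_2', 'review_4']

# Each dict serializes to one valid JSON document.
print(json.loads(json.dumps(test)) == test)  # True
```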

Result

Input: contents of test.json

{ "review_1": {"tokens": ["Best", "show", "ever", "!"], 
               "movie_user_4": {"aspects": ["O", "B_A", "O", "O"], "sentiments": ["B_S", "O", "O", "O"]},  
               "movie_user_6": {"aspects": ["O", "B_A", "O", "O"], "sentiments": ["B_S", "O", "O", "O"]}}, 

  "review_2": {"tokens": ["Its", "a", "great", "show"], 
               "movie_user_1": {"aspects": ["O", "O", "O", "B_A"], "sentiments": ["O", "O", "B_S", "O"]}, 
               "movie_user_6": {"aspects": ["O", "O", "O", "B_A"], "sentiments": ["O", "O", "B_S", "O"]}},

  "review_3": {"tokens": ["I", "love", "this", "actor", "!"],  
               "movie_user_17": {"aspects": ["O", "O", "O", "B_A", "O"], "sentiments": ["O", "B_S", "O", "O", "O"]}, 
               "movie_user_23": {"aspects": ["O", "O", "O", "B_A", "O"], "sentiments": ["O", "B_S", "O", "O", "O"]}},

  "review_4": {"tokens": ["Bad", "movie"], 
               "movie_user_1": {"aspects": ["O", "B_A"], "sentiments": ["B_S", "O"]}, 
               "movie_user_6": {"aspects": ["O", "B_A"], "sentiments": ["B_S", "O"]}}

}

Output: contents of test_movies.json

{"review_2": {"tokens": ["Its", "a", "great", "show"], "movie_user_1": {"aspects": ["O", "O", "O", "B_A"], "sentiments": ["O", "O", "B_S", "O"]}, "movie_user_6": {"aspects": ["O", "O", "O", "B_A"], "sentiments": ["O", "O", "B_S", "O"]}}, "review_4": {"tokens": ["Bad", "movie"], "movie_user_1": {"aspects": ["O", "B_A"], "sentiments": ["B_S", "O"]}, "movie_user_6": {"aspects": ["O", "B_A"], "sentiments": ["B_S", "O"]}}}

Output: contents of train_movies.json

{"review_1": {"tokens": ["Best", "show", "ever", "!"], "movie_user_4": {"aspects": ["O", "B_A", "O", "O"], "sentiments": ["B_S", "O", "O", "O"]}, "movie_user_6": {"aspects": ["O", "B_A", "O", "O"], "sentiments": ["B_S", "O", "O", "O"]}}, "review_3": {"tokens": ["I", "love", "this", "actor", "!"], "movie_user_17": {"aspects": ["O", "O", "O", "B_A", "O"], "sentiments": ["O", "B_S", "O", "O", "O"]}, "movie_user_23": {"aspects": ["O", "O", "O", "B_A", "O"], "sentiments": ["O", "B_S", "O", "O", "O"]}}}
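
The traceback in the question points at a second issue: the files were written with encoding="utf-8-sig", which prepends a byte order mark (BOM), and then read without that encoding. The code in this answer avoids it by not passing utf-8-sig when writing. The sketch below reproduces the effect in a temporary directory; the file name and the small `test` dict are placeholders rather than the original data.

```python
import json
import os
import tempfile

# Small placeholder for the test split (illustrative only).
test = {"review_4": {"tokens": ["Bad", "movie"]}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_movies.json")

    # Writing with utf-8-sig puts a BOM at the start of the file.
    with open(path, "w", encoding="utf-8-sig") as f:
        json.dump(test, f)

    # Reading as plain utf-8 leaves the BOM in the text, which json rejects.
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
        bom_error = None
    except json.JSONDecodeError as exc:
        bom_error = str(exc)

    # Reading with utf-8-sig strips the BOM, so the round trip succeeds.
    with open(path, encoding="utf-8-sig") as f:
        restored = json.load(f)

print(bom_error)
print(restored == test)
```

If a file with a BOM already exists, opening it with encoding="utf-8-sig" when reading is one way to load it; otherwise writing with plain UTF-8 keeps the BOM out entirely.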
