Loading JSON from multiple files in one folder and putting them into one dictionary (or list)
I have several text files that contain this (with different values):
{"abandon": {"R1F2V0CQAYJOZZ": 2, "R3NUFWQ9SPGVJO": 1}, "abduct": {"R1F2V0CQAYJOZZ": 1, "R3376OQSHCTV1A": 1, "R14BW4EQZNVKKG": 1, "R233CMES8RCOCU": 1},
If I format it with an online formatter, it looks like this:
"abandon":{
    "R1F2V0CQAYJOZZ":2,
    "R3NUFWQ9SPGVJO":1
},
"abduct":{
    "R1F2V0CQAYJOZZ":1,
    "R3376OQSHCTV1A":1,
    "R14BW4EQZNVKKG":1,
    "R233CMES8RCOCU":1
},
This JSON means:
"word":{
    "Document name":"value"
},
But the same words appear in different files. What I want to do is read all the files and store everything in one dictionary, but:
EDIT:
So imagine these two files:
File1.txt = {"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1},
File2.txt = {"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1},
I want my dictionary to end up like this:
{"abandon": {"doc1": 3, "doc2": 1, "doc3": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc5": 1, "doc8": 2},
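The merge rule described above (add the counts together when the same word and document appear in more than one file) can be sketched in memory with collections.Counter, whose update() method adds counts instead of replacing them. This is a swapped-in technique rather than one of the answers below; the two dicts are the File1/File2 examples from the question, already parsed.

```python
from collections import Counter

# The two example inputs from the question, already parsed into dicts
file1 = {"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1}}
file2 = {"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1}}

merged = {}
for data in (file1, file2):
    for word, docs in data.items():
        # Counter.update() adds to existing counts rather than overwriting them
        merged.setdefault(word, Counter()).update(docs)

# Convert the Counters back to plain dicts for printing or JSON output
result = {word: dict(counts) for word, counts in merged.items()}
print(result)
```

Counter is a dict subclass, so json.dumps(merged) would also work without the final conversion.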
EDIT2: It could also be a nested list.
IIUC (if I understand correctly), try:
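EDIT2 does not say what shape the nested list should take. Assuming a layout like [[word, [[doc, count], ...]], ...] (a guess, not the asker's specification), a merged dictionary can simply be flattened afterwards:

```python
# A merged result in the dictionary shape the question asks for
merged = {"abandon": {"doc1": 3, "doc2": 1, "doc3": 1},
          "abduct": {"doc1": 1, "doc2": 1, "doc5": 1, "doc8": 2}}

# Hypothetical nested-list layout: [[word, [[doc, count], ...]], ...]
nested = [[word, [[doc, count] for doc, count in docs.items()]]
          for word, docs in merged.items()]
print(nested)
```

Merging into a dictionary first and converting at the end keeps lookups by word and document cheap while the files are being read.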
import os
import json

# All .txt files in the current working directory
files = [f for f in os.listdir() if f.endswith(".txt")]
result = dict()
for file in files:
    with open(file) as f:  # close each file after reading it
        d = json.load(f)
    for word in d:
        if word not in result:
            result[word] = dict()
        for doc in d[word]:
            if doc not in result[word]:
                result[word][doc] = d[word][doc]
            else:
                result[word][doc] += d[word][doc]
>>> result
{'abandon': {'doc1': 3, 'doc2': 1, 'doc3': 1},
 'abduct': {'doc1': 1, 'doc2': 1, 'doc8': 2, 'doc5': 1}}
Input files:
file1.txt:
{"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1}}
file2.txt:
{"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1}}
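To try a merge against these two inputs without touching real data, the files can be written into a temporary folder first. This is a test-harness sketch: merge_folder is a hypothetical helper (not from any answer) that takes a folder path, as the question title asks, instead of reading the current directory.

```python
import json
import tempfile
from pathlib import Path

def merge_folder(folder):
    """Hypothetical helper: sum per-document counts across every *.txt JSON file in folder."""
    merged = {}
    for path in sorted(Path(folder).glob("*.txt")):  # sorted for a deterministic order
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        for word, docs in data.items():
            target = merged.setdefault(word, {})
            for doc, value in docs.items():
                target[doc] = target.get(doc, 0) + value
    return merged

with tempfile.TemporaryDirectory() as tmp:
    # Write the two example input files shown above
    Path(tmp, "file1.txt").write_text(
        '{"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1}}',
        encoding="utf-8")
    Path(tmp, "file2.txt").write_text(
        '{"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1}}',
        encoding="utf-8")
    result = merge_folder(tmp)

print(result)
```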
import json

files = ["input", "list", "of", "files"]  # placeholder: replace with your file paths
outDict = {}
for file in files:  # iterate over the files
    with open(file) as fn:
        newDict = json.load(fn)
    for word in newDict:  # iterate over each word from the file
        inWord = newDict[word]
        # setdefault stores (and returns) an empty dict if word isn't already in the output dictionary
        outWord = outDict.setdefault(word, {})
        for docName in inWord:  # iterate over each document name from the file
            value = outWord.get(docName, 0)  # returns 0 if the document name isn't already in the output dictionary
            value += inWord[docName]  # add this file's count for the document
            outWord[docName] = value  # update the output dictionary
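A common pitfall when building nested dictionaries: dict.get(key, default) returns the default without inserting it, so changes made to that returned dict never reach the outer dictionary. dict.setdefault(key, default) inserts the default when the key is missing and returns the stored object. A minimal illustration:

```python
out = {}

# dict.get returns the default but does NOT insert it into out
inner = out.get("abandon", {})
inner["doc1"] = 1
print(out)  # {}

# dict.setdefault inserts the default when the key is missing, then returns it
inner = out.setdefault("abandon", {})
inner["doc1"] = 1
print(out)  # {'abandon': {'doc1': 1}}
```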
Merging is easy with .setdefault:
import json
import glob

merged = {}
for file in glob.glob('*.txt'):  # match *.txt files in current directory
    with open(file) as f:
        in_dict = json.load(f)
        for word, docs in in_dict.items():
            for doc, value in docs.items():
                merged.setdefault(word, {})  # create word with empty dict value if it doesn't exist
                merged[word].setdefault(doc, 0)  # create value of 0 for document if it doesn't exist
                merged[word][doc] += value  # add the doc's value.
print(json.dumps(merged, indent=2))
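Or use defaultdict. The argument to defaultdict must be a function that returns the default value, so the lambda returns a defaultdict(int) as the default value for each new word: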
import json
import glob
from collections import defaultdict

merged = defaultdict(lambda: defaultdict(int))
for file in glob.glob('*.txt'):  # match *.txt files in current directory
    with open(file) as f:
        in_dict = json.load(f)
        for word, docs in in_dict.items():
            for doc, value in docs.items():
                merged[word][doc] += value  # add the doc's value.
print(json.dumps(merged, indent=2))
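json.dumps treats a defaultdict like an ordinary dict, but the object itself still reprs as defaultdict(...) and, because its default factory is a lambda, it cannot be pickled. If plain dictionaries are wanted at the end, both levels can be converted; a small sketch with made-up counts:

```python
import json
from collections import defaultdict

merged = defaultdict(lambda: defaultdict(int))
merged["abandon"]["doc1"] += 2  # missing keys are created on first access
merged["abandon"]["doc1"] += 1
merged["abduct"]["doc8"] += 1

# Convert both levels back to ordinary dicts
plain = {word: dict(docs) for word, docs in merged.items()}
print(plain)  # {'abandon': {'doc1': 3}, 'abduct': {'doc8': 1}}
print(json.dumps(plain) == json.dumps(merged))  # True: the JSON output is the same
```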