Load JSON from multiple files in a folder and put them into one dictionary (or list)

Question

I have multiple text files that contain this (with different values):

{"abandon": {"R1F2V0CQAYJOZZ": 2, "R3NUFWQ9SPGVJO": 1}, "abduct": {"R1F2V0CQAYJOZZ": 1, "R3376OQSHCTV1A": 1, "R14BW4EQZNVKKG": 1, "R233CMES8RCOCU": 1},

If I format it with an online formatter, it looks like this:

   "abandon":{
      "R1F2V0CQAYJOZZ":2,
      "R3NUFWQ9SPGVJO":1
   },
   "abduct":{
      "R1F2V0CQAYJOZZ":1,
      "R3376OQSHCTV1A":1,
      "R14BW4EQZNVKKG":1,
      "R233CMES8RCOCU":1
   },

The meaning of this JSON is:

"word":{
   "Document name":"value"
},

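However, the same words are repeated across different files. What I want to do is read all the files and store everything in a single dictionary, but:

If the "word" already exists in the dictionary, check whether the "document" exists;
if the "document" exists, increase its "value"; otherwise add the document with "value = 1".
If the "word" does not exist, store the "word", the "document", and "value = 1".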
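
The rule above can be sketched as a small helper. This is only an illustration: the name add_count is hypothetical, and it assumes the amount to add is the count read from the file (defaulting to 1), which is what the expected output further down implies.

```python
def add_count(merged, word, doc, value=1):
    """Add `value` to merged[word][doc], creating missing levels on the way."""
    docs = merged.setdefault(word, {})    # new word: start with an empty dict
    docs[doc] = docs.get(doc, 0) + value  # new document: start counting from 0
    return merged


merged = {}
add_count(merged, "abandon", "doc1", 2)  # word and document are both new
add_count(merged, "abandon", "doc1", 1)  # both exist: the value is increased
add_count(merged, "abandon", "doc3")     # word exists, document is new
print(merged)  # {'abandon': {'doc1': 3, 'doc3': 1}}
```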

Edit:

So imagine these two files:

File1.txt = {"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1},

File2.txt = {"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1},

I want my dictionary to end up like this:

{"abandon": {"doc1": 3, "doc2": 1, "doc3": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc5": 1, "doc8": 2},

EDIT2: It could also be a nested list
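
The question does not say what shape the nested list should have. As one possible reading, a merged dictionary could be converted into [word, [[document, value], ...]] entries like this (the variable names are illustrative):

```python
# A merged dictionary of the form {word: {document: value}}.
merged = {"abandon": {"doc1": 3, "doc2": 1}, "abduct": {"doc8": 2}}

# One entry per word: [word, [[document, value], ...]].
nested = [
    [word, [[doc, value] for doc, value in docs.items()]]
    for word, docs in merged.items()
]
print(nested)
```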

Answer 1

IIUC, try:

import os
import json

# collect every .txt file in the current working directory
files = [f for f in os.listdir() if f.endswith(".txt")]
result = dict()

for file in files:
    # use a context manager so each file is closed after it is parsed
    with open(file) as fh:
        d = json.load(fh)
    for word in d:
        if word not in result:
            result[word] = dict()
        for doc in d[word]:
            if doc not in result[word]:
                result[word][doc] = d[word][doc]
            else:
                result[word][doc] += d[word][doc]

>>> result
{'abandon': {'doc1': 3, 'doc2': 1, 'doc3': 1},
 'abduct': {'doc1': 1, 'doc2': 1, 'doc8': 2, 'doc5': 1}}

Input files:

file1.txt:

{"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1}}

file2.txt:

{"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1}}
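
To try this without touching a real folder, the two input files can be written to a temporary directory first. The sketch below is an assumption-laden rewrite of the same merge, not the answer's exact code: it uses pathlib and tempfile, and the names samples and target are made up for the example.

```python
import json
import tempfile
from pathlib import Path

# The two sample inputs, keyed by file name.
samples = {
    "file1.txt": {"abandon": {"doc1": 2, "doc2": 1},
                  "abduct": {"doc1": 1, "doc2": 1, "doc8": 1}},
    "file2.txt": {"abandon": {"doc1": 1, "doc3": 1},
                  "abduct": {"doc5": 1, "doc8": 1}},
}

result = {}
with tempfile.TemporaryDirectory() as folder:
    # Write the samples to disk so the merge reads real files.
    for name, content in samples.items():
        Path(folder, name).write_text(json.dumps(content), encoding="utf-8")

    # Merge every .txt file found in the folder.
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(encoding="utf-8") as fh:
            d = json.load(fh)
        for word, docs in d.items():
            target = result.setdefault(word, {})
            for doc, value in docs.items():
                target[doc] = target.get(doc, 0) + value

print(result)
```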

Answer 2

import json

files = ["input", "list", "of", "files"]  # placeholder: replace with the real file paths
outDict = {}
for file in files:  # iterate over the files
    with open(file) as fn:
        newDict = json.load(fn)
    for word in newDict:  # iterate over each word from the file
        inWord = newDict[word]
        outWord = outDict.setdefault(word, {})  # stores and returns an empty dict if word isn't already in the output dictionary
        for docName in inWord:  # iterate over each document name from the file
            value = outWord.get(docName, 0)  # returns 0 if the document name isn't already in the output dictionary
            value += inWord[docName]  # add this file's count for the document
            outWord[docName] = value  # update the output dictionary
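
The files list above is a placeholder. One way to build it from a folder is sketched below; the helper name list_input_files and the "*.txt" pattern are assumptions for illustration, and the temporary directory only exists to make the example self-contained.

```python
import tempfile
from pathlib import Path


def list_input_files(folder, pattern="*.txt"):
    """Return the files in `folder` that match `pattern`, sorted by name."""
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as folder:
    # Create two matching files and one that should be ignored.
    for name in ("b.txt", "a.txt", "notes.md"):
        Path(folder, name).write_text("{}", encoding="utf-8")
    names = [p.name for p in list_input_files(folder)]

print(names)  # ['a.txt', 'b.txt']
```

Sorting keeps the processing order stable between runs, although the merged counts do not depend on it.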

Answer 3

Merging is straightforward with .setdefault:

import json
import glob

merged = {}

for file in glob.glob('*.txt'):  # match *.txt files in current directory

    with open(file) as f:
        in_dict = json.load(f)

        for word, docs in in_dict.items():
            for doc, value in docs.items():
                merged.setdefault(word,{})       # create word with empty dict value if it doesn't exist
                merged[word].setdefault(doc, 0)  # create value of 0 for document if it doesn't exist
                merged[word][doc] += value       # add the doc's value.

print(json.dumps(merged,indent=2))

Or use defaultdict. The argument to defaultdict must be a function that returns the default value, so the lambda returns a defaultdict of integers:

import json
import glob
from collections import defaultdict

merged = defaultdict(lambda: defaultdict(int))

for file in glob.glob('*.txt'):  # match *.txt files in current directory

    with open(file) as f:
        in_dict = json.load(f)

        for word, docs in in_dict.items():
            for doc,value in docs.items():
                merged[word][doc] += value       # add the doc's value.

print(json.dumps(merged,indent=2))
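
A related variant that is not part of the original answer: collections.Counter already adds counts when updated from a mapping, so the inner loop can be dropped. The sketch below works on already-parsed dictionaries (the list inputs is made up for the example) and converts the result back to plain dicts at the end.

```python
from collections import Counter, defaultdict

# Already-parsed contents of two input files.
inputs = [
    {"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1}},
    {"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1}},
]

merged = defaultdict(Counter)  # a missing word starts as an empty Counter
for in_dict in inputs:
    for word, docs in in_dict.items():
        merged[word].update(docs)  # Counter.update adds to existing counts

# Convert to plain dicts, e.g. before comparing or serializing.
plain = {word: dict(counts) for word, counts in merged.items()}
print(plain)
```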

Solution 1
2 Accepted 2021-11-03 20:21:41

Solution 2
0 2021-11-03 20:22:25

Solution 3
0 2021-11-03 20:36:16
