
Loading JSON from multiple files in one folder and putting them into one dictionary (or list)

I have multiple text files that contain this (with different values):

{"abandon": {"R1F2V0CQAYJOZZ": 2, "R3NUFWQ9SPGVJO": 1}, "abduct": {"R1F2V0CQAYJOZZ": 1, "R3376OQSHCTV1A": 1, "R14BW4EQZNVKKG": 1, "R233CMES8RCOCU": 1},

If I format it online, it becomes this:

   "abandon":{
      "R1F2V0CQAYJOZZ":2,
      "R3NUFWQ9SPGVJO":1
   },
   "abduct":{
      "R1F2V0CQAYJOZZ":1,
      "R3376OQSHCTV1A":1,
      "R14BW4EQZNVKKG":1,
      "R233CMES8RCOCU":1
   },

This JSON means:

"word":{
   "Document name":"value"
},

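But the same words are repeated across different files. What I want to do is read all the files and store everything in one dictionary, but:

  1. If the "word" exists in the dictionary, check whether the "document" exists;
  2. If the "document" exists, increase the "value"; otherwise add the document with "value = 1";
  3. If the "word" does not exist, store the "word", the "document" and "value = 1".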
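The three rules above can be sketched as a small helper. This is only an illustration: `merge_counts` is a hypothetical name, and it assumes that "increase the value" means adding the count found in the incoming file, which is what the expected result further down suggests.

```python
def merge_counts(merged, incoming):
    """Merge one {word: {doc: count}} mapping into `merged` in place."""
    for word, docs in incoming.items():
        # Rule 3: an unseen word starts with an empty document mapping.
        word_counts = merged.setdefault(word, {})
        for doc, count in docs.items():
            # Rules 1 and 2: add to the existing count, or start from 0.
            word_counts[doc] = word_counts.get(doc, 0) + count
    return merged
```

Calling it once per parsed file accumulates everything in a single dictionary.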

Edit:

So imagine these two files:

File1.txt = {"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1},

File2.txt = {"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1},

I want my dictionary to end up like this:

{"abandon": {"doc1": 3, "doc2": 1, "doc3": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc5": 1, "doc8": 2},

EDIT2: It could also be a nested list.
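If a nested list is preferred, one option is to build the merged dictionary first and convert it afterwards. The sketch below is an assumption about the layout, since the question does not say what the nested list should look like: it produces `[word, [[doc, count], ...]]` entries, and `to_nested_list` is a hypothetical name.

```python
def to_nested_list(merged):
    """Convert {word: {doc: count}} into [[word, [[doc, count], ...]], ...]."""
    return [
        [word, [[doc, count] for doc, count in docs.items()]]
        for word, docs in merged.items()
    ]
```

Lookups by word are slower in this form than in a dictionary, so it is probably only worth doing as a final output step.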

IIUC, try:

import os
import json

files = [f for f in os.listdir() if f.endswith(".txt")]
result = dict()

for file in files:
    with open(file) as f:  # close the file as soon as it has been parsed
        d = json.load(f)
    for word in d:
        if word not in result:
            result[word] = dict()
        for doc in d[word]:
            if doc not in result[word]:
                result[word][doc] = d[word][doc]
            else:
                result[word][doc] += d[word][doc]

>>> result
{'abandon': {'doc1': 3, 'doc2': 1, 'doc3': 1},
 'abduct': {'doc1': 1, 'doc2': 1, 'doc8': 2, 'doc5': 1}}

Input files:

File1.txt:

{"abandon": {"doc1": 2, "doc2": 1}, "abduct": {"doc1": 1, "doc2": 1, "doc8": 1}}

File2.txt:

{"abandon": {"doc1": 1, "doc3": 1}, "abduct": {"doc5": 1, "doc8": 1}}

import json

files = ["input", "list", "of", "files"]  # placeholder: replace with the real file paths
outDict = {}
for file in files:  # iterate over the files
    with open(file) as fn:
        newDict = json.load(fn)
    for word in newDict:  # iterate over each word from the file
        inWord = newDict[word]
        outWord = outDict.setdefault(word, {})  # stores and returns an empty dict if word isn't already in the output dictionary
        for docName in inWord:  # iterate over each document name from the file
            value = outWord.get(docName, 0)  # returns 0 if the document name isn't already in the output dictionary
            value += inWord[docName]  # add this file's count for the document
            outWord[docName] = value  # update the output dictionary

Merging is straightforward with .setdefault:

import json
import glob

merged = {}

for file in glob.glob('*.txt'):  # match *.txt files in current directory

    with open(file) as f:
        in_dict = json.load(f)

        for word, docs in in_dict.items():
            for doc, value in docs.items():
                merged.setdefault(word,{})       # create word with empty dict value if it doesn't exist
                merged[word].setdefault(doc, 0)  # create value of 0 for document if it doesn't exist
                merged[word][doc] += value       # add the doc's value.

print(json.dumps(merged,indent=2))

Or use defaultdict. The argument to defaultdict must be a function that returns the default value, so the lambda returns a defaultdict of ints:

import json
import glob
from collections import defaultdict

merged = defaultdict(lambda: defaultdict(int))

for file in glob.glob('*.txt'):  # match *.txt files in current directory

    with open(file) as f:
        in_dict = json.load(f)

        for word, docs in in_dict.items():
            for doc,value in docs.items():
                merged[word][doc] += value       # add the doc's value.

print(json.dumps(merged,indent=2))
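A variation on the same idea, not taken from the answers above, is to let `collections.Counter` do the addition: `Counter.update` adds counts from a mapping instead of overwriting them. The sketch below is a minimal version under that assumption; `merge_dicts` and `load_json_files` are hypothetical helper names.

```python
import glob
import json
from collections import Counter, defaultdict


def merge_dicts(dicts):
    """Sum per-document counts across {word: {doc: count}} mappings."""
    merged = defaultdict(Counter)
    for d in dicts:
        for word, docs in d.items():
            # Counter.update adds the incoming counts to the existing ones.
            merged[word].update(docs)
    return merged


def load_json_files(pattern="*.txt"):
    """Yield the parsed content of each file matching `pattern`."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            yield json.load(f)
```

Usage would look like `merged = merge_dicts(load_json_files())` followed by `print(json.dumps(merged, indent=2))`; both `defaultdict` and `Counter` are `dict` subclasses, so `json.dumps` accepts them.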
