How to read from a txt file into a dictionary where the value is a nested dictionary and the key of the nested dictionary is given by me?

I am trying to load the following txt file into a dictionary:
['A'] (4)
['B'] (4)
['E'] (4)
['C'] (4)
['A', 'B'] (3)
['A', 'E'] (3)
['A', 'C'] (3)
['B', 'E'] (4)
['B', 'C'] (3)
['C', 'E'] (3)
['A','B','E'] (3)
['B', 'C', 'E'] (3)

I want the dictionary to look like this:

    itemsets = {"A": {"support_count": 4}, "B": {"support_count": 4},
                "E": {"support_count": 4}, "C": {"support_count": 4},
                "AB": {"support_count": 3}, "AE": {"support_count": 3},
                "AC": {"support_count": 3}, "BE": {"support_count": 4},
                "BC": {"support_count": 3}, "CE": {"support_count": 3},
                "ABE": {"support_count": 3}, "BCE": {"support_count": 3}}

This is what I have so far:

    keys=[]
    values=[]

    with open(filename, 'r') as f:
        lines = f.readlines()
    keys = [line[:line.find(']')] for line in lines]
    keys = [k.replace('[', '').replace(']', '').replace(',','').replace("'",'').replace(' ','') for k in keys]
    
    values= [line[line.find('('):] for line in lines]
    values = [v.replace('(', '').replace(')', '').replace("'",'').replace("\n",'') for v in values]
    itemsets = dict.fromkeys(keys)
    for v in values:
        for item in itemsets.keys():
            itemsets[item]={"support_count": v}
    return itemsets

This is what I get when I run it:

{'A': {'support_count': '3'}, 'B': {'support_count': '3'}, 'E': {'support_count': '3'}, 'C': {'support_count': '3'}, 'AB': {'support_count': '3'}, 'AE': {'support_count': '3'}, 'AC': {'support_count': '3'}, 'BE': {'support_count': '3'}, 'BC': {'support_count': '3'}, 'CE': {'support_count': '3'}, 'ABE': {'support_count': '3'}, 'BCE': {'support_count': '3'}}

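When you iterate over values, the inner loop overwrites every value in the dict with the current one, so after the last pass they all hold the final value, 3. You need to iterate over keys and values together, with zip: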

    for k, v in zip(keys, values):
        d[k] = {"support_count": int(v)}
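
To make the difference concrete, here is a minimal, self-contained sketch. The shortened keys and values lists are hypothetical stand-ins for what the parsing code produces, and the names overwritten and paired are only for illustration.

```python
# Hypothetical, shortened stand-ins for the parsed keys and values.
keys = ["A", "B", "AB"]
values = ["4", "4", "3"]

# Nested loops: each pass over values rewrites every key,
# so only the last value ("3") survives.
overwritten = dict.fromkeys(keys)
for v in values:
    for k in overwritten:
        overwritten[k] = {"support_count": v}

# zip pairs the i-th key with the i-th value, one assignment per key.
paired = {}
for k, v in zip(keys, values):
    paired[k] = {"support_count": int(v)}

print(overwritten)
print(paired)
```

Note that zip stops at the shorter input, so this relies on keys and values having the same length, which holds when both are built from the same list of lines.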

To parse the data, I suggest a regular-expression approach:

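  • \[(.*)] \((\d+) parses each line into its key part and its value
  • [^A-Z] removes everything that is not an uppercase letter from the key

    import re

    lines = ["['A'] (4)", "['B'] (4)", "['E'] (4)", "['C'] (4)", "['A', 'B'] (3)",
             "['A', 'E'] (3)", "['A', 'C'] (3)", "['B','E'] (4)", "['B', 'C'] (3)",
             "['C', 'E'] (3)", "['A','B', 'E'] (3)", "['B', 'C', 'E'] (3)"]

    d = {}
    # Group 1: text between the brackets; group 2: the count in parentheses.
    ptn_all = re.compile(r"\[(.*)] \((\d+)")
    # Matches quotes, commas and spaces, i.e. anything that is not A-Z.
    ptn_key = re.compile(r"[^A-Z]")
    for line in lines:
        keys, value = ptn_all.search(line).groups()
        d[ptn_key.sub("", keys)] = {"support_count": int(value)}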
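
The snippet above works on a hard-coded list. As a rough sketch of how the same two patterns could be applied to the file from the question, the function below reads it line by line. The names load_itemsets, LINE_PATTERN and NON_LETTER are hypothetical, and it assumes every item is a single uppercase letter, as in the sample data.

```python
import re

# Same patterns as in the answer; the constant names are illustrative.
LINE_PATTERN = re.compile(r"\[(.*)] \((\d+)")
NON_LETTER = re.compile(r"[^A-Z]")


def load_itemsets(filename):
    """Build {itemset: {"support_count": n}} from lines like "['A', 'B'] (3)"."""
    itemsets = {}
    with open(filename, "r") as f:
        for line in f:
            match = LINE_PATTERN.search(line)
            if match is None:
                continue  # skip blank or malformed lines
            raw_key, count = match.groups()
            # "'A', 'B'" -> "AB"
            key = NON_LETTER.sub("", raw_key)
            itemsets[key] = {"support_count": int(count)}
    return itemsets
```

Because each line yields its key and its count from the same match, there is no separate values list that could fall out of step with the keys.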

