How to make a dictionary from a text file with Python

My file looks like this:

aaien 12 13 39
aan 10
aanbad 12 13 14 57 58 38
aanbaden 12 13 14 57 58 38
aanbeden 12 13 14 57 58 38
aanbid  12 13 14 57 58 39
aanbidden 12 13 14 57 58 39
aanbidt 12 13 14 57 58 39
aanblik 27 28
aanbreken 39
...

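I want to make a dictionary where the key is the word (for example 'aaien') and the value is the list of numbers next to it. So it should look like this: {'aaien': ['12', '13', '39'], 'aan': ['10']}

This code doesn't seem to work.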

document = open('LIWC_words.txt', 'r')
liwcwords = document.read()
dictliwc = {}
for line in liwcwords:
    k, v = line.strip().split(' ')
    answer[k.strip()] = v.strip()

liwcwords.close()

Python gives this error:

ValueError: need more than 1 value to unpack

You are splitting each line into a list of words, but then unpacking that list into just a key and a value.

This will work:

with open('LIWC_words.txt', 'r') as document:
    answer = {}
    for line in document:
        line = line.split()
        if not line:  # empty line?
            continue
        answer[line[0]] = line[1:]

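Note that you don't need to give .split() an argument; without arguments, it splits on runs of whitespace and strips the results for you. That saves you from having to call .strip() explicitly.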
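As a small illustration of that difference, the sketch below compares the two calls on one line from the sample file (the line with a double space). The variable names are made up for this example.

```python
# One line from the sample data: note the double space and the trailing newline.
line = "aanbid  12 13 14 57 58 39\n"

# Splitting on a single space keeps an empty string and the newline.
by_space = line.split(' ')

# Splitting with no argument collapses whitespace runs and drops the newline.
by_whitespace = line.split()

print(by_space)       # ['aanbid', '', '12', '13', '14', '57', '58', '39\n']
print(by_whitespace)  # ['aanbid', '12', '13', '14', '57', '58', '39']

# A blank line yields an empty list, which is why `if not line` detects it.
blank = "\n".split()
print(blank)          # []
```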

The alternative is to split only on the first run of whitespace:

with open('LIWC_words.txt', 'r') as document:
    answer = {}
    for line in document:
        if line.strip():  # non-empty line?
            key, value = line.split(None, 1)  # None means 'all whitespace', the default
            answer[key] = value.split()

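The second argument to .split() limits the number of splits made, guaranteeing that at most 2 elements are returned, which makes it possible to unpack the result in the assignment to key and value.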
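A minimal sketch of what the limited split returns, using a line from the sample data. The last case is an assumption about input the question does not show: a word with no numbers after it would produce only one element, so the two-name unpacking would fail for such a line.

```python
line = "aanbad 12 13 14 57 58 38\n"

# At most one split: the word, then everything after the first whitespace run.
key, value = line.split(None, 1)
numbers = value.split()

print(repr(key))    # 'aanbad'
print(repr(value))  # '12 13 14 57 58 38\n'
print(numbers)      # ['12', '13', '14', '57', '58', '38']

# Hypothetical line with a word but no numbers: only one element comes back,
# so `key, value = ...` would raise ValueError here.
lonely = "aan\n".split(None, 1)
print(lonely)       # ['aan']
```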

Either method results in:

{'aaien': ['12', '13', '39'],
 'aan': ['10'],
 'aanbad': ['12', '13', '14', '57', '58', '38'],
 'aanbaden': ['12', '13', '14', '57', '58', '38'],
 'aanbeden': ['12', '13', '14', '57', '58', '38'],
 'aanbid': ['12', '13', '14', '57', '58', '39'],
 'aanbidden': ['12', '13', '14', '57', '58', '39'],
 'aanbidt': ['12', '13', '14', '57', '58', '39'],
 'aanblik': ['27', '28'],
 'aanbreken': ['39']}

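If you still see only one key with the rest of the file as the (split) value, your input file may be using a non-standard line separator. On Python 2, open the file with universal newline support by adding the U character to the mode: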

with open('LIWC_words.txt', 'rU') as document:
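
On Python 3 the U flag is not needed: text mode already translates '\r', '\r\n' and '\n' line endings by default, and the flag itself was deprecated and then removed in Python 3.11. The sketch below checks this with a temporary file holding made-up sample data with '\r' line endings; the file contents and variable names are assumptions for the demonstration.

```python
import os
import tempfile

# Hypothetical sample data using old-style '\r' line endings.
raw = b"aaien 12 13 39\raan 10\r"

# Write the bytes untouched so the '\r' separators are preserved on disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

answer = {}
try:
    # Plain text mode: universal newlines are the default on Python 3.
    with open(path, 'r') as document:
        for line in document:
            parts = line.split()
            if parts:  # skip empty lines
                answer[parts[0]] = parts[1:]
finally:
    os.remove(path)

print(answer)  # {'aaien': ['12', '13', '39'], 'aan': ['10']}
```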

> liwcwords = document.read()
> dictliwc = {}
> for line in liwcwords:

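Here you are iterating over a string, which is not what you want: the loop yields one character at a time, so split() returns a single element and the unpacking fails. Try document.readlines() instead, or iterate over the file object directly. Here is another solution: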

from pprint import pprint

d = {}
with open('LIWC_words.txt') as fd:
    for line in fd:
        entry = line.split()
        if entry:  # skip blank lines
            d[entry[0]] = entry[1:]  # first word is the key, the rest are the values

pprint(d)

This is what the output looks like:

{'aaien': ['12', '13', '39'],
 'aan': ['10'],
 'aanbad': ['12', '13', '14', '57', '58', '38'],
 'aanbaden': ['12', '13', '14', '57', '58', '38'],
 'aanbeden': ['12', '13', '14', '57', '58', '38'],
 'aanbid': ['12', '13', '14', '57', '58', '39'],
 'aanbidden': ['12', '13', '14', '57', '58', '39'],
 'aanbidt': ['12', '13', '14', '57', '58', '39'],
 'aanblik': ['27', '28'],
 'aanbreken': ['39']}
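
To illustrate the point about iterating over a string, here is a small sketch of what the loop in the question actually sees. The sample text stands in for the result of document.read() and is an assumption for the demonstration.

```python
# document.read() returns the whole file as ONE string, like this stand-in.
liwcwords = "aan 10\naanblik 27 28\n"

# Iterating over a string yields single characters, not lines.
first_items = list(liwcwords)[:3]
print(first_items)  # ['a', 'a', 'n']

# Splitting a single character gives a one-element list, so unpacking
# into two names raises ValueError, as reported in the question.
try:
    k, v = first_items[0].strip().split(' ')
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError

# Splitting the string into lines first gives the intended units.
lines = liwcwords.splitlines()
print(lines)  # ['aan 10', 'aanblik 27 28']
```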
