
Multi-level dictionaries in Python

I want to make a multi-level dictionary that stores how many times each token has appeared with each POS tag.

Example:

cat/nt
cat = token, nt = POS tag

I have gotten this far, but I'm stuck:

import re

dicts = {}

wds = re.compile(r'(\w*|\w+\.\w*)([/])(\w+)')

with open('train.txt', 'r') as td:
    for lines in td:
        m = wds.finditer(lines)
        for mms in m:
            # token -> tag; a later tag for the same token overwrites this one
            dicts[mms.group(1)] = mms.group(3)

Contents of train.txt:

Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB
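As a quick check of what the question's regex actually extracts, the sketch below runs it over the sample line from train.txt. The `,/,` pairs are not matched, because the tag `,` is not a word character and so cannot satisfy `(\w+)`. The helper names `regex_pairs` and `split_pairs` are hypothetical; `split_pairs` is a swapped-in alternative (whitespace split plus `str.rpartition`), not the question's method, and it assumes pairs are separated by whitespace and the tag follows the last `/`.

```python
import re

# The question's pattern, written as a raw string
WDS = re.compile(r'(\w*|\w+\.\w*)([/])(\w+)')

SAMPLE = "Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB"


def regex_pairs(line):
    """Return (token, tag) pairs found by the question's regex."""
    return [(m.group(1), m.group(3)) for m in WDS.finditer(line)]


def split_pairs(line):
    """Alternative: split on whitespace, then split each item at its last '/'."""
    pairs = []
    for item in line.split():
        token, sep, tag = item.rpartition('/')
        if sep:  # skip items that contain no '/'
            pairs.append((token, tag))
    return pairs


print(regex_pairs(SAMPLE))  # the two ',/,' pairs are missing
print(split_pairs(SAMPLE))  # includes (',', ',') twice
```

Whether punctuation tokens should be counted at all depends on the task; the regex version silently drops them, which is easy to miss.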

Try something like this in your inner for loop:

for mms in m:
    token = mms.group(1)
    pos = mms.group(3)
    if token in dicts:
        if pos in dicts[token]:
            dicts[token][pos] += 1
        else:
            dicts[token][pos] = 1
    else:
        dicts[token] = {pos: 1}

This checks whether we've seen the token at all before and, if we have, whether we've seen it with this part of speech before. If we've seen this combination before, increment its count. If we've seen the token but not this POS, start its count at 1. If we've never seen the token before, add an entry for it whose sub-dict contains this POS with a count of 1.
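Putting the question's outer loops and this inner loop together gives the sketch below. It assumes the question's regex and takes any iterable of lines (an open file such as train.txt works, as does a plain list), so it can be tried without the file. The function name `count_pos_tags` and the `dog/nn` input are hypothetical.

```python
import re

WDS = re.compile(r'(\w*|\w+\.\w*)([/])(\w+)')


def count_pos_tags(lines):
    """Build {token: {pos: count}} using explicit membership checks."""
    dicts = {}
    for line in lines:
        for mms in WDS.finditer(line):
            token = mms.group(1)
            pos = mms.group(3)
            if token in dicts:
                if pos in dicts[token]:
                    dicts[token][pos] += 1   # seen this token/POS pair before
                else:
                    dicts[token][pos] = 1    # known token, new POS
            else:
                dicts[token] = {pos: 1}      # first time we see this token
    return dicts


counts = count_pos_tags(["cat/nt", "cat/nt dog/nn"])
print(counts)  # {'cat': {'nt': 2}, 'dog': {'nn': 1}}

# With the real file:
# with open('train.txt') as td:
#     dicts = count_pos_tags(td)
```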

You could get the same effect with a defaultdict (collections.defaultdict), but I thought seeing how it works behind the scenes would be clearer.
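For comparison, here is a minimal sketch of the defaultdict version, using `collections.defaultdict` from the standard library. The outer dict creates an inner `defaultdict(int)` on first access, and the inner one starts every missing count at 0, so `+= 1` works without any membership checks. The function name `count_pos_tags_dd` and the `cat/vb` input are hypothetical, and the regex is assumed to be the question's.

```python
import re
from collections import defaultdict

WDS = re.compile(r'(\w*|\w+\.\w*)([/])(\w+)')


def count_pos_tags_dd(lines):
    """Same counts as the explicit version, via nested defaultdicts."""
    dicts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        for mms in WDS.finditer(line):
            dicts[mms.group(1)][mms.group(3)] += 1
    # Convert to plain dicts so printing and comparisons look ordinary
    return {token: dict(tags) for token, tags in dicts.items()}


print(count_pos_tags_dd(["cat/nt cat/nt cat/vb"]))
# {'cat': {'nt': 2, 'vb': 1}}
```

Converting back to plain dicts also means that a later lookup of a missing token raises `KeyError` instead of silently inserting an empty entry, which is what a `defaultdict` would do.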

EDIT: To print the resulting dict, try:

for token, tags in dicts.items():
    for pos, count in tags.items():
        print("%s %s: %s" % (token, pos, count))
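If the output order matters (for example, to compare runs), a variant of the print loop can sort tokens and tags first. This is a sketch with a hypothetical helper `format_counts` that returns the lines so they can be checked before printing; the input dict is a made-up example.

```python
def format_counts(dicts):
    """Return 'token pos: count' lines, sorted by token, then by POS tag."""
    lines = []
    for token in sorted(dicts):
        for pos, count in sorted(dicts[token].items()):
            lines.append("%s %s: %s" % (token, pos, count))
    return lines


for line in format_counts({"cat": {"vb": 1, "nt": 2}, "61": {"CD": 1}}):
    print(line)
# 61 CD: 1
# cat nt: 2
# cat vb: 1
```

Note that `sorted` compares strings by code point, so digits sort before letters and uppercase before lowercase.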
