Create a dictionary of the words in a text

Question

I want to create a dictionary of all the unique words in a text. The key is the word and the value is how often the word occurs.

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
word_listT = str(' '.join(dtt)).split()
wordsT = {v:k for (k, v) in enumerate(word_listT)}
print(wordsT)

I expected something like this:

{'we': 2, 'is': 1, 'peace': 2, 'at': 2, 'want': 2, 'our': 3, 'home': 4, 'you': 1, 'went': 1, 'nice': 1}

However, I get this instead:

{'we': 14, 'is': 12, 'peace': 16, 'at': 17, 'want': 15, 'our': 10, 'home': 18, 'you': 0, 'went': 7, 'nice': 13}

Obviously I am misusing a function or doing something wrong.

Please help.

Answer 1

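The problem with what you are doing is that you are storing the list index where each word appears, not the number of times each word appears.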

For this, you can use collections.Counter:

from collections import Counter

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
counted_words = Counter(' '.join(dtt).split())
# if you want to see what the counted words are you can print it
print(counted_words)

Output (Python 3.7+, where words with equal counts are listed in the order they first appear):

Counter({'home': 4, 'our': 3, 'want': 2, 'at': 2, 'peace': 2, 'we': 2, 'you': 1, 'went': 1, 'is': 1, 'nice': 1})
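Counter is a dict subclass, so its result can be used wherever the asker's wordsT dict was meant to go. A small Python 3 sketch that checks it against the expected dictionary from the question and also lists the most frequent words with most_common():

```python
from collections import Counter

dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']

# Counter tallies each word produced by splitting the joined text on whitespace
counted_words = Counter(' '.join(dtt).split())

# The result the question expected, written out as a plain dict
expected = {'we': 2, 'is': 1, 'peace': 2, 'at': 2, 'want': 2,
            'our': 3, 'home': 4, 'you': 1, 'went': 1, 'nice': 1}

# Converting to a plain dict makes the comparison an ordinary dict comparison
print(dict(counted_words) == expected)  # True

# most_common(n) returns the n highest counts as (word, count) pairs
print(counted_words.most_common(2))  # [('home', 4), ('our', 3)]
```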

Some cleanup, as mentioned in the comments:

You don't need the str() in str(' '.join(dtt)).split(): ' '.join(dtt) already returns a string, so ' '.join(dtt).split() is enough.

You can also skip assigning the list to a variable and do the counting on the same line:

Counter(' '.join(dtt).split())

More detail on the list indices: first, you need to understand what your code is doing.

dtt = [
    'you want home at our peace', 
    'we went our home', 
    'our home is nice', 
    'we want peace at home'
]

Note that there are 19 words in total here. The next line, word_listT = str(' '.join(dtt)).split(), builds a list of every word (print(len(word_listT)) gives 19), and that list looks like this:

word_listT = [
    'you', 
    'want', 
    'home', 
    'at', 
    'our', 
    'peace', 
    'we', 
    'went', 
    'our', 
    'home', 
    'our', 
    'home', 
    'is', 
    'nice', 
    'we', 
    'want', 
    'peace', 
    'at', 
    'home'
]

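Count again: 19 words. The last word is 'home'. List indices start at 0, so indices 0 to 18 cover the 19 elements, and word_listT[18] is 'home'. The number you see is not a position in the string, only an index into that new list; because the dict comprehension overwrites a key every time a word repeats, each word ends up with the index of its last occurrence. :)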
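To see where the numbers in the question's output come from, here is a small sketch that prints the (index, word) pairs enumerate produces and shows that the comprehension keeps only the index of each word's last occurrence:

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
word_listT = ' '.join(dtt).split()

# enumerate yields (index, word) pairs: (0, 'you'), (1, 'want'), ...
pairs = list(enumerate(word_listT))
print(pairs[:3])  # [(0, 'you'), (1, 'want'), (2, 'home')]

# In {v: k ...} the word v is the key and the index k is the value.
# A repeated word overwrites its previous entry, so the last index wins.
wordsT = {v: k for (k, v) in enumerate(word_listT)}
print(wordsT['home'])  # 18, the position of the final 'home'

# Every position of 'home', for comparison
print([i for i, w in enumerate(word_listT) if w == 'home'])  # [2, 9, 11, 18]
```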

Answer 2

Try this:

from collections import defaultdict

dtt = ['you want home at our peace', 'we went our home', 'our home is nice', 'we want peace at home']
word_list = str(' '.join(dtt)).split()
d = defaultdict(int)
for word in word_list:
    d[word] += 1
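A self-contained version of the same defaultdict approach with the result made visible. defaultdict(int) calls int() for a missing key, which returns 0, so d[word] += 1 works even the first time a word is seen. The final dict(d) copy is optional; it only gives an ordinary dict instead of a defaultdict:

```python
from collections import defaultdict

dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']

d = defaultdict(int)  # missing keys start at int() == 0
for word in ' '.join(dtt).split():
    d[word] += 1

word_counts = dict(d)  # plain dict copy of the tallies
print(word_counts)
```

For example, word_counts['home'] is 4 and word_counts['our'] is 3.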

Answer 3

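enumerate pairs each word in the list with its index; it says nothing about frequency. So when you build the wordsT dictionary, each value k is the index in word_listT of the last occurrence of the word v. To do what you want, a for loop is probably the simplest approach: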

wordsT = {}
for word in word_listT:
    try:
        wordsT[word]+=1
    except KeyError:
        wordsT[word] = 1
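The same loop can also be written with dict.get, which returns a default (here 0) for a missing key instead of raising KeyError. This is a swapped-in variant, not the code from this answer; it rebuilds word_listT as in the question so that it runs on its own:

```python
dtt = ['you want home at our peace', 'we went our home',
       'our home is nice', 'we want peace at home']
word_listT = ' '.join(dtt).split()

wordsT = {}
for word in word_listT:
    # get() returns 0 for a word not seen yet, so no try/except is needed
    wordsT[word] = wordsT.get(word, 0) + 1

print(wordsT['home'], wordsT['our'], wordsT['nice'])  # 4 3 1
```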


3 answers

Answer 1
Score 3, accepted, 2015-11-05 19:18:13

Answer 2
Score 1, 2015-11-05 19:18:39

Answer 3
Score 0, 2015-11-05 19:24:26
