Dictionary key values show only unique results instead of all
I have corpus_test, which I convert to a list by splitting it into words. From this I need two dictionaries and the number of words in the text. The problem is unique values: I need all of them, even duplicates.
from collections import defaultdict

corpus_test = 'cat dog tiger tiger tiger cat dog lion'
corpus_test = [[word.lower() for word in corpus_test.split()]]
word_counts = defaultdict(int)
for rowt in corpus_test:
    for wordt in rowt:
        word_counts[wordt] += 1
    index_wordso = dict((i, word) for i, word in enumerate(rowt))
    word_indexso = dict((word, i) for i, word in enumerate(rowt))
    v_countso = len(index_wordso)
My code gives me the correct output for index_wordso and v_countso:
index_wordso
#{0: 'cat',
1: 'dog',
2: 'tiger',
3: 'tiger',
4: 'tiger',
5: 'cat',
6: 'dog',
7: 'lion'}
v_countso
#8
But word_indexso (the inverse dict of index_wordso) gives me the wrong output:
word_indexso
#{'cat': 5, 'dog': 6, 'tiger': 4, 'lion': 7}
That gives me only the last index for each word, not all of them. I need all 8 values.
Keys in a dictionary are unique; values are not. It's like a word dictionary: a word can have multiple definitions, but it cannot be listed more than once.
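Assigning to a key that already exists replaces its value, which is why only the last index of each word survives. A minimal sketch of that effect, using illustrative names (words, word_to_index) that are not from the question:

```python
# Illustrative sketch: a later assignment to the same key replaces the earlier value.
words = 'cat dog tiger tiger tiger cat dog lion'.split()

word_to_index = {}
for i, word in enumerate(words):
    # 'cat' is first stored with index 0, then overwritten with index 5, and so on.
    word_to_index[word] = i

print(word_to_index)       # {'cat': 5, 'dog': 6, 'tiger': 4, 'lion': 7}
print(len(word_to_index))  # 4 unique keys, not 8 entries
```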
A workaround is to use a list of tuples:
corpus_test = 'cat dog tiger tiger tiger cat dog lion'
corpus_test = [word.lower() for word in corpus_test.split()]
print([(a,b) for (a, b) in zip(corpus_test, range(len(corpus_test)))])
which results in
[('cat', 0),
('dog', 1),
('tiger', 2),
('tiger', 3),
('tiger', 4),
('cat', 5),
('dog', 6),
('lion', 7)]
Keep in mind, though, that this is not a lookup table, so you must loop through the elements (in some way) to find a specific element.
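For instance, finding every position of a word in a list of tuples means scanning all the pairs. A minimal sketch, where pairs and the helper positions_of are hypothetical names introduced only for illustration:

```python
words = 'cat dog tiger tiger tiger cat dog lion'.split()
pairs = [(word, i) for i, word in enumerate(words)]

def positions_of(target, pairs):
    """Return every index paired with target; scans the whole list, so O(n)."""
    return [i for word, i in pairs if word == target]

print(positions_of('tiger', pairs))  # [2, 3, 4]
print(positions_of('wolf', pairs))   # []
```

This is fine for a short list, but each lookup costs a full pass over the data.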
Another method is to use a dictionary of lists:
from collections import defaultdict
word_indexso = defaultdict(list)
corpus_test = 'cat dog tiger tiger tiger cat dog lion'.split()
for index, word in enumerate(corpus_test):
    word_indexso[word].append(index)
print(word_indexso)
which results in
defaultdict(<class 'list'>, {'cat': [0, 5], 'dog': [1, 6], 'tiger': [2, 3, 4], 'lion': [7]})
which can be looked up with e.g. word_indexso["cat"] to get the list of indices associated with the word.
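As a quick check that nothing is lost with the dictionary of lists, the sketch below counts the stored indices and rebuilds the original word order from them. Names other than word_indexso and index_wordso are illustrative. It also uses .get for a missing word, since indexing a defaultdict with an unknown key would insert an empty list as a side effect.

```python
from collections import defaultdict

words = 'cat dog tiger tiger tiger cat dog lion'.split()

word_indexso = defaultdict(list)
for index, word in enumerate(words):
    word_indexso[word].append(index)

# All 8 positions are kept, spread across 4 keys.
total = sum(len(indices) for indices in word_indexso.values())
print(total)  # 8

# Rebuild the position -> word mapping from the word -> positions mapping.
index_wordso = {i: w for w, indices in word_indexso.items() for i in indices}
print([index_wordso[i] for i in range(total)] == words)  # True

# .get does not create an entry for a word that never occurred.
print(word_indexso.get('wolf', []))  # []
print('wolf' in word_indexso)        # False
```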