NLTK index of a word that occurs multiple times

Question

I'm trying to use Python to find the index of the word 'the' in the following text:

sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']

If I do sent3.index('the'), I get 1, which is the index of the first occurrence of the word. What I'm not sure about is how to find the indexes of the other occurrences of "the". Does anyone know how I could go about doing this?

Thanks!

Answer 1

[i for i, item in enumerate(sent3) if item == wanted_item]

Demo:

>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> [i for i, item in enumerate(sent3) if item == 'the']
[1, 5, 8]

enumerate pairs each item of an iterable with its index, yielding (index, value) tuples one at a time rather than building a list. We can use this to check whether each value is the one we want and, if so, keep its index.
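Since the question starts from sent3.index, the same result can also be sketched with the optional start argument of list.index: search again from just past the previous match until ValueError signals there are no more occurrences. This is a minimal sketch, not part of either answer; the helper name find_all_indexes is hypothetical.

```python
def find_all_indexes(tokens, word):
    """Return every index at which `word` occurs in `tokens` (hypothetical helper)."""
    positions = []
    start = 0
    while True:
        try:
            # list.index(x, start) only searches from position `start` onward
            pos = tokens.index(word, start)
        except ValueError:
            # No further occurrences of `word`
            return positions
        positions.append(pos)
        start = pos + 1  # resume the search just past this match


sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
print(find_all_indexes(sent3, 'the'))  # [1, 5, 8]
```

Because each search resumes after the previous match, the list is still scanned only once overall, just like the enumerate version.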

Answer 2

>>> from collections import defaultdict
>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> idx = defaultdict(list)
>>> for i, word in enumerate(sent3):
...     idx[word].append(i)
... 
>>> idx['the']
[1, 5, 8]
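This approach builds a position list for every word in a single pass, so later lookups for any word are just dictionary accesses. A minimal sketch wrapping it in a reusable function (the name build_word_index is hypothetical) is shown below. It returns a plain dict because indexing a defaultdict with a missing key silently inserts an empty list for that key.

```python
from collections import defaultdict


def build_word_index(tokens):
    """Map each token to the list of positions where it occurs (hypothetical helper)."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token].append(position)
    # Convert to a plain dict so looking up an absent word
    # does not insert an empty list as a side effect
    return dict(index)


sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
idx = build_word_index(sent3)
print(idx['the'])           # [1, 5, 8]
print(idx.get('Eden', []))  # []
```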

2 answers

Answer 1: score 1, accepted, 2014-04-13 15:50:15
Answer 2: score 0, 2014-04-14 10:18:55
