NLTK index of a word with multiple occurrences

I'm trying to use Python to find the index of the word 'the' in the following text:

sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']

If I call sent3.index('the') I get 1, which is the index of the first occurrence of the word. What I'm not sure about is how to find the indexes of the other places where "the" appears. Does anyone know how I could go about doing this?

Thanks!

[i for i, item in enumerate(sent3) if item == wanted_item]

Demo:

>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> [i for i, item in enumerate(sent3) if item == 'the']
[1, 5, 8]

enumerate yields (index, value) tuples from an iterable, pairing each value with its position. It does not build a list up front; the pairs are produced one at a time as the comprehension consumes them. We can use this to check whether each value is the one we want and, if so, keep its index.
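To make the pairing concrete, here is a minimal sketch that first shows what enumerate produces and then wraps the comprehension in a small function. The name indices_of is a hypothetical helper introduced only for illustration; it is not part of NLTK or the standard library.

```python
sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']

# enumerate yields (index, value) pairs lazily; list() materializes them for inspection.
pairs = list(enumerate(sent3))
first_three = pairs[:3]  # [(0, 'In'), (1, 'the'), (2, 'beginning')]


def indices_of(tokens, wanted_item):
    """Return every position at which wanted_item occurs in tokens.

    Hypothetical helper for illustration; it wraps the same comprehension.
    """
    return [i for i, item in enumerate(tokens) if item == wanted_item]


the_positions = indices_of(sent3, 'the')  # [1, 5, 8]
missing = indices_of(sent3, 'sky')        # [] when the word never appears
```

Unlike sent3.index('the'), which raises ValueError when the word is absent, this approach simply returns an empty list.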

>>> from collections import defaultdict
>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> idx = defaultdict(list)
>>> for i,j in enumerate(sent3):
...     idx[j].append(i)
... 
>>> idx['the']
[1, 5, 8]


 