
NLTK index of a word with multiple occurrences

I'm trying to use Python to find the index of the word 'the' in the following text:

sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']

If I do sent3.index('the') I get 1, which is the index of the first occurrence of the word. What I'm not sure about is how to find the indexes of the other places where "the" appears. Does anyone know how I could go about doing this?

Thanks!

[i for i, item in enumerate(sent3) if item == wanted_item]

Demo:

>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> [i for i, item in enumerate(sent3) if item == 'the']
[1, 5, 8]

enumerate walks over an iterable and yields (index, value) tuples, one per item. It returns a lazy iterator rather than a list. We can use this to check whether each value is the one we want and, if so, keep its index.
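To make the pairing concrete, here is a minimal sketch that prints the first few tuples enumerate produces and wraps the comprehension in a small helper. The name indexes_of is made up for illustration; it is not part of Python or NLTK.

```python
def indexes_of(tokens, wanted):
    """Return every position at which `wanted` occurs in `tokens`.

    `indexes_of` is a hypothetical helper name, not a library function.
    """
    return [i for i, item in enumerate(tokens) if item == wanted]


sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the',
         'heaven', 'and', 'the', 'earth', '.']

# enumerate is lazy, so wrap it in list() to look at the pairs it yields.
pairs = list(enumerate(sent3))[:3]
print(pairs)                      # [(0, 'In'), (1, 'the'), (2, 'beginning')]

print(indexes_of(sent3, 'the'))   # [1, 5, 8]
print(indexes_of(sent3, 'The'))   # [] (the comparison is case-sensitive)
```

Unlike sent3.index('the'), which raises ValueError when the word is absent, this returns an empty list for a word that never appears.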

>>> from collections import defaultdict
>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> idx = defaultdict(list)
>>> for i,j in enumerate(sent3):
...     idx[j].append(i)
... 
>>> idx['the']
[1, 5, 8]
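The defaultdict version records the positions of every word in a single pass, which is probably the better fit if you want to look up several words in the same text. A rough sketch of it as a reusable function follows; build_index is a made-up name, and returning a plain dict is my own choice rather than anything the snippet above requires.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each distinct token to the list of positions where it occurs.

    `build_index` is a hypothetical helper name used only for this sketch.
    """
    idx = defaultdict(list)
    for i, token in enumerate(tokens):
        idx[token].append(i)
    # Convert to a plain dict: looking up a missing key on a defaultdict
    # would silently insert an empty list for it.
    return dict(idx)


sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the',
         'heaven', 'and', 'the', 'earth', '.']

idx = build_index(sent3)
print(idx['the'])           # [1, 5, 8]
print(idx.get('sea', []))   # [] for a word that is not in the text
print(len(idx))             # 9 distinct tokens
```

Building the index costs one pass over the tokens; after that each lookup is a dictionary access instead of another scan of the list.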

