NLTK index of a word with multiple occurrences

Question

I'm trying to use Python to find the index of the word 'the' in the following text:

sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']

If I do sent3.index('the') I get 1, which is the index of the first occurrence of the word. What I'm not sure about is how to find the indexes of the other places where "the" appears. Does anyone know how I could go about doing this?

Thanks!

Answer 1

[i for i, item in enumerate(sent3) if item == wanted_item]

Demo:

>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> [i for i, item in enumerate(sent3) if item == 'the']
[1, 5, 8]

enumerate yields (index, value) pairs from an iterable, one pair per element. The comprehension checks each value against the one we want and, when it matches, keeps the corresponding index.
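
If you need this in more than one place, the same comprehension can be wrapped in a small helper. This is only a sketch: the name find_indices and the ignore_case flag are made up for illustration and are not part of NLTK or the standard library.

```python
def find_indices(tokens, target, ignore_case=False):
    """Return every position at which target occurs in tokens."""
    if ignore_case:
        # Compare lowercased forms so 'In' and 'in' count as the same word.
        target = target.lower()
        return [i for i, tok in enumerate(tokens) if tok.lower() == target]
    return [i for i, tok in enumerate(tokens) if tok == target]


sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
print(find_indices(sent3, 'the'))                    # [1, 5, 8]
print(find_indices(sent3, 'in', ignore_case=True))   # [0]
print(find_indices(sent3, 'sea'))                    # []
```

Unlike sent3.index('sea'), which raises ValueError for a missing word, this returns an empty list.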

Answer 2

>>> from collections import defaultdict
>>> sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
>>> idx = defaultdict(list)
>>> for i,j in enumerate(sent3):
...     idx[j].append(i)
... 
>>> idx['the']
[1, 5, 8]
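
The defaultdict approach pays off when you want the positions of many different words, since the list is scanned only once. A rough sketch of how that could be packaged follows; build_index is a hypothetical helper name, not an NLTK function.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each distinct token to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    # Convert to a plain dict so that looking up a missing word
    # does not silently create an empty entry.
    return dict(positions)


sent3 = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.']
idx = build_index(sent3)
print(idx['the'])           # [1, 5, 8]
print(idx.get('sea', []))   # []
```

Note that with the raw defaultdict, idx['sea'] would return [] but also add a 'sea' key to the index, which is why the sketch returns a plain dict and uses .get for words that may be absent.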

2 answers

solution1
1 ACCEPTED 2014-04-13 15:50:15

solution2
0 2014-04-14 10:18:55
