Find the sentence's index (sentences in a list) of a specific word in Python
I currently have a file that contains a list that looks like this:
example = ['Mary had a little lamb' ,
'Jack went up the hill' ,
'Jill followed suit' ,
'i woke up suddenly' ,
'it was a really bad dream...']
I would like to find the index of the sentence that contains a given word, for example "woke". In this example the answer should be f("woke") = 3, where f is a function.
I tried tokenizing each sentence first, so that I could find the index of the word, like this:
>>> from nltk.tokenize import word_tokenize
>>> example = ['Mary had a little lamb' ,
... 'Jack went up the hill' ,
... 'Jill followed suit' ,
... 'i woke up suddenly' ,
... 'it was a really bad dream...']
>>> tokenized_sents = [word_tokenize(i) for i in example]
>>> for i in tokenized_sents:
...     print(i)
...
['Mary', 'had', 'a', 'little', 'lamb']
['Jack', 'went', 'up', 'the', 'hill']
['Jill', 'followed', 'suit']
['i', 'woke', 'up', 'suddenly']
['it', 'was', 'a', 'really', 'bad', 'dream', '...']
But I don't know how to get the index of the word, or how to link it to the sentence's index. Does someone know how to do that?
You can iterate over each string in the list, split it on whitespace, then check whether your search word is in that list of words. If you do this in a list comprehension, you get back a list of the indices of all strings that satisfy this requirement.
def f(l, s):
    return [index for index, value in enumerate(l) if s in value.split()]
>>> f(example, 'woke')
[3]
>>> f(example, 'foobar')
[]
>>> f(example, 'a')
[0, 4]
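The question also asks how to link the word's position to the sentence's index. A minimal sketch of one way to do that, assuming whitespace splitting is good enough for the data; the name find_word_positions is hypothetical and not part of the original answer:

```python
def find_word_positions(sentences, word):
    """Return (sentence_index, word_index) pairs for every exact match of word."""
    return [
        (s_idx, w_idx)
        for s_idx, sentence in enumerate(sentences)
        for w_idx, token in enumerate(sentence.split())
        if token == word
    ]


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

print(find_word_positions(example, 'woke'))  # [(3, 1)]
print(find_word_positions(example, 'a'))     # [(0, 2), (4, 2)]
```

Each pair holds the sentence's index first and the word's position inside that sentence second, so a word that occurs twice in one sentence would produce two pairs.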
If you prefer using the nltk library:
from nltk.tokenize import word_tokenize

def f(l, s):
    return [index for index, value in enumerate(l) if s in word_tokenize(value)]
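The two variants differ on punctuation: with str.split() the last sentence yields the token 'dream...', so searching for 'dream' finds nothing, whereas the tokenizer output shown in the question separates 'dream' from '...'. If installing nltk is not an option, a rough standard-library sketch could use a regular expression instead. The function name find_sentences and the choice to ignore case are assumptions made for illustration, and \w+ is only a crude stand-in for a real tokenizer:

```python
import re


def find_sentences(sentences, word):
    """Return indices of sentences containing word as a whole token, ignoring case."""
    word = word.lower()
    return [
        idx
        for idx, sentence in enumerate(sentences)
        # \w+ keeps runs of letters, digits and underscores and drops punctuation
        if word in re.findall(r"\w+", sentence.lower())
    ]


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

print(find_sentences(example, 'dream'))  # [4]
print(find_sentences(example, 'mary'))   # [0]
```

Note that \w+ also splits contractions such as "don't" into two tokens, so this is only suitable when that is acceptable.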
def f(tokenized_sents):
    # Return the index of the first tokenized sentence that contains 'woke'
    for index, sentence in enumerate(tokenized_sents):
        if 'woke' in sentence:
            return index
To get the indices of all matching sentences, return a list instead:
    return [index for index, sentence in enumerate(tokenized_sents) if 'woke' in sentence]
If the requirement is to return the first sentence in which the word occurs, you can use something like:
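def func(strs, word):
    for idx, s in enumerate(strs):
        # str.find matches substrings, so 'woke' would also match 'awoke'
        if s.find(word) != -1:
            return idx
    # falls through and returns None when no sentence matches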
example = ['Mary had a little lamb' ,
'Jack went up the hill' ,
'Jill followed suit' ,
'i woke up suddenly' ,
'it was a really bad dream...']
func(example,"woke")
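Because func relies on str.find, it matches substrings: func(example, "am") returns 0, since "lamb" contains "am". If only whole words should count, a hedged variant could combine the whitespace split from the earlier answer with next(), returning None when nothing matches. The name first_sentence_index is hypothetical:

```python
def first_sentence_index(sentences, word):
    """Return the index of the first sentence containing word as a whole token, else None."""
    return next(
        (idx for idx, sentence in enumerate(sentences) if word in sentence.split()),
        None,  # default when the generator is exhausted without a match
    )


example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

print(first_sentence_index(example, 'woke'))  # 3
print(first_sentence_index(example, 'am'))    # None
```

The generator stops at the first match, so later sentences are never split.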