How to find words or sentences around the found word?
How can I find the words next to a found word? I want to see both the word to the left AND the word to the right of the found word x.
I was able to get the index of the found word in sourcetext, but sourcetext[sourceindex+1] gives me only a single letter, not the word that follows the found word. What am I doing wrong?
sourcetext=browser.page_source
searchword= ["hello","world","pretty","life"]
for x in searchword:
    if x in sourcetext:
        sourceindex=sourcetext.index(x)
        print("FOUND!" + x + " " + sourcetext[sourceindex+1])
    else:
        continue
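A short demonstration of why this happens, using a hypothetical sample string in place of browser.page_source: sourcetext is one long string, so str.index() returns a character offset, and indexing the string at that offset plus one yields just the second letter of the found word. Reading from after the word and its following space up to the next space gives the next word instead (this sketch assumes words are separated by single spaces).

```python
text = "say hello world"  # hypothetical sample text
pos = text.index("hello")

# str.index() returns the offset of the match's first character...
print(pos)            # 4
# ...so pos + 1 is simply the second letter of "hello"
print(text[pos + 1])  # e

# The next word starts after the found word plus one space
start = pos + len("hello") + 1
end = text.find(" ", start)  # -1 if there is no further space
next_word = text[start:] if end == -1 else text[start:end]
print(next_word)      # world
```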
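sourcetext = browser.page_source
searchword = ["hello", "world", "pretty", "life"]
for x in searchword:
    if x in sourcetext:
        sourceindex = sourcetext.index(x)
        next_word = ""
        # Start one character past the end of the found word,
        # skipping the single character (assumed a space) that follows it
        i = 1
        while True:
            try:
                # Collect characters until the next space
                if sourcetext[sourceindex + len(x) + i] != " ":
                    next_word += sourcetext[sourceindex + len(x) + i]
                else:
                    break
                i += 1
            except IndexError:
                # Reached the end of the text
                break
        print("FOUND!" + x + " " + next_word)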
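The loop above collects only the word to the right. A minimal sketch of the same character-offset idea for both neighbours, using str.rfind() and str.find() instead of a character-by-character loop. The helper name neighbours() is hypothetical, and like the loop above it assumes words are separated by single spaces.

```python
def neighbours(text, word):
    """Return (left, right) words around the first occurrence of word.

    Hypothetical helper: assumes single-space separators, reports a side
    with no adjacent word as None, and returns None if word is absent.
    """
    pos = text.find(word)
    if pos == -1:
        return None

    # Left neighbour: text between the previous space and the one before it
    left = None
    if pos > 0 and text[pos - 1] == " ":
        left_end = pos - 1
        left_start = text.rfind(" ", 0, left_end) + 1
        left = text[left_start:left_end] or None

    # Right neighbour: text between the following space and the next one
    right = None
    after = pos + len(word)
    if after < len(text) and text[after] == " ":
        right_start = after + 1
        right_end = text.find(" ", right_start)
        if right_end == -1:
            right_end = len(text)
        right = text[right_start:right_end] or None

    return left, right


print(neighbours("say hello world", "hello"))  # ('say', 'world')
print(neighbours("hello world", "hello"))      # (None, 'world')
```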
This is a simple solution and can be extended to work with Selenium or BeautifulSoup (bs4).
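sentence = "this is a six word sentence."
search = "six"
words = sentence.split(" ")
if search in words:
    my_index = words.index(search)
    # Guard both edges: index -1 would silently wrap round to the last word,
    # and my_index + 1 raises IndexError when the match is the last word
    word_before = words[my_index - 1] if my_index > 0 else None
    word_after = words[my_index + 1] if my_index + 1 < len(words) else None
    print(word_before, search, word_after)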
It works by splitting the original text into a list of words. The if statement checks whether the search word (a literal or a variable) is in that list; if it is, it looks up the word's index and stores it in my_index. The words immediately before and after it are then at positions my_index - 1 and my_index + 1.
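Note that list.index() returns only the first occurrence. A sketch that reports every occurrence with enumerate() and returns None at the edges rather than wrapping round to the last word or raising IndexError. The function name all_neighbours() is hypothetical, and it uses split() without arguments, so runs of spaces, tabs and newlines all count as one separator.

```python
def all_neighbours(sentence, search):
    """Yield (before, word, after) for every occurrence of search."""
    words = sentence.split()
    for i, word in enumerate(words):
        if word == search:
            before = words[i - 1] if i > 0 else None
            after = words[i + 1] if i + 1 < len(words) else None
            yield before, word, after


for triple in all_neighbours("the cat saw the dog", "the"):
    print(triple)
# prints:
# (None, 'the', 'cat')
# ('saw', 'the', 'dog')
```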
This can be slow for large texts: split() first builds a list of every word, and then the in check and .index() each scan that list from the start, once per search word.
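If many search words are checked against the same text, one way around the repeated scans is to split the text once and build a dictionary from each word to the positions where it occurs; each lookup is then a dictionary access instead of a list scan. A sketch under that assumption; build_index() and lookup() are hypothetical names, and the sample text is made up.

```python
from collections import defaultdict


def build_index(text):
    """Split text once and map each word to the list of its positions."""
    words = text.split()
    positions = defaultdict(list)
    for i, word in enumerate(words):
        positions[word].append(i)
    return words, positions


def lookup(words, positions, search):
    """Return (before, after) pairs for every occurrence of search."""
    result = []
    # .get() does not insert missing keys into the defaultdict
    for i in positions.get(search, []):
        before = words[i - 1] if i > 0 else None
        after = words[i + 1] if i + 1 < len(words) else None
        result.append((before, after))
    return result


words, positions = build_index("hello world it is a pretty life")
for search in ["hello", "world", "pretty", "life"]:
    print(search, lookup(words, positions, search))
```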
If I understand correctly, you want the words to the left and right of each search word. My approach would be to use a regular expression to find them.
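import re
searchwords = ["hello", "world", "pretty", "life"]  # sourcetext is browser.page_source, as in the question
for searchword in searchwords:
    # re.escape() handles regex metacharacters; \b stops "life" matching inside "lifetime"
    pattern = r'(?:(\w+)\s+)?\b{}\b(?:\s+(\w+))?'.format(re.escape(searchword))
    match = re.search(pattern, sourcetext)
    if match:
        print('{} is between {} and {}'.format(searchword, match.group(1), match.group(2)))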
This solution should work well for long texts. It also makes edge cases easy to handle: for example, if there is no word on the left side, group(1) is None, which you can check for directly.
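re.search() returns only the first match, and because the neighbour groups consume text, simply switching to re.finditer() with the same pattern can skip back-to-back occurrences (in "a the the b", the first match swallows the second "the"). A sketch that instead uses re.finditer() for the search word alone and then looks for one word on each side with two small compiled patterns, using the pos/endpos arguments of Pattern.search() and Pattern.match(). The function name and sample strings are hypothetical.

```python
import re

# One word followed by whitespace, anchored at the end of the searched region
LEFT = re.compile(r'(\w+)\s+$')
# Whitespace followed by one word, starting exactly at the given position
RIGHT = re.compile(r'\s+(\w+)')


def find_all_neighbours(text, word):
    """Return (left, right) for every whole-word occurrence of word.

    A side with no adjacent word is reported as None.
    """
    results = []
    for m in re.finditer(r'\b{}\b'.format(re.escape(word)), text):
        # endpos makes the search behave as if text ended at m.start()
        left = LEFT.search(text, 0, m.start())
        # match() is anchored at m.end(), right after the found word
        right = RIGHT.match(text, m.end())
        results.append((left.group(1) if left else None,
                        right.group(1) if right else None))
    return results


print(find_all_neighbours("a the the b", "the"))  # [('a', 'the'), ('the', 'b')]
```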