How to find words or sentences around the found word?

How can I find the words next to a found word? I want to see both the word to the left and the word to the right of the found word x.

I was able to extract the index of the found word in sourcetext. But sourcetext[sourceindex+1] only gives me a single letter of that word, when it should give me the word that follows the found word. What am I doing wrong?

    sourcetext = browser.page_source
    searchword = ["hello", "world", "pretty", "life"]

    for x in searchword:
        if x in sourcetext:
            sourceindex = sourcetext.index(x)
            print("FOUND!" + x + "  " + sourcetext[sourceindex+1])
        else:
            continue
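A quick way to see what goes wrong: sourcetext is a plain string, so sourcetext.index(x) returns a character offset, and sourcetext[sourceindex+1] is simply the next character, i.e. the second letter of the found word. The sketch below uses a made-up sample string in place of browser.page_source and contrasts this with indexing a list of words produced by str.split():

```python
# Hypothetical sample text standing in for browser.page_source
sourcetext = "say hello world to life"

sourceindex = sourcetext.index("hello")
print(sourceindex)                  # 4: a character offset, not a word position
print(sourcetext[sourceindex + 1])  # e: just the next character

# Splitting into words first makes the index count words instead of characters
words = sourcetext.split()
wordindex = words.index("hello")
print(words[wordindex - 1], words[wordindex + 1])  # say world
```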
         
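# browser is the Selenium WebDriver from the question
sourcetext = browser.page_source
searchword = ["hello", "world", "pretty", "life"]

for x in searchword:
    if x in sourcetext:
        sourceindex = sourcetext.index(x)
        next_word = ""
        # i = 1 skips the single character right after x (assumed to be a separator)
        i = 1
        while True:
            try:
                ch = sourcetext[sourceindex + len(x) + i]
                # Collect characters until the next whitespace (space, newline, tab)
                if not ch.isspace():
                    next_word += ch
                else:
                    break
                i += 1
            except IndexError:
                # Reached the end of the text
                break
        print("FOUND!" + x + "  " + next_word)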
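The loop above only collects the word to the right of the match, while the question also asks for the word on the left. A minimal sketch of the same character-scanning idea in both directions, written as a hypothetical helper neighbours(text, start, word) that treats any whitespace as a word separator:

```python
def neighbours(text, start, word):
    """Return (left, right): the words around `word` found at offset `start`.

    Hypothetical helper: scans character by character, like the loop above,
    and treats any whitespace as a word separator. Missing words are ''.
    """
    # Walk left: skip whitespace before the match, then collect non-whitespace
    i = start - 1
    while i >= 0 and text[i].isspace():
        i -= 1
    left_end = i + 1
    while i >= 0 and not text[i].isspace():
        i -= 1
    left = text[i + 1:left_end]

    # Walk right: skip whitespace after the match, then collect non-whitespace
    j = start + len(word)
    while j < len(text) and text[j].isspace():
        j += 1
    right_start = j
    while j < len(text) and not text[j].isspace():
        j += 1
    right = text[right_start:j]
    return left, right


sourcetext = "a pretty good life"
print(neighbours(sourcetext, sourcetext.index("pretty"), "pretty"))  # ('a', 'good')
```

Skipping runs of whitespace on both sides means newlines and indentation in real page source do not end the scan early.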

This is a simple solution and can be expanded to work with Selenium or bs4.

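sentence = "this is a six word sentence."
search = "six"

# Split the text into a list of words
words = sentence.split()

if search in words:
    my_index = words.index(search)
    # Guard both ends: index -1 would silently wrap to the last word,
    # and index len(words) would raise IndexError
    word_before = words[my_index - 1] if my_index > 0 else None
    word_after = words[my_index + 1] if my_index + 1 < len(words) else None

    print(word_before, search, word_after)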

It works by splitting the original text into a list of words. The if statement checks whether the search word (or a variable holding it) is in the list; if it is, list.index() returns that word's position, which is stored in my_index. That position can then be used to look up the words immediately before and after it.
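Note that list.index() only returns the first position of the word. If the word can appear more than once, one option is to loop with enumerate and collect every position, guarding both ends of the list as before. A sketch, using a hypothetical helper name all_neighbours:

```python
def all_neighbours(text, search):
    """Return a (before, after) pair for every occurrence of `search` in `text`.

    Hypothetical helper; None marks a missing neighbour at either end.
    """
    words = text.split()
    pairs = []
    for i, word in enumerate(words):
        if word == search:
            before = words[i - 1] if i > 0 else None
            after = words[i + 1] if i + 1 < len(words) else None
            pairs.append((before, after))
    return pairs


print(all_neighbours("the cat saw the dog", "the"))  # [(None, 'cat'), ('saw', 'dog')]
```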

This can be slow on large texts: the in check and list.index() both scan the list from the start, so each lookup takes time proportional to the number of words.
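If many words are looked up in the same text, one way around the repeated scans is to build a dictionary that maps each word to its positions in a single pass; each lookup is then a dictionary access. A minimal sketch under that assumption (the helper name build_index is hypothetical):

```python
from collections import defaultdict


def build_index(words):
    """Map each word to the list of positions where it occurs (one pass)."""
    positions = defaultdict(list)
    for i, word in enumerate(words):
        positions[word].append(i)
    return positions


words = "this is a six word sentence and six is a number".split()
index = build_index(words)

for search in ["six", "seven"]:
    # .get() with a default does not add missing keys to the defaultdict
    for i in index.get(search, []):
        before = words[i - 1] if i > 0 else None
        after = words[i + 1] if i + 1 < len(words) else None
        print(before, search, after)
# prints "a six word" and "and six is"; "seven" is not found
```

Building the index costs one pass over the text, so it pays off mainly when several search words are checked against the same text.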

If I understand correctly, you want the words to the left and right of the search word. My approach here would be to use a regular expression to find them.

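import re

# Sample input; in the question this is browser.page_source
sourcetext = "hello world, life is pretty good"
searchwords = ["hello", "world", "pretty", "life"]

for searchword in searchwords:
    # \b prevents matches inside longer words; re.escape makes the word literal
    pattern = r'(?:(\w+)\s+)?\b{}\b(?:\s+(\w+))?'.format(re.escape(searchword))
    match = re.search(pattern, sourcetext)
    if match:
        print('{} is between {} and {}'.format(searchword, match.group(1), match.group(2)))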

This solution should work well for long texts. Missing neighbours are also easy to handle: if there is no word on the left side, for example, group(1) is None, which you can simply test for.
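For instance, checking the groups for None lets you print a placeholder when the word sits at the start or end of the text. A small sketch with a hypothetical helper describe; the pattern adds re.escape and \b to the original answer's regex so that the search word is matched literally and only as a whole word (assuming search words made of letters and digits):

```python
import re


def describe(searchword, sourcetext):
    """Return a sentence naming the words left and right of `searchword`."""
    pattern = r'(?:(\w+)\s+)?\b{}\b(?:\s+(\w+))?'.format(re.escape(searchword))
    match = re.search(pattern, sourcetext)
    if match is None:
        return '{} not found'.format(searchword)
    # group(1) / group(2) are None when there is no word on that side
    left = match.group(1) if match.group(1) is not None else '<start>'
    right = match.group(2) if match.group(2) is not None else '<end>'
    return '{} is between {} and {}'.format(searchword, left, right)


print(describe('hello', 'hello pretty world'))  # hello is between <start> and pretty
print(describe('world', 'hello pretty world'))  # world is between pretty and <end>
print(describe('life', 'hello pretty world'))   # life not found
```

Like the answer's loop, re.search reports only the first occurrence of each word.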

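Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.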

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM