How to extract a few words before and after a keyword in text using Python

I have a keyword, "grand master", and I am searching for it in a huge text. I need to extract the 5 words before and the 5 words after the keyword (depending on its position, these may spill into the previous or next sentence). The keyword appears multiple times in the text.

As a trial, I first tried to find the position of the keyword in the text using text.find(), and found the keyword at 4 different positions:

>>> positions
[125, 567, 34445, 98885445]
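Note that a single text.find() call returns only the first match, so collecting every position needs a loop. A minimal sketch of one way to do that, assuming non-overlapping matches are wanted (the helper name find_all is made up for illustration):

```python
def find_all(text, keyword):
    """Return the start index of every non-overlapping occurrence of keyword."""
    positions = []
    start = text.find(keyword)
    while start != -1:
        positions.append(start)
        # Resume searching just past the current match.
        start = text.find(keyword, start + len(keyword))
    return positions


sample = "a grand master met a grand master"
positions = find_all(sample, "grand master")
print(positions)
```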

So I tried slicing the text from each position, splitting it on whitespace, skipping the keyword's own words, and taking the next 5 words:

text[positions[i]:].split()[len(keyword.split()):len(keyword.split()) + 5]

But how do I extract the 5 words before the keyword?

You can simply use

text[:positions[i]].split()[-5:]
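Putting the two slices together, a small sketch of a helper that returns both sides of the keyword might look like this. The function name context_words and the parameter n are assumptions for illustration, not part of the question:

```python
def context_words(text, keyword, position, n=5):
    """Return (up to n words before, up to n words after) the keyword at position."""
    # Everything left of the keyword, split into words; keep the last n.
    before = text[:position].split()[-n:]
    # Everything right of the keyword, split into words; keep the first n.
    after = text[position + len(keyword):].split()[:n]
    return before, after


text = "one two three four five six grand master seven eight nine ten eleven twelve"
pos = text.find("grand master")
before, after = context_words(text, "grand master", pos)
print(before)
print(after)
```

This splits the whole prefix and suffix on every call, which may be slow for a very large text; slicing a bounded window around the position before splitting is one possible way to limit that cost.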

Use the re module for this. For the first keyword match:

import re

# \S+ matches a single whitespace-free word; (.+) would greedily span many words.
pattern = r"(\S+) (\S+) (\S+) (\S+) (\S+) grand master (\S+) (\S+) (\S+) (\S+) (\S+)"
match = re.search(pattern, text)
if match:
    firstword_before = match.group(1)  # first pair of parentheses
    lastword_before = match.group(5)

    firstword_after = match.group(6)
    lastword_after = match.group(10)

Each pair of parentheses in the pattern defines a numbered group. The first pair corresponds to match.group(1), the second pair to match.group(2), and so on. If you want all the groups, you can use:

match.groups() # returns tuple of groups

or

match.group(0) # returns the entire matched string

To get every keyword match in the text, use re.findall. See the re module documentation for details.
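As a rough sketch of the "all matches" case, the following uses re.finditer rather than re.findall so that each match object is available. The helper name keyword_contexts and the parameter n are hypothetical, and the pattern assumes words are separated by whitespace:

```python
import re


def keyword_contexts(text, keyword, n=5):
    """Yield (before, after) word lists for each occurrence of keyword."""
    # Up to n whitespace-delimited words on each side; {0,n} lets matches
    # near the start or end of the text succeed with fewer words.
    pattern = re.compile(
        r"((?:\S+\s+){0,%d})%s((?:\s+\S+){0,%d})" % (n, re.escape(keyword), n)
    )
    for match in pattern.finditer(text):
        yield match.group(1).split(), match.group(2).split()


text = "one two three four five six grand master seven eight nine ten eleven twelve"
results = list(keyword_contexts(text, "grand master"))
print(results)
```

Because matches do not overlap, two occurrences that sit within n words of each other will not both be reported with full context; in that case the text.find() slicing approach may be the safer choice.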

PS: There are better ways to write the pattern. That's just me being lazy.
