I have a keyword, "grand master", and I am searching for it in a huge text. I need to extract the 5 words before and the 5 words after the keyword (depending on its position, these may spill into the previous or next sentence), and the keyword appears multiple times in the text.
As a trial, I first tried to find the positions of the keyword in the text using text.find(), and found the keyword at 4 different positions:
>>> positions
[125, 567, 34445, 98885445]
So I tried splitting the text on whitespace and taking the first 5 words after the keyword:
text[positions[i]:].split()[len(keyword.split()):len(keyword.split())+5]
But how to extract the 5 words before that keyword?
You can simply use
text[:positions[i]].split()[-5:]
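The slicing idea can be combined with the forward slice from the question to cover every occurrence. The following is only a minimal sketch: the function name `context_by_slicing` and the parameter `n` are made up for illustration, and it assumes the keyword is matched as a plain substring, so an occurrence inside a longer word (for example "grand masters") would leave a stray fragment as the first "after" word.

```python
def context_by_slicing(text, keyword, n=5):
    """Return a (before, after) pair of word lists for each occurrence of keyword."""
    results = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        before = text[:start].split()[-n:]  # last n words before the keyword
        after = text[end:].split()[:n]      # first n words after the keyword
        results.append((before, after))
        start = text.find(keyword, end)     # continue searching past this hit
    return results


sample = "a b c d e f grand master g h i j k l"
print(context_by_slicing(sample, "grand master"))
# [(['b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k'])]
```

Note that `text[:start].split()` splits everything before the keyword on each hit. For a very large text it may be worth slicing only a bounded window around each position instead.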
Use the re module for this. For the first keyword match:
import re

# \S+ matches exactly one word; (.+) is greedy and can swallow several words
pattern = r"(\S+) (\S+) (\S+) (\S+) (\S+) grand master (\S+) (\S+) (\S+) (\S+) (\S+)"
match = re.search(pattern, text)
if match:
    firstword_before = match.group(1)  # first pair of parentheses
    lastword_before = match.group(5)
    firstword_after = match.group(6)
    lastword_after = match.group(10)
Each pair of parentheses in the pattern defines a numbered group. The first pair corresponds to match.group(1), the second pair to match.group(2), and so on. If you want all the groups at once, you can use:
match.groups() # returns tuple of groups
or
match.group(0) # returns the entire matched string
To get every occurrence of the keyword in the text, use re.findall. See the re module documentation for details.
PS: There are better ways to write the pattern. That's just me being lazy.
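One possible way to write a tighter pattern and collect every match is sketched below. The function name `context_by_regex` and the parameter `n` are hypothetical, and the sketch assumes a "word" is any run of non-whitespace characters. It uses `{0,n}` repetition so that occurrences with fewer than n neighbouring words still match, and `\s+` so that the context can cross line breaks.

```python
import re


def context_by_regex(text, keyword, n=5):
    """Return a (before, after) pair of word lists for each whole-word match."""
    # Allow any whitespace (including newlines) between the keyword's words.
    kw = r"\s+".join(re.escape(w) for w in keyword.split())
    pattern = re.compile(
        # up to n words before, the keyword as whole words, up to n words after
        r"((?:\S+\s+){0,%d})(?<!\S)%s(?!\S)((?:\s+\S+){0,%d})" % (n, kw, n)
    )
    return [(m.group(1).split(), m.group(2).split())
            for m in pattern.finditer(text)]


sample = "a b c d e f grand master g h i j k l"
print(context_by_regex(sample, "grand master"))
# [(['b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k'])]
```

Because `re.finditer` returns non-overlapping matches, a second occurrence that falls within the n words after an earlier one is consumed as context and not reported separately. If that matters, searching for the keyword alone and slicing around each match position may be the simpler option.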