
How to get surrounding words of substring in string, if the substring repeats itself?

I have a task where I need to fetch the N words before and after every occurrence of a substring (which may span multiple words) in a string. I initially considered using str.split(" ") and working with the resulting list, but that breaks down because the substring I'm looking for can be multiple words.

I've tried str.partition, and it's very close to what I want, but it only splits on the first occurrence of the keyword.

Code:

text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
part = text.partition("Hello")
part = list(map(str.strip, part))

Output:

['', 'Hello', "World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"]

This gets me exactly what I need for the first keyword: I have enough to then get the preceding and following words. Unfortunately, it fails when the substring I'm looking for repeats.

If the output could instead be a list of partitions, one per occurrence, I could make it work. How should I approach this?

text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"

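def recursive_partition(text, pattern):
    # Split on the first occurrence of pattern: (before, match, after).
    before, match, after = text.partition(pattern)
    if match:
        # Keep both pieces and recurse on the remainder to find later occurrences.
        return [before, match] + recursive_partition(after, pattern)
    # No (further) occurrence: the remaining text is the final piece.
    return [before]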

res = recursive_partition(text, "Hello")
print(res)  # ['', 'Hello', ' World how are you doing ', 'Hello', " is the keyword I'm trying to get ", 'Hello', ' is a repeating word']
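
To go from the partitions to the N words on either side of each occurrence, one option is to locate every match with re.finditer and split the text before and after it into words. The following is a minimal sketch, not part of the original answer: the function name surrounding_words and its parameter n are made up for illustration, and it assumes words are separated by whitespace and that a plain substring match is acceptable (so "Hello" would also match inside "Hellos").

```python
import re

def surrounding_words(text, phrase, n):
    """Return a (before, after) pair of word lists for each occurrence of phrase.

    Hypothetical helper: matches phrase as a plain substring and treats
    whitespace-separated tokens as words.
    """
    results = []
    # re.escape makes the phrase match literally, even if it contains regex characters.
    for match in re.finditer(re.escape(phrase), text):
        # Guard n == 0, because a [-0:] slice would return the whole list.
        before = text[:match.start()].split()[-n:] if n > 0 else []
        after = text[match.end():].split()[:n]
        results.append((before, after))
    return results

text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
for before, after in surrounding_words(text, "Hello", 2):
    print(before, after)
```

Because the match positions come from the original string, this also works when the phrase spans several words, for example surrounding_words(text, "how are", 1). It avoids recursion as well, which may matter for texts with a very large number of occurrences.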


 