How to get the words surrounding a substring in a string when the substring repeats?

Question

I have a task where I need to fetch the N words before and after every occurrence of a substring (which can be several words long) in a string. I initially considered using str.split(" ") and working with the resulting list, but the substring I'm searching for can span multiple words, so it won't match a single list element.
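For reference, the word-list idea can still handle a multi-word substring if the keyword is split into words too and compared against a sliding window of the text's words. A minimal sketch, assuming words are separated by whitespace (it uses str.split() with no argument, which also collapses runs of spaces) and that the keyword must match whole words exactly, so case and attached punctuation matter. `words_around` is a hypothetical helper name, not an existing API.

```python
def words_around(text, keyword, n):
    """Return (before, after) word lists for every occurrence of keyword.

    Hypothetical helper: assumes whitespace-separated words and an exact,
    case- and punctuation-sensitive whole-word match of the keyword.
    """
    words = text.split()
    key = keyword.split()
    k = len(key)
    if k == 0:
        return []  # an empty keyword would "match" at every position
    results = []
    # Slide a window of k words across the text and compare it to the keyword.
    for i in range(len(words) - k + 1):
        if words[i:i + k] == key:
            before = words[max(0, i - n):i]  # up to n words before the match
            after = words[i + k:i + k + n]   # up to n words after the match
            results.append((before, after))
    return results


text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
for before, after in words_around(text, "Hello is", 2):
    print(before, after)
```

Because the window advances one word at a time, overlapping occurrences (e.g. "a a" inside "a a a") are each reported.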

I've tried using str.partition, and it's very close to doing exactly what I want, but it only splits on the first occurrence of the keyword.

Code:

text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
part = text.partition("Hello")
part = list(map(str.strip, part))

Output:

['', 'Hello', "World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"]

This gives me exactly what I need for the first occurrence; from there I can get the preceding and following words. Unfortunately, it fails when the substring I'm looking for appears more than once.

If the output could instead be a list of all the partitions, I could make it work. How should I approach this?
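The list of partitions described here is what Python's re.split returns when the separator pattern is wrapped in a capturing group: non-matching text and matches alternate, so the matches sit at the odd indices. A minimal sketch, assuming a non-empty keyword; `partition_all` is a hypothetical helper name, and re.escape is used so the keyword is matched literally even if it contains regex metacharacters.

```python
import re


def partition_all(text, keyword):
    """Split text on every occurrence of keyword, keeping each match.

    Hypothetical helper that behaves like str.partition applied repeatedly.
    Assumes a non-empty keyword.
    """
    # The capturing group "(...)" tells re.split to keep the separators in
    # the result; re.escape makes characters such as "." or "?" literal.
    return re.split("(" + re.escape(keyword) + ")", text)


text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
print(partition_all(text, "Hello"))
```

When the text starts or ends with the keyword, re.split puts an empty string at that end, which keeps the alternating layout intact.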

Answer 1

text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"

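def recursive_partition(text, pattern):
    # Split on the first match, then recurse on whatever follows it,
    # so every occurrence of `pattern` ends up in the result.
    before, match, after = text.partition(pattern)
    if match:
        return [before, match] + recursive_partition(after, pattern)
    # No match left: the remaining text is the final segment. This also
    # covers an empty remainder (text ending with `pattern`), returning ['']
    # so the result is always a list.
    return [before]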

res = recursive_partition(text, "Hello")
print(res)  # ['', 'Hello', ' World how are you doing ', 'Hello', " is the keyword I'm trying to get ", 'Hello', ' is a repeating word']
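Given an alternating list like the one printed above (text, match, text, ..., text), the N words around each match can be read off by rejoining the segments on either side of it and splitting into words. A minimal sketch, assuming the list starts and ends with non-match text so that matches are at the odd indices; `surrounding_words` is a hypothetical helper name, and the sample list is the output shown above. Rejoining all preceding and following segments (rather than only the adjacent one) means context is still found when two matches are closer than N words apart.

```python
def surrounding_words(parts, n):
    """Return (before, match, after) for every match in an alternating list.

    Hypothetical helper: assumes `parts` alternates non-match text and
    matches, starting with non-match text, so matches are at odd indices.
    """
    results = []
    for i in range(1, len(parts), 2):
        # Rebuild all the text before/after this match, then split into words.
        before = "".join(parts[:i]).split()
        after = "".join(parts[i + 1:]).split()
        # max(0, ...) avoids before[-0:], which would return every word.
        results.append((before[max(0, len(before) - n):], parts[i], after[:n]))
    return results


parts = ['', 'Hello', ' World how are you doing ', 'Hello',
         " is the keyword I'm trying to get ", 'Hello', ' is a repeating word']

for before, match, after in surrounding_words(parts, 2):
    print(before, match, after)
```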

Accepted answer (score 0), 2022-06-30 17:09:36