I am trying to find a way to split a string based on given words only.
The resulting list should also preserve the order in which the words appear in the initial string (text).
A few examples are below:
def split_str_from_words(words, text):
    return ???
split_str_from_words(["hello", "world"], "helloworldhello")
split_str_from_words(["hello"], "helloworldhowareyouhello")
split_str_from_words(["hello", "how", "are", "you", "world"], "helloworldhowareyouhello")
Based on the 3 examples above, the function should return:
["hello", "world", "hello"]
["hello", "worldhowareyou", "hello"]
["hello", "world", "how", "are", "you", "hello"]
I have no clue how to do it. I tried functions such as split, but so far nothing works as expected.
I have an idea of how to write my own algorithm, but I wonder whether there are any built-in functions I can use for this case.
Thank you in advance.
EDIT:
So far I am able to detect every occurrence of each word, along with its position and length.
This could be really useful for keeping the order of the words and slicing the string.
import re

def split_str_from_words(words, text):
    for word in words:
        # start index of every occurrence of this word in the text
        positions = [m.start() for m in re.finditer(word, text)]
        print(word, positions, len(positions), len(word))
    return ""
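One way the detected positions could be turned into slices is sketched below. This is only a rough illustration, not a tested solution: the name split_by_positions is made up for this example, and it assumes the words do not overlap each other in the text (an occurrence that starts inside one already consumed is simply skipped).

```python
import re

def split_by_positions(words, text):
    # Collect the (start, end) span of every occurrence of every word.
    spans = []
    for word in words:
        for m in re.finditer(re.escape(word), text):
            spans.append((m.start(), m.end()))
    spans.sort()  # order spans by where they appear in the text

    pieces = []
    cursor = 0  # index of the first character not yet emitted
    for start, end in spans:
        if start < cursor:
            # Starts inside a span that was already consumed; skip it.
            continue
        if start > cursor:
            pieces.append(text[cursor:start])  # unmatched text between words
        pieces.append(text[start:end])
        cursor = end
    if cursor < len(text):
        pieces.append(text[cursor:])  # trailing unmatched text
    return pieces
```

Sorting the spans is what keeps the word order of the original string, and the cursor keeps track of the unmatched text that has to be emitted between two words.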
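For the proposed examples, re.split with all the words to be matched joined by | should do. Wrapping the alternation in a capturing group makes re.split keep the matched words in the result.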
import re

def split_str_from_words(l, s):
    # the capturing group keeps the matched words in the output of re.split
    m = re.split(rf"({'|'.join(l)})", s)
    return [i for i in m if i]  # removes empty strings (improvements are welcome)

split_str_from_words(["hello", "world"], "helloworldhello")
# ['hello', 'world', 'hello']
split_str_from_words(["hello"], "helloworldhowareyouhello")
# ['hello', 'worldhowareyou', 'hello']
split_str_from_words(["hello", "how", "are", "you", "world"], "helloworldhowareyouhello")
# ['hello', 'world', 'how', 'are', 'you', 'hello']
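
As a possible improvement, here is a hedged variant that may be more robust when the words contain regex metacharacters (such as . or +) or when one word is a prefix of another. The name split_str_from_words_escaped is only illustrative, and the sketch assumes a non-empty list of non-empty words.

```python
import re

def split_str_from_words_escaped(words, text):
    # Try longer words first so that e.g. "helloworld" wins over "hello".
    ordered = sorted(words, key=len, reverse=True)
    # re.escape makes each word match literally, even if it contains . or +
    pattern = "(" + "|".join(re.escape(w) for w in ordered) + ")"
    # The capturing group keeps the matched words; drop the empty strings.
    return [piece for piece in re.split(pattern, text) if piece]

print(split_str_from_words_escaped(["a.b", "c"], "a.bxaxbc"))
# ['a.b', 'xaxb', 'c']
```

Without re.escape, the . in "a.b" would match any character, so "axb" would also be treated as a word.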