Split a string based on given words from a list

Question

I am trying to find a way to split a string based on given words only.

The new list should also preserve the word order of the initial string (text).

A few examples:

def split_str_from_words(words, text):
    ...  # what should go here?

split_str_from_words(["hello", "world"], "helloworldhello")
split_str_from_words(["hello"], "helloworldhowareyouhello")
split_str_from_words(["hello", "how", "are", "you", "world"], "helloworldhowareyouhello")

For the three examples above, the function should return:

["hello", "world", "hello"]
["hello", "worldhowareyou", "hello"]
["hello", "world", "how", "are", "you", "hello"]

I have no clue how to do it. I tried functions such as split, but so far nothing works as expected.

I have an idea of how to write my own algorithm, but I wonder whether there are any built-in functions I can use for this case.

Thank you in advance.

EDIT:

So far I am able to detect every occurrence of my words, along with each position and word length.

That could be really useful for keeping the order of the words and slicing the string.

import re

def split_str_from_words(words, text):
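    for word in words:
        # Start index of every occurrence of word in text
        # (re.escape keeps any regex metacharacters in a word literal)
        positions = [m.start() for m in re.finditer(re.escape(word), text)]
        print(word, positions, len(positions), len(word))

    return ""  # placeholder: nothing is split yet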
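
As a rough sketch of where this idea could lead, the match positions might be turned into slices directly. The function name split_by_positions and the rule for overlapping matches (the longer word wins at a given start) are assumptions made for illustration, not part of the original post.

```python
import re

def split_by_positions(words, text):
    # Collect the (start, end) span of every occurrence of every word.
    spans = [
        (m.start(), m.end())
        for word in words if word
        for m in re.finditer(re.escape(word), text)
    ]
    # Earliest start first; at the same start, prefer the longer match.
    spans.sort(key=lambda span: (span[0], -span[1]))

    parts, cursor = [], 0
    for start, end in spans:
        if start < cursor:
            continue  # overlaps a piece that was already taken
        if start > cursor:
            parts.append(text[cursor:start])  # text between two known words
        parts.append(text[start:end])
        cursor = end
    if cursor < len(text):
        parts.append(text[cursor:])  # trailing text after the last word
    return parts
```

Sorting the spans by start index is what keeps the pieces in the order they appear in text, regardless of the order of words.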

Answer 1

For the proposed examples, re.split with a pattern that joins all the words to be matched with | should do. Wrapping the alternation in a capturing group makes re.split keep the matched words in the result.

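import re

def split_str_from_words(l, s):
    # The capturing group (...) makes re.split return the matched words too;
    # re.escape treats each word as literal text rather than as a regex.
    m = re.split(rf"({'|'.join(map(re.escape, l))})", s)
    return [i for i in m if i] # removes empty strings (improvements are welcome)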

split_str_from_words(["hello", "world"], "helloworldhello")
# ['hello', 'world', 'hello']

split_str_from_words(["hello"], "helloworldhowareyouhello")
# ['hello', 'worldhowareyou', 'hello']

split_str_from_words(["hello", "how", "are", "you", "world"], "helloworldhowareyouhello")
# ['hello', 'world', 'how', 'are', 'you', 'hello']
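
Two edge cases may be worth guarding against: words that contain regex metacharacters, and words that are prefixes of other words (regex alternation tries alternatives left to right, so "he" listed before "hello" would win). The sketch below is one possible variant, not the answer's own code; the name split_on_words is hypothetical, and preferring the longest word is an assumption about the desired behavior.

```python
import re

def split_on_words(words, text):
    # Drop empty words and duplicates, then try longer words first so that
    # "hello" is preferred over "he" when both could match at one position.
    ordered = sorted({w for w in words if w}, key=len, reverse=True)
    if not ordered:
        return [text] if text else []  # nothing to split on
    # re.escape makes every word match literally; the capturing group keeps
    # the matched words in the output of re.split.
    pattern = "(" + "|".join(re.escape(w) for w in ordered) + ")"
    return [part for part in re.split(pattern, text) if part]

print(split_on_words(["a.b", "a"], "a.bxa"))
print(split_on_words(["he", "hello"], "hellohe"))
```

For the three examples in the question this variant should give the same lists as the function above.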

Answer 1 (1, accepted, 2020-04-06 09:35:56)
