Python: Split a string by a word that contains a substring

Question

I have a string text = "Fix me a meeting in 2 days" and a list of words, meetingStrings. "meet" is in meetingStrings and occurs inside the word "meeting", so I have to split the text at the whole word "meeting".

Desired output:

in 2 days

meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
for x in meetingStrings:
    if x in text.lower(): 
        txt = text.split(x, 1)[1]
        print(txt)

This gives the output:

ing in 2 days

Answer 1

Using re.split():

import re

meetingStrings = [
    "appointment",
    "meet",
    "interview"
]

text = "Fix me a meeting in 2 days"

print(re.split('|'.join(r'(?:\b\w*'+re.escape(w)+r'\w*\b)' for w in meetingStrings), text, 1)[-1])

Prints:

 in 2 days
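
To make the one-liner easier to follow, the same pattern can be built step by step. This is only a sketch of the idea above: the helper name text_after_keyword is hypothetical, and the case-insensitive flag and the final strip() are additions that the original line does not have.

```python
import re

def text_after_keyword(text, keywords):
    """Return the text after the first word that contains any keyword.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    # One alternative per keyword: a whole word (\b...\b) that contains it.
    alternatives = [r"\b\w*" + re.escape(w) + r"\w*\b" for w in keywords]
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    # maxsplit=1 cuts only at the first matching word.
    parts = pattern.split(text, maxsplit=1)
    if len(parts) < 2:
        return None  # no word contained a keyword
    return parts[1].strip()

print(text_after_keyword("Fix me a meeting in 2 days",
                         ["appointment", "meet", "interview"]))
```

Returning None for the no-match case is a design choice for this sketch; the original one-liner would return the whole text instead.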

Answer 2

With a small change to your code:

meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
for x in meetingStrings:
    if x in text.lower():
        txt = text.split(x, 1)[1]
        print(txt.split(" ", 1)[1]) #<--- Here

Just take your final output and split it at the first occurrence of a space.
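
Note that the loop above raises IndexError in two situations: when the keyword appears in the text with different casing than in meetingStrings (the check uses text.lower() but the split uses text), and when no space follows the matched word. A minimal sketch that guards both cases, using a hypothetical helper rest_after_keyword and assuming that lowercasing does not change the length of the text (true for ASCII input):

```python
def rest_after_keyword(text, keywords):
    """Return what follows the first word containing a keyword, or None.

    Hypothetical helper; matching is case-insensitive and the result keeps
    the original casing of `text`.
    """
    lowered = text.lower()  # assumption: same length as `text` (ASCII input)
    for keyword in keywords:
        pos = lowered.find(keyword.lower())
        if pos == -1:
            continue  # this keyword does not occur, try the next one
        # Skip the rest of the matched word, up to the next space.
        _, _, rest = text[pos:].partition(" ")
        return rest
    return None

print(rest_after_keyword("Fix me a Meeting in 2 days",
                         ["appointment", "meet", "interview"]))
```

str.partition never raises when the separator is missing; it returns an empty tail, so a keyword at the very end of the text yields an empty string.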

Answer 3

This expression might also work with the i (case-insensitive) flag:

(?:meet|interview|appointment)\S*\s+((?:in|after)\s[0-9]+\s+(?:days?|months?|weeks?|years?))

We can add any other words we need to the non-capturing groups using alternation (logical OR), such as:

(?:in|after|on|from)

(?:days?|months?|weeks?|years?|hours?)

(?:meet|interview|appointment|session|schedule)
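
Instead of editing the alternations by hand, the expression could be assembled from word lists. This is a sketch under assumptions: build_pattern and its parameter names are hypothetical, and plurals are handled simply by appending s? to each unit.

```python
import re

def build_pattern(keywords, prepositions, units):
    """Assemble the expression above from word lists (hypothetical helper)."""
    kw = "|".join(re.escape(k) for k in keywords)
    prep = "|".join(re.escape(p) for p in prepositions)
    unit = "|".join(re.escape(u) + "s?" for u in units)  # optional plural
    return re.compile(
        rf"(?:{kw})\S*\s+((?:{prep})\s[0-9]+\s+(?:{unit}))",
        re.IGNORECASE,
    )

pattern = build_pattern(
    ["meet", "interview", "appointment", "session", "schedule"],
    ["in", "after", "on", "from"],
    ["day", "month", "week", "year", "hour"],
)
found = pattern.findall("Session in 3 hours, interview after 2 weeks")
print(found)  # ['in 3 hours', 'after 2 weeks']
```

re.escape keeps the lists safe even if a word later contains a character that is special in regular expressions.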

Test

import re

regex = r"(?:meet|interview|appointment)\S*\s+((?:in|after)\s[0-9]+\s+(?:days?|months?|weeks?|years?))"
test_str = "Fix me a meeting in 2 days meetings in 2 months meet in 1 week nomeeting in 2 days meet after 2 days"

print(re.findall(regex, test_str, re.IGNORECASE))

Output

['in 2 days', 'in 2 months', 'in 1 week', 'in 2 days', 'after 2 days']

The expression is explained on the top right panel of this demo if you wish to explore, simplify, or modify it.

RegEx Circuit

jex.im visualizes regular expressions.

Answer 4

This approach uses a search rather than a split.
All you need to do is put the keyword in the middle of a word pattern,
then match the whole word.

The result is in capture group 1.

No whitespace trim

\b\w*(?:appointment|meet|interview)\w*\b(.*)

https://regex101.com/r/lK4zRz/1

Readable version

 \b 
 \w* 
 (?:
      appointment
   |  meet
   |  interview
 )
 \w* 
 \b 
 ( .* )                        # (1)

With whitespace trim

(?m)\b\w*(?:appointment|meet|interview)\w*\b[^\S\r\n]*(.*?)[^\S\r\n]*$

https://regex101.com/r/v2qAOQ/1

Additionally, if you add .* to the beginning of either regex,
it will always match the last keyword.
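
As a sketch of how these expressions might be used from Python, the following compares the untrimmed pattern with the same pattern prefixed by .*. The sample sentence and the variable names are made up for illustration; the patterns are written as raw strings with single backslashes.

```python
import re

# Made-up sample with two keyword words: "meeting" and "interview".
text = "Fix me a meeting in 2 days before the interview on Friday"

# The untrimmed pattern: the first keyword word wins.
first = re.compile(r"\b\w*(?:appointment|meet|interview)\w*\b(.*)")
# Same pattern with a greedy .* in front: the last keyword word wins.
last = re.compile(r".*\b\w*(?:appointment|meet|interview)\w*\b(.*)")

after_first = first.search(text).group(1)
after_last = last.search(text).group(1)

print(repr(after_first))  # ' in 2 days before the interview on Friday'
print(repr(after_last))   # ' on Friday'
```

The greedy .* consumes as much as it can and backtracks only until a keyword word fits, which is why the capture group starts after the last one.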

Answer 5

Try this:

import re
meetingStrings = ["appointment", "meet", "interview"]  # list from the question
text = "Fix me a meeting in 2 days"
print(re.split("({})\\w*".format("|".join(meetingStrings)), text)[-1].strip())

Outputs: in 2 days

Answer 6

Without regex, using str.partition:

for x in meetingStrings: 
    pre, _, post = text.lower().partition(x) 
    if post: 
        pre = pre.rpartition(' ')[0] if not pre.endswith(' ') else pre.rstrip() 
        post = post.partition(' ')[-1] if not post.startswith(' ') else post.lstrip() 
        print([pre, post])

Example:

In [35]: meetingStrings = [ 
    ...:     "appointment", 
    ...:     "meet", 
    ...:     "interview" 
    ...: ] 
    ...: text = "Fix me a meeting in 2 days" 

    ...: for x in meetingStrings: 
    ...:     pre, _, post = text.lower().partition(x) 
    ...:     if post: 
    ...:         pre = pre.rpartition(' ')[0] if not pre.endswith(' ') else pre.rstrip() 
    ...:         post = post.partition(' ')[-1] if not post.startswith(' ') else post.lstrip() 
    ...:         print([pre, post]) 
    ...:
['fix me a', 'in 2 days']

Answer 7

Try something like this:

import re

meetingStrings = [
        "appointment",
        "meet",
        "interview"
]
text = "Fix me a meeting in 2 days"

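def split_string(text, strings):
    # escape the keywords so they are matched literally
    search = re.compile('|'.join(re.escape(s) for s in strings))
    start = None  # index of the first word of the current phrase
    words = text.split()
    for e, x in enumerate(words):
        if search.search(x):
            # a keyword word ends the current phrase, if there is one
            # (comparing None < e would raise TypeError in Python 3)
            if start is not None:
                yield ' '.join(words[start:e])
            start = None
        elif start is None:
            start = e
    # emit the trailing phrase after the last keyword word
    if start is not None:
        yield ' '.join(words[start:])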

print(' '.join(split_string(text, meetingStrings)))

This might be longer than the other answers, but it seems to do exactly what you wanted: split on the words that contain one of the given strings as a substring.
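
The same word-by-word idea could also be expressed with itertools.groupby, which collects consecutive words that share a flag. This is an alternative sketch, not the code from this answer; the function name split_on_keyword_words is hypothetical and the case-insensitive flag is an added assumption.

```python
import re
from itertools import groupby

def split_on_keyword_words(text, keywords):
    """Split text into phrases, dropping every word that contains a keyword.

    Hypothetical helper for illustration.
    """
    search = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    words = text.split()
    # groupby yields runs of consecutive words with the same "contains a
    # keyword" flag; only the runs without a keyword are kept.
    return [
        " ".join(group)
        for is_match, group in groupby(words, key=lambda w: bool(search.search(w)))
        if not is_match
    ]

print(split_on_keyword_words("Fix me a meeting in 2 days",
                             ["appointment", "meet", "interview"]))
# ['Fix me a', 'in 2 days']
```

Taking the last element of the returned list gives the text after the final keyword word.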

Answer 8

I have an alternative and much simpler approach: first split the sentence into words, then cut the sentence off after the word in which one of the meetingStrings appears:

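text = "Fix me a meeting in 2 days"
meetingStrings = ["appointment", "meet", "interview"]

words = text.split()
for keyword in meetingStrings:
    for idx, word in enumerate(words):
        if keyword in word:
            # keep only the words after the matching word
            words = words[idx + 1:]
            break  # stop iterating over the old list
print(' '.join(words))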

Gives:

in 2 days

Answer 9

You can just use find() and string slicing:

text = "Fix me a meeting in 2 days"
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]


sep = [i for i in meetingStrings if i in text]

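idx = text.find(sep[0])          # start of the first keyword found
idx_ = text[idx:].find(' ')      # offset of the next space after it
print(text[idx+idx_:].lstrip())  # lstrip() drops that leading space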

Output:

in 2 days

9 answers

Answer 1: 6, accepted, 2019-07-16 15:09:59
Answer 2: 1, 2019-07-16 15:16:05
Answer 3: 1, 2019-07-16 15:29:41
Answer 4: 1, 2019-07-16 15:37:07
Answer 5: 0, 2019-07-16 15:13:55
Answer 6: 0, 2019-07-16 15:17:36
Answer 7: 0, 2019-07-16 15:19:24
Answer 8: 0, 2019-07-16 15:20:18
Answer 9: 0, 2019-07-16 15:27:48
