
Split string by first substring found

I want to split a sentence at the first occurrence of certain words. Let me illustrate:

message = 'I wish to check my python code for errors to run the program properly with fluency'

I want to split the message above at the first occurrence of for/to/with, so the result for the message above would be: check my python code for errors to run the program properly with fluency

I also want to keep the word at which the sentence is split, so my final result would be: to check my python code for errors to run the program properly with fluency

My code does not work:

import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
result = message.split(r"for|to|with",1)[1]
print(result)

What can I do?

message = 'I wish to check my python code for errors to run the program properly with fluency'
words = message.split(' ')
start = 0  # if no keyword is found, the whole message is kept
for i, word in enumerate(words):
    # stop at the first word that is one of the split keywords
    if word in ('for', 'to', 'with'):
        start = i
        break
# rejoin everything from the first keyword onward
message_new = ' '.join(words[start:])
print(message_new)

Output:

to check my python code for errors to run the program properly with fluency

str.split does not take a regular expression as an argument (maybe you are thinking of Perl).

Here is what you want:

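import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
# \b word boundaries keep "to" from matching inside a longer word
result = re.search(r'\b(for|to|with)\b', message)
# slice from the start of the first matched word (assumes a match exists)
print(message[result.start(1):])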

This uses no substitution, rejoining, or loops; it simply searches for the required words and uses the position of the match.

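This question has already been answered in "How to remove all characters before a specific character in Python", but that only works for one specific delimiter. With multiple delimiters, you first have to find out which one occurs first, which is covered in "How can I find the first occurrence of a sub-string in a python string".

Start with a first guess (I don't have much imagination, so let's call it bestDelimiter = firstDelimiter), find where it first occurs, and save that position as bestPosition. Then find the positions of the remaining delimiters. Each time you find a delimiter that occurs before the current bestPosition, update both variables, bestDelimiter and bestPosition. The one that remains at the end is the best delimiter, and you can then use bestDelimiter to apply the operation you need.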
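The procedure described above can be sketched roughly as follows. This is only one possible reading of it: the function name split_at_earliest is hypothetical, the search starts from "nothing found yet" rather than from the first delimiter, and str.find matches plain substrings, so "to" would also match inside a word such as "tomorrow".

```python
def split_at_earliest(message, delimiters=("for", "to", "with")):
    """Slice the message at whichever delimiter occurs earliest (hypothetical helper)."""
    best_delimiter = None
    best_position = len(message)
    for delimiter in delimiters:
        position = message.find(delimiter)  # -1 when the delimiter is absent
        # Keep the delimiter only if it occurs earlier than the best one so far.
        if position != -1 and position < best_position:
            best_delimiter, best_position = delimiter, position
    if best_delimiter is None:
        # Assumption: with no delimiter present, the message is returned unchanged.
        return message
    return message[best_position:]

message = 'I wish to check my python code for errors to run the program properly with fluency'
print(split_at_earliest(message))
# to check my python code for errors to run the program properly with fluency
```

If whole-word matching matters, a regular expression with \b word boundaries is the safer choice.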

My guess is that this simple expression might do that:

.*?(\b(?:to|for|with)\b.*)

re.match is probably the fastest of these five methods:

Test with re.findall

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"
print(re.findall(regex, test_str))

Test with re.sub

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"
subst = "\\1"

result = re.sub(regex, subst, test_str)

if result:
    print (result)

Test with re.finditer

import re

regex = r".*?(\b(?:to|for|with)\b.*)"

test_str = "I wish to check my python code for errors to run the program properly with fluency"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    # FULL MATCH
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

Test with re.match

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"

print(re.match(regex, test_str).group(1))

Test with re.search

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"

print(re.search(regex, test_str).group(1))
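The claim that re.match is probably the fastest is a guess, and it can be checked on your own machine. The following is a rough timing sketch using the standard timeit module; the candidates dictionary and the iteration count are arbitrary choices for illustration, and the numbers will vary between runs and Python versions.

```python
import re
import timeit

regex = re.compile(r".*?(\b(?:to|for|with)\b.*)")
test_str = "I wish to check my python code for errors to run the program properly with fluency"

# One callable per approach; each returns the text from the first keyword onward.
candidates = {
    "re.findall": lambda: regex.findall(test_str)[0],
    "re.sub": lambda: regex.sub(r"\1", test_str),
    "re.finditer": lambda: next(regex.finditer(test_str)).group(1),
    "re.match": lambda: regex.match(test_str).group(1),
    "re.search": lambda: regex.search(test_str).group(1),
}

# Sanity check: every approach should produce the same string.
results = {name: func() for name, func in candidates.items()}

# Time each approach and print them from fastest to slowest.
timings = {name: timeit.timeit(func, number=2000) for name, func in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```

On a string this short the differences are likely to be small, so readability may be the better reason to pick one approach over another.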

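If you wish to explore or modify the expression further, it is explained in the top-right panel of this demo, and you can see how it matches against some sample inputs at this link.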

You can first find all instances of for, to, and with, split on them, and then splice and rejoin the pieces:

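import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
# vals: every matched keyword, in order; s: the text after each keyword (the piece before the first keyword is discarded)
vals, [_, *s] = re.findall(r"\bfor\b|\bto\b|\bwith\b", message), re.split(r"\bfor\b|\bto\b|\bwith\b", message)
# pair each keyword with the text that follows it, strip leading whitespace, and rejoin
result = ''.join('{} {}'.format(a, re.sub(r"^\s+", "", b)) for a, b in zip(vals, s))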

Output:

'to check my python code for errors to run the program properly with fluency'

