Split string after certain integer character pattern
I have a string stored in the variable mystring. I want to split the string after a character 4-digit-integer character pattern, i.e. (4-digit-integer). I suppose this can be done using a Python regex.
mystring = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
Desired output:
splitstring = ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
If you don't mind doing some filtering, you could do:
import re
string = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
# Raw string so that \d, \( and \s reach the regex engine unchanged
result = [m for m in re.split(r'([^\d(]+\(\d{4}\))\s+', string) if m]
print(result)
Output:
['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
When split is used with a capturing group, the result also includes the text matched by the group, in this case ([^\d(]+\(\d{4}\)), i.e. a run of characters that are neither digits nor an opening parenthesis, followed by exactly four digits enclosed in parentheses. Note that the trailing whitespace \s+ sits outside the group, so it is left out of the result. The empty string that split produces before the first match is what the "if m" filter removes.
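To make the effect of the capturing group concrete, here is a small sketch comparing the same pattern with a capturing group and with a non-capturing group (?:...). The variable names text, with_group and without_group are only illustrative and do not come from the answer above.

```python
import re

text = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'

# Capturing group: the text matched by the group is kept in the result,
# preceded by an empty string because the match starts at position 0.
with_group = re.split(r'([^\d(]+\(\d{4}\))\s+', text)

# Non-capturing group: the matched text is treated as a plain separator
# and is dropped from the result.
without_group = re.split(r'(?:[^\d(]+\(\d{4}\))\s+', text)

print(with_group)     # ['', 'Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
print(without_group)  # ['', 'Amet (Lorem Dolor Amet Elit)']
```

This is why the answer needs both the capturing group (to keep 'Lorem Ipsum (2018)') and the filtering step (to discard the leading empty string).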
Here is a simple way to do it.
Since parentheses have a special meaning in regular expressions (they delimit capturing groups), you need to escape them, e.g. \( for the opening parenthesis. You can then search for the (2018) part and split the text just after it:
import re
s = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
match = re.search(r'\(\d{4}\)', s)
split_string = [ s[:match.end()], s[match.end():] ]
print(split_string)
# ['Lorem Ipsum (2018)', ' Amet (Lorem Dolor Amet Elit)']
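Note that the second element above keeps its leading space, and that re.search returns None when the string contains no (4-digit-integer) part. A minimal sketch that handles both cases might look like this; split_after_year is a hypothetical helper name, and returning the whole string unsplit when nothing matches is an assumption about the desired behaviour.

```python
import re

def split_after_year(s):
    """Split s right after the first '(dddd)' part (hypothetical helper)."""
    match = re.search(r'\(\d{4}\)', s)
    if match is None:
        # Assumption: with no 4-digit group, return the string unsplit.
        return [s]
    head = s[:match.end()]
    # Drop the whitespace that separated the two parts.
    tail = s[match.end():].lstrip()
    return [head, tail]

print(split_after_year('Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'))
# ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
```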