Split string after certain integer character pattern

I have a string stored in the variable mystring. I want to split the string right after a four-digit integer enclosed in parentheses, i.e. the pattern (4-digit-integer). I suppose this can be done using a Python regex.

mystring = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'

Desired output:

splitstring = ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']

If you don't mind doing some filtering, you could do:

import re

string = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
# Raw string (r'...') avoids invalid-escape-sequence warnings for \d and \( in recent Pythons
result = [m for m in re.split(r'([^\d(]+\(\d{4}\))\s+', string) if m]
print(result)

Output:

['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']

When you use re.split with a capturing group, the result also includes the text matched by the group, in this case ([^\d(]+\(\d{4}\)), i.e. a run of characters that are neither digits nor opening parentheses, followed by exactly four digits surrounded by parentheses. Note that the trailing whitespace matched by \s+ is outside the group, so it is left out. Because the first match starts at the beginning of the string, re.split also returns an empty string before it, which is what the "if m" filter removes.
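A quick way to see the effect of the capturing group is to call re.split directly, with and without the group. The third call is a variant that does not come from either answer: it assumes Python's re module, which accepts fixed-width lookbehind assertions such as (?<=\(\d{4}\)), and splits only on the whitespace that follows the (dddd) pattern, so there is no empty string to filter out. The variable names are illustrative.

```python
import re

text = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'

# With a capturing group, the matched text is kept in the result list
with_group = re.split(r'([^\d(]+\(\d{4}\))\s+', text)

# Without a group, the matched text is treated as a separator and discarded
without_group = re.split(r'[^\d(]+\(\d{4}\)\s+', text)

# Variant (not from the answers): split only on whitespace preceded by "(dddd)"
lookbehind = re.split(r'(?<=\(\d{4}\))\s+', text)

print(with_group)     # ['', 'Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
print(without_group)  # ['', 'Amet (Lorem Dolor Amet Elit)']
print(lookbehind)     # ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']
```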

Here is a simple way you could do it.

Since parentheses have a special meaning in regular expressions (they create capturing groups), you need to escape them, e.g. \( for an opening parenthesis. Then you can search for (2018), or in general any four digits in parentheses, and split the text at the end of the match:

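import re

s = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'
# \( and \) match literal parentheses; \d{4} matches exactly four digits
match = re.search(r'\(\d{4}\)', s)

if match:
    # Split at the end of the match and drop the space that follows it
    split_string = [s[:match.end()], s[match.end():].lstrip()]
else:
    split_string = [s]  # pattern not found: keep the string whole
print(split_string)
# ['Lorem Ipsum (2018)', 'Amet (Lorem Dolor Amet Elit)']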
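To illustrate why the escaping matters, here is a small comparison (the variable names are only for illustration). Without backslashes, the parentheses form a capturing group and match no characters themselves, so only the digits are matched and the match ends one character earlier.

```python
import re

s = 'Lorem Ipsum (2018) Amet (Lorem Dolor Amet Elit)'

# Escaped: \( and \) match the literal parenthesis characters
escaped = re.search(r'\(\d{4}\)', s)

# Unescaped: ( and ) only create a capturing group around the four digits
unescaped = re.search(r'(\d{4})', s)

print(escaped.group(), escaped.end())      # (2018) 18
print(unescaped.group(), unescaped.end())  # 2018 17
```

Splitting at unescaped.end() would give 'Lorem Ipsum (2018' and leave the closing parenthesis at the start of the second part, which is why the answer escapes the parentheses.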
