How can I split this string using regular expressions?
I have a string similar to:
"'a b | c'\,\,\, 'd | e f' ,,, 'g | h"
I want to use re.split to get the following list:
["a b|c", "d|e f", "g|h"]
I have tried the following but do not get the output I want. Essentially, I need to get rid of everything aside from the letters and the pipe operator, and then split. One issue is that sometimes both ' and " are used:
re.compile(r'[\"\',][\W+]', re.UNICODE).split(txt.lower())
Remove the spaces around | as a separate step after splitting:
split = re.compile(r'[\"\',][\W+]', re.UNICODE).split(txt.lower())
cleaned = [re.sub(r'\s*\|\s*', '|', x) for x in split]
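As a rough sketch of how this two-step approach behaves on the sample string: the split pattern on its own leaves empty strings and leading quotes in the result, so the version below adds a filtering and stripping step in between. That middle step, the raw-string form of the input, and the variable name `pieces` are assumptions for illustration, not part of the original answer.

```python
import re

# Sample input from the question; a raw string keeps the backslashes literal.
txt = r"'a b | c'\,\,\, 'd | e f' ,,, 'g | h"

# Step 1: split on a quote or comma followed by one non-word character (or '+').
pieces = re.compile(r'[\"\',][\W+]', re.UNICODE).split(txt.lower())

# Step 2 (assumed cleanup): strip leftover quotes/spaces and drop empty pieces.
pieces = [p.strip("'\" ") for p in pieces]
pieces = [p for p in pieces if p]

# Step 3: remove the spaces around the pipe, as suggested above.
cleaned = [re.sub(r'\s*\|\s*', '|', p) for p in pieces]

print(cleaned)  # ['a b|c', 'd|e f', 'g|h']
```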
I don't think you can do this with split alone. You probably can't get rid of the first quote, or you will end up with an empty first item.
Here is one attempt, but it fails to remove the initial ':
re.split(r"(?<=.)'[^']+'", txt)
output: ["'a b | c", 'd | e f', 'g | h']
An alternative with findall:
re.findall(r"'([^']+)'?", txt)
output: ['a b | c', 'd | e f', 'g | h']
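To get all the way to the list asked for in the question, the findall approach could be combined with the pipe cleanup and widened to accept either quote character. This is only a sketch: `extract_items` is a hypothetical helper name, and the pattern assumes the items never contain quote characters themselves and does not check that the opening and closing quotes are the same kind.

```python
import re

def extract_items(txt):
    """Hypothetical helper: grab each quoted chunk, then tidy the pipes."""
    # Opening ' or ", then everything up to the next quote; the closing
    # quote is optional because the last item in the sample is unterminated.
    chunks = re.findall(r"""['"]([^'"]+)['"]?""", txt)
    # Remove the whitespace around each pipe.
    return [re.sub(r'\s*\|\s*', '|', c) for c in chunks]

txt = r"'a b | c'\,\,\, 'd | e f' ,,, 'g | h"
print(extract_items(txt))  # ['a b|c', 'd|e f', 'g|h']
```

Because the separators between items are never captured, nothing needs to be filtered out afterwards, which is the main advantage over splitting.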