
How can I split this string using regular expressions?

I have a string similar to:

"'a b | c'\,\,\,  'd | e f' ,,, 'g | h"

I want to use re.split to get the following list:

["a b|c", "d|e f", "g|h"]

I have tried the following but do not get the output I want. Essentially, I need to get rid of everything aside from the letters and the pipe operator, and then split. One issue is that sometimes both ' and " are used:

re.compile(r'[\"\',][\W+]', re.UNICODE).split(txt.lower())

Remove the spaces around | as a separate step after splitting.

split = re.compile(r'[\"\',][\W+]', re.UNICODE).split(txt.lower())
cleaned = [re.sub(r'\s*\|\s*', '|', x) for x in split]
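As a fuller sketch of the split-then-clean idea, the snippet below splits on separator runs, strips leftover quotes, and then tightens the spacing around |. The separator pattern and the helper name split_items are assumptions made for illustration, not the pattern from the question; it assumes every separator contains at least one comma, as in the sample string.

```python
import re

# Sample input from the question, written as a raw string so the
# backslashes are kept literally.
txt = r"'a b | c'\,\,\,  'd | e f' ,,, 'g | h"

# Assumption: a separator is any run of whitespace, quotes, backslashes
# and commas that contains at least one comma.
SEPARATOR = re.compile(r"""[\s'"\\]*,[\s'",\\]*""")
PIPE_SPACING = re.compile(r"\s*\|\s*")


def split_items(text):
    """Hypothetical helper: split, strip stray quotes, tighten the pipes."""
    parts = SEPARATOR.split(text.lower())
    # The first item keeps its opening quote after the split, so strip
    # quotes and whitespace from both ends of every part.
    stripped = [part.strip("'\" \t") for part in parts]
    # Drop empty parts that appear if the text starts or ends with a separator.
    return [PIPE_SPACING.sub("|", part) for part in stripped if part]


print(split_items(txt))  # ['a b|c', 'd|e f', 'g|h']
```

Stripping quotes after the split sidesteps the problem of the leading quote, which a split pattern alone cannot remove without producing an empty first item.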

I don't think you can just use split. You probably can't get rid of the first quote, or you will end up with an empty first item.

Here is one attempt, but it fails to remove the initial ':

re.split(r"(?<=.)'[^']+'", txt)

output: ["'a b | c", 'd | e f', 'g | h']

An alternative with findall:

re.findall(r"'([^']+)'?", txt)

output: ['a b | c', 'd | e f', 'g | h']
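The findall approach can be combined with the pipe cleanup to reach the list asked for in the question. This is a minimal sketch: the helper name extract_items is hypothetical, the pattern is widened to accept both ' and ", and it assumes the items themselves never contain a quote character.

```python
import re

# Sample input from the question, written as a raw string so the
# backslashes are kept literally.
txt = r"'a b | c'\,\,\,  'd | e f' ,,, 'g | h"

# An item starts at a quote of either kind and runs up to the next quote;
# the closing quote is optional because the last item may lack one.
ITEM = re.compile(r"""["']([^"']+)["']?""")


def extract_items(text):
    """Hypothetical helper: capture quoted items, then tighten the pipes."""
    return [re.sub(r"\s*\|\s*", "|", item).strip() for item in ITEM.findall(text)]


print(extract_items(txt))  # ['a b|c', 'd|e f', 'g|h']
```

Because the closing quote is consumed as part of each match, the separators between items are skipped and never show up in the results.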
