
Splitting using regex is giving unwanted empty strings

In Python, I am executing this:

>>> re.split("(hello|world|-)", 'hello-world')


I am expecting this:
['hello', '-', 'world']

However, I am getting this:
['', 'hello', '', '-', '', 'world', '']

Where are these '' entries coming from?

I am using Python 3, in case it matters.


Edit

Many of you are saying I could split on -; however, I want to extract tokens, if that makes sense. For example, if I had "hellohello---worldhello", I want it to return:

['hello', 'hello', '-', '-', '-', 'world', 'hello']

According to the documentation:

If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string.
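The documented behaviour can be reproduced with a smaller input. This is only an illustrative sketch; the sample string "-a-b-" is made up for the example and does not come from the question.

```python
import re

# The separator (with a capturing group) matches at both ends of the string,
# so the result starts and ends with an empty string.
parts = re.split(r"(-)", "-a-b-")
print(parts)  # ['', '-', 'a', '-', 'b', '-', '']
```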

If this is a concern, you can use filter to drop the empty strings from the list. In Python 3, filter returns an iterator, so wrap it in list().

>>> list(filter(None, re.split('(hello|world|-)', 'hellohello---worldhello')))
['hello', 'hello', '-', '-', '-', 'world', 'hello']

Or use findall to grab your matches.

>>> re.findall('(hello|world|-)', 'hellohello---worldhello')
['hello', 'hello', '-', '-', '-', 'world', 'hello']
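One caveat: findall silently skips any text that matches none of the alternatives. If the goal is strict tokenizing, a loop over Pattern.match can report unexpected input instead. The following is a minimal sketch under that assumption; the names TOKEN and tokenize are hypothetical and not part of the original answer.

```python
import re

# Hypothetical token pattern built from the alternatives in the question.
TOKEN = re.compile(r"hello|world|-")

def tokenize(text):
    """Return the list of tokens, raising ValueError on unrecognised input."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Pattern.match(string, pos) only matches starting exactly at pos.
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

print(tokenize("hellohello---worldhello"))
```

For input made up entirely of the known tokens, this returns the same list as findall; the difference only shows when the input contains something else.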

The extra output elements appear because you are asking re to split the string on, e.g., hello, so it tries to tell you what comes before hello, what lies between hello and '-', and so on. All of those pieces are empty strings.
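To make this visible, the result can be separated into the text around the matches and the captured separators themselves. This is a small sketch; the variable names between and separators are chosen here for illustration.

```python
import re

parts = re.split(r"(hello|world|-)", "hello-world")

# With one capturing group, even indexes hold the text before, between,
# and after the matches; odd indexes hold the captured separators.
between = parts[0::2]
separators = parts[1::2]

print(between)     # ['', '', '', '']
print(separators)  # ['hello', '-', 'world']
```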

If you change it to:

re.split("(-)", 'hello-world')

you will get the desired result:

['hello', '-', 'world']
