Using Python, how do I split on multiple delimiters and keep only one in my output list?
A very green Python user here, so go easy on me; the docs haven't helped me understand what I'm missing. Similar to "RE split multiple arguments | (or) returns none python", I need to split a string on multiple delimiters. The answers to that question only allow keeping either none or both of the delimiters, and I need to keep only one of them. Note that the above question is from 2012, so it was likely written for a much earlier version of Python than 3.6, which I'm using.
My data:
line = 'APPLE,ORANGE CHERRY APPLE'
I want a list returned that looks like:
['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
I need to keep the comma so I can remove duplicate components later. I have that part working, if I could just get the list created properly. Here's what I've got.
import re
parts = re.split(r'\s|(,)', line)  # renamed from `list` to avoid shadowing the built-in
print(parts)
My logic here is to split on whitespace and commas but keep only the comma, which makes sense to me. Nope:
['APPLE', ',', 'ORANGE', None, 'CHERRY', None, 'APPLE']
I've also tried what is mentioned in the above linked question, putting the entire group in a capture:
re.split(r'(\s|(,))',line)
Nope again:
['APPLE', ',', ',', 'ORANGE', ' ', None, 'CHERRY', ' ', None, 'APPLE']
What am I missing? I know it's related to how my capture groups are set up, but I can't figure it out. Thanks in advance!
I suggest using a matching approach with re.findall:
re.findall(r'[^,\s]+|,', line)
See the regex demo. The [^,\s]+|, pattern matches:
[^,\s]+ - one or more chars other than a comma and whitespace
| - or
, - a comma.
See a Python demo:
import re
line = 'APPLE,ORANGE CHERRY APPLE'
l = re.findall(r'[^,\s]+|,', line)
print(l) # => ['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
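If the same matching idea is needed for a delimiter other than a comma, the pattern can be built from a parameter. The helper below is only a sketch: the name split_keep and its keep argument are made up for illustration, and it assumes the kept delimiter is a single character.

```python
import re

def split_keep(text, keep=','):
    """Split on whitespace and on `keep`, returning `keep` as its own token.

    Hypothetical helper for illustration; assumes `keep` is one character.
    """
    esc = re.escape(keep)
    # Either a run of chars that are neither `keep` nor whitespace, or `keep` itself.
    pattern = r'[^{0}\s]+|{0}'.format(esc)
    return re.findall(pattern, text)

print(split_keep('APPLE,ORANGE CHERRY APPLE'))
# => ['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
```

Because findall only returns what matched, there are no None or empty entries to clean up afterwards.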
Without using regex, you can do it like this:
res = line.replace(',', ' , ').split()  # pad commas with spaces, then split on whitespace
print(res)
Output:
['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
Filter out the None values:
import re
line = 'APPLE,ORANGE CHERRY APPLE'
print([m for m in re.split(r'\s+|(,)', line) if m])  # raw string avoids the invalid-escape warning for \s
>>> ['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
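As for where the None entries come from: re.split adds the text of every capturing group to the result for each match, and a group that took no part in a match is reported as None. The short sketch below (variable names are just for illustration) shows this for the pattern from the question, and also shows that the `if m` filter drops empty strings, which appear when a comma is directly followed by a space.

```python
import re

line = 'APPLE,ORANGE CHERRY APPLE'

# Group 1 is (,). It only participates when the comma branch matched;
# for a whitespace match it is None, and re.split passes that None through.
groups = [m.group(1) for m in re.finditer(r'\s|(,)', line)]
print(groups)    # => [',', None, None]

# With ', ' in the input, an empty string also appears between the two matches.
raw = re.split(r'\s+|(,)', 'APPLE, ORANGE')
print(raw)       # => ['APPLE', ',', '', None, 'ORANGE']

# Filtering on truthiness removes both None and ''.
cleaned = [m for m in raw if m]
print(cleaned)   # => ['APPLE', ',', 'ORANGE']
```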