
List of strings between regex matches

How do I find all the strings between the matches of a regex pattern? For example:

>>> import re
>>> s = "123 asd 12 456 sfd g 789"
>>> reg = re.compile(r"\d{3}")
>>> reg.findall(s)
['123', '456', '789']

I want to find:

[' asd 12 ', ' sfd g ']

Use the .split() method instead of .findall():

>>> reg.split(s)
['', ' asd 12 ', ' sfd g ', '']

The result contains everything between the matches, including the empty strings at the start and end. You can filter those out:

>>> filter(None, reg.split(s))
[' asd 12 ', ' sfd g ']

On Python 3, filter() returns an iterator rather than a list, so you would need list(filter(None, reg.split(s))), or you could iterate over the result of filter() directly.
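For code that should behave the same on Python 2 and Python 3, a list comprehension is one alternative to filter(None, ...). The following is only a minimal sketch: the helper name between_matches is made up for illustration and is not part of the re module.

```python
import re

def between_matches(pattern, text):
    """Return the non-empty pieces of text separated by matches of pattern.

    Hypothetical helper for illustration; it only wraps re.split().
    """
    # re.split() yields empty strings when the text starts or ends with a
    # match, or when two matches are adjacent; keep only non-empty pieces.
    return [piece for piece in re.split(pattern, text) if piece]

s = "123 asd 12 456 sfd g 789"
print(between_matches(r"\d{3}", s))  # [' asd 12 ', ' sfd g ']
```

Note that this keeps any text before the first match or after the last one: between_matches(r"\d{3}", "abc 123 def") returns ['abc ', ' def'].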

Use re.split instead of re.findall.

You could try something like:

>>> reg = re.compile(r'(?:\d{3})?(.*?)\d{3}')
>>> reg.findall("123 asd 12 456 sfd g 789")
[' asd 12 ', ' sfd g ']

Since .findall() does not return overlapping matches, the leading group of digits has to be optional: the 456 that closes the first match has already been consumed and cannot also open the second. For a more robust solution, it may be better not to rely on a regex alone.
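To illustrate the point about overlapping matches, here is a small sketch comparing a pattern that consumes both delimiters with one that checks the closing delimiter through a lookahead. The lookahead variant is an alternative not taken from the answer above, and the variable names are chosen only for this example.

```python
import re

s = "123 asd 12 456 sfd g 789"

# Consuming both delimiters: "456" is used up as the end of the first
# match, so it cannot also start a second one.
consuming = re.findall(r"\d{3}(.*?)\d{3}", s)

# Lookahead: the closing delimiter is checked but not consumed, so the
# next search can start from it.
lookahead = re.findall(r"\d{3}(.*?)(?=\d{3})", s)

print(consuming)  # [' asd 12 ']
print(lookahead)  # [' asd 12 ', ' sfd g ']
```

Unlike the optional-group pattern, the lookahead version requires a delimiter on both sides of every captured string.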

>>> import re
>>> s = "123 asd 12 456 sfd g 789"
>>> list(filter(None, re.compile(r"\d{3}").split(s)))
[' asd 12 ', ' sfd g ']
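If only the text strictly between two consecutive matches is wanted, one possible approach is to walk the matches with re.finditer() and slice the gaps between them. This is a sketch under that assumption; the function name strictly_between is hypothetical.

```python
import re

def strictly_between(pattern, text):
    """Collect the text lying between consecutive matches of pattern."""
    matches = list(re.finditer(pattern, text))
    # Pair each match with the one that follows it and slice out the gap.
    return [
        text[left.end():right.start()]
        for left, right in zip(matches, matches[1:])
    ]

print(strictly_between(r"\d{3}", "123 asd 12 456 sfd g 789"))
# [' asd 12 ', ' sfd g ']
print(strictly_between(r"\d{3}", "abc 123 def 456 ghi"))
# [' def ']
```

Text before the first match and after the last match is ignored here, which differs from the .split() approach. Two adjacent matches produce an empty string, which can be filtered out if it is not wanted.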
