
re.findall() not returning all matches?

I have the following string:

This$#is% Matrix#  %!

I am trying to catch substrings where special symbols or whitespace occur between alphanumeric characters. For example, my goal is to find these two substrings: This$#is (special symbols $ and # between 'This' and 'is') and is% Matrix (special symbol % and a space between 'is' and 'Matrix').

My re.findall() call is as follows:

match = re.findall(r'([\w]{1,})([\s\W]{1,})([\w]{1,})', temp)

It returns [('This', '$#', 'is')] but not the second part, ('is', '% ', 'Matrix'). Am I doing something wrong?

If I change my string to 'is% Matrix' and apply the same regex pattern, I get [('is', '% ', 'Matrix')].

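re.findall() returns only non-overlapping matches. The first match consumes 'is', so the scan resumes at '%' and 'is' can no longer start a second match. You can use a positive lookahead on the part you would like to overlap between matches, so that it is captured but not consumed: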

match = re.findall(r'([\w]{1,})([\s\W]{1,})(?=([\w]{1,}))', temp)

match becomes:

[('This', '$#', 'is'), ('is', '% ', 'Matrix')]

Demo: https://regex101.com/r/2PJmlX/1
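
For anyone who wants to see where each match starts and stops, here is a small self-contained sketch that runs both patterns over the string from the question. The names consuming, overlapping, consumed and overlapped are mine, not from the post. I have also written the patterns in a shorter form (\w+ for [\w]{1,}, and \W+ for [\s\W]{1,}, since whitespace is already a non-word character), which should behave the same way.

```python
import re

temp = 'This$#is% Matrix#  %!'

# Shorthand for the question's pattern: the trailing word is part of
# the match, so it is consumed and cannot start the next match.
consuming = re.compile(r'(\w+)(\W+)(\w+)')

# Shorthand for the lookahead pattern: the trailing word is captured
# inside (?=...) but not consumed, so the next match can start on it.
overlapping = re.compile(r'(\w+)(\W+)(?=(\w+))')

# Print the character span each match covers, plus its groups.
for m in consuming.finditer(temp):
    print('consuming  ', m.span(), m.groups())

for m in overlapping.finditer(temp):
    print('overlapping', m.span(), m.groups())

consumed = consuming.findall(temp)
overlapped = overlapping.findall(temp)
```

With the first pattern the only match covers characters 0 to 8, which includes 'is'. With the lookahead the first match stops at index 6, just before 'is', so the second match can begin there.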

