
How to find all matches using python regular expression finditer

I am trying to find a pattern. I have written the code below:

import re

string = '000,001,100,001'
pattern = r'(.*)00(.*),(.*)00(.*)'

for m in re.finditer(pattern, string):
    print(m.groups())

The code above returns ('000,001,1', '', '', '1'), whereas it misses the match with groups ('', '0', '', '1,100,001').

I am trying to work out whether the characters before and after the '00' on consecutive lines are the same. The code I wrote matches '000,001,1 00 , 00 1'. How can I match ' 00 0, 00 1,100,001'?

How can I obtain the match groups for the latter?
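
For reference, `re.finditer` yields non-overlapping matches, and both candidate matches here span the whole string, so only one of them can ever be returned per pattern. Which one you get depends on how greedy the groups are. The following is a minimal sketch, not necessarily what the asker ultimately needs; the lazy variant of the pattern (`.*?`) is an assumption introduced for illustration.

```python
import re

string = '000,001,100,001'

# Greedy groups: the first (.*) takes as much as it can, so the regex
# settles on the '00' inside '100' as the first '00'.
greedy = [m.groups() for m in re.finditer(r'(.*)00(.*),(.*)00(.*)', string)]
print(greedy)  # [('000,001,1', '', '', '1')]

# Lazy leading groups (assumed variant): (.*?) takes as little as it can,
# so the first '00' is the one at the very start of the string.
lazy = [m.groups() for m in re.finditer(r'(.*?)00(.*?),(.*?)00(.*)', string)]
print(lazy)  # [('', '0', '', '1,100,001')]
```

Each list holds a single tuple because the match consumes the entire string, leaving nothing for `finditer` to scan afterwards.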

Comment: for the string '2295051,2238451,2235301,1950522,2238451,3530333'
... you see that the groups have the same number of digits before they occur, which is 2 digits, and after they occur, which is 1 digit ...

string = '2295051,2238451,2235301,1950522,2238451,3530333'  

_Step 1_  
pattern = '(\d+)'
Output: ('2295051',) ('2238451',) ('2235301',) ('1950522',) ('2238451',) ('3530333',)  

_Step 2_
pattern = '((\d\d)\d+)'  
Output: ('2295051', '22') ('2238451', '22') ('2235301', '22')  
        ('1950522', '19') ('2238451', '22') ('3530333', '35')  

_Step 3_
pattern = '((\d\d)\d+(\d))'
Output: ('2295051', '22', '1') ('2238451', '22', '1') ('2235301', '22', '1')  
('1950522', '19', '2') ('2238451', '22', '1') ('3530333', '35', '3')  

Read about the meaning of '+' in the Docs: re.html#module-re.
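
The three steps above can be reproduced with a short script. This is only a sketch: the `patterns` list and the `results` dictionary are names introduced here for illustration, and raw strings are used so the backslashes reach the regex engine unchanged.

```python
import re

string = '2295051,2238451,2235301,1950522,2238451,3530333'

# One pattern per step, from the simplest to the one with three groups.
patterns = [r'(\d+)', r'((\d\d)\d+)', r'((\d\d)\d+(\d))']

results = {}
for pattern in patterns:
    # Collect the groups of every non-overlapping match, left to right.
    results[pattern] = [m.groups() for m in re.finditer(pattern, string)]
    print(pattern, results[pattern])
```

Each step adds one capturing group, so the tuples grow from one element to three while the number of matches stays at six.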

Comment: ...what I don't understand is how it does it and how I can make use of it...

The pattern '((\d\d)\d+(\d))' searches for a substring that starts with two digits (\d\d), continues with one or more digits (\d+), and ends with one digit (\d). The pattern is general: it matches any run of digits of length at least 4.
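
A small sketch of how the groups get filled and why four digits are the minimum. The helper name `first_and_last` is hypothetical and exists only for this illustration.

```python
import re

pattern = re.compile(r'((\d\d)\d+(\d))')

def first_and_last(text):
    """Return (whole match, first two digits, last digit), or None."""
    m = pattern.search(text)
    return m.groups() if m else None

# \d+ first takes every remaining digit, then gives one back so the
# final (\d) has something to match.
print(first_and_last('1234'))     # ('1234', '12', '4')
print(first_and_last('2295051'))  # ('2295051', '22', '1')

# Three digits are not enough: 2 + at least 1 + 1 = 4.
print(first_and_last('123'))      # None
```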


Try this pattern:

import re

string = '000,001,100,001'
pattern = r'((\d)00|00(\d))'

for m in re.finditer(pattern, string):
    print(m.groups())

Output:

('000', '0', None)
('001', None, '1')
('100', '1', None)
('001', None, '1')

The first item, 000, has both: a digit before and a digit after the 00.
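
Because the alternation stops at the first branch that succeeds, an item like 000 is reported only once even though '00' occurs in it at two positions. If every occurrence is wanted, one possible approach (an assumption, not part of the answer above) is a zero-width lookahead, which lets `finditer` report overlapping positions. The helper `splits_around` is a hypothetical name.

```python
import re

def splits_around(item, marker='00'):
    """List (before, after) for every position where marker occurs in item."""
    # (?=...) matches without consuming text, so overlapping
    # occurrences such as the two '00' in '000' are all found.
    lookahead = '(?=' + re.escape(marker) + ')'
    return [(item[:m.start()], item[m.start() + len(marker):])
            for m in re.finditer(lookahead, item)]

string = '000,001,100,001'
for item in string.split(','):
    print(item, splits_around(item))
```

For '000' this gives both ('', '0') and ('0', ''), which can then be compared against the splits of the next item.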

Tested with Python:3.4.2 - re:2.2.1
Come back and flag your question as answered if this works for you, or comment why not.

