How to find all matches using Python regex finditer

Question

I am trying to find a pattern. I have written the code below:

import re

string = '000,001,100,001'
pattern = r'(.*)00(.*),(.*)00(.*)'

for m in re.finditer(pattern, string):
    print(m.groups())

The code above returns ('000,001,1', '', '', '1'), whereas it misses the match with groups ('', '0', '', '1,100,001').

I am trying to work out whether the characters before and after the '00' on consecutive lines are the same. The code I wrote matches '000,001,1 00 , 00 1'. How can I match ' 00 0, 00 1,100,001'?

How do I obtain the match groups for the latter?
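
A minimal sketch of why only one result comes back, assuming the wanted grouping is the one with the shortest possible prefix: `re.finditer` yields non-overlapping matches, and here the single match already spans the whole string, so nothing is left to scan. Making the first three groups lazy with `(.*?)` is one possible way to get the other grouping. The names `greedy` and `lazy` are only illustrative.

```python
import re

string = '000,001,100,001'

# Greedy groups: the first (.*) grabs as much as it can, then backtracks.
greedy = [m.groups() for m in re.finditer(r'(.*)00(.*),(.*)00(.*)', string)]

# Lazy groups: each (.*?) grabs as little as it can, then expands.
lazy = [m.groups() for m in re.finditer(r'(.*?)00(.*?),(.*?)00(.*)', string)]

print(greedy)  # [('000,001,1', '', '', '1')]
print(lazy)    # [('', '0', '', '1,100,001')]
```

Both patterns still produce exactly one match each, because the trailing `(.*)` consumes the rest of the string.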

Answer 1

Comment: for the string '2295051,2238451,2235301,1950522,2238451,3530333'
... you can see that the groups have the same number of digits before they occur, which is 2 digits, and after they occur, which is 1 digit ...

string = '2295051,2238451,2235301,1950522,2238451,3530333'  

_Step 1_  
pattern = '(\d+)'
Output: ('2295051',) ('2238451',) ('2235301',) ('1950522',) ('2238451',) ('3530333',)  

_Step 2_
pattern = '((\d\d)\d+)'  
Output: ('2295051', '22') ('2238451', '22') ('2235301', '22')  
        ('1950522', '19') ('2238451', '22') ('3530333', '35')  

_Step 3_
pattern = '((\d\d)\d+(\d))'
Output: ('2295051', '22', '1') ('2238451', '22', '1') ('2235301', '22', '1')  
('1950522', '19', '2') ('2238451', '22', '1') ('3530333', '35', '3')
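
The three steps above can be reproduced with a short script. This is only a sketch: `groups_for` is a hypothetical helper introduced here for illustration, not something from the `re` module.

```python
import re

string = '2295051,2238451,2235301,1950522,2238451,3530333'

def groups_for(pattern, text):
    """Return the groups() tuple of every non-overlapping match (hypothetical helper)."""
    return [m.groups() for m in re.finditer(pattern, text)]

step1 = groups_for(r'(\d+)', string)            # the whole number
step2 = groups_for(r'((\d\d)\d+)', string)      # plus its first two digits
step3 = groups_for(r'((\d\d)\d+(\d))', string)  # plus its last digit

for row in step3:
    print(row)
```

Each step adds one capturing group, so each tuple grows by one element.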

Read about the meaning of '+' in the docs: re.html#module-re.

Comment: ...what I don't understand is how it does it and how I can make use of it...

The pattern '((\d\d)\d+(\d))' searches for a substring that starts with 2 digits (\d\d), continues with one or more digits (\d+), and ends with one digit (\d). This pattern is general: it matches any run of digits with a length of at least 4.
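
A quick way to check the "at least 4 digits" claim, as a sketch. The sample inputs are made up for illustration, and `re.fullmatch` requires Python 3.4 or later.

```python
import re

pattern = re.compile(r'((\d\d)\d+(\d))')

results = {}
for text in ['123', '1234', '12345678']:
    m = pattern.fullmatch(text)
    # None when the whole text cannot be matched, otherwise the three groups.
    results[text] = m.groups() if m else None
    print(text, results[text])
```

Three digits are too few because `\d\d`, `\d+` and the final `\d` each need at least one digit of their own beyond the first two.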

Try this pattern:

import re

string = '000,001,100,001'
# Either a digit followed by '00', or '00' followed by a digit
pattern = r'((\d)00|00(\d))'

for m in re.finditer(pattern, string):
    print(m.groups())

Output:

('000', '0', None)
('001', None, '1')
('100', '1', None)
('001', None, '1')

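The first item, 000, satisfies both alternatives (a digit before '00' and a digit after '00'), but only the first alternative is reported because alternation is tried from left to right.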
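
If both sides are needed for every item, one option is to test the two alternatives separately. This is a sketch under the assumption that the items are comma-separated; `digits_around_00` is a hypothetical helper name.

```python
import re

string = '000,001,100,001'

def digits_around_00(item):
    """Return (item, digit before '00' or None, digit after '00' or None)."""
    before = re.search(r'(\d)00', item)
    after = re.search(r'00(\d)', item)
    return (item,
            before.group(1) if before else None,
            after.group(1) if after else None)

results = [digits_around_00(item) for item in string.split(',')]

for row in results:
    print(row)
```

With this approach 000 reports a digit on both sides, so consecutive items can be compared on either position.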

Tested with Python:3.4.2 - re:2.2.1
Come back and flag your question as answered if this works for you, or comment on why it does not.

Solution 1
0 2017-03-17 17:14:51
