How to find all matches using Python regular expression finditer
I am trying to find a pattern. I have written the code below:
import re

string = '000,001,100,001'
pattern = '(.*)00(.*),(.*)00(.*)'
for m in re.finditer(pattern, string):
    print(m.groups())
The code above returns ('000,001,1', '', '', '1'), whereas it misses the match with groups ('', '0', '', '1,100,001').
I am trying to work out whether the characters before and after the '00' on consecutive lines are the same. The code I wrote matches '000,001,1 00 , 00 1'.
How can I match ' 00 0, 00 1,100,001'?
How do I obtain the match groups for the latter?
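For reference, the two matches described above overlap (both span the whole string), and re.finditer only yields non-overlapping matches, so a single pass with the greedy pattern cannot return both. A minimal sketch of one possible workaround, assuming lazy quantifiers on the leading groups are acceptable (the lazy pattern is an illustration, not part of the original post):

```python
import re

string = '000,001,100,001'

# Greedy groups (the pattern from the question): the first (.*) grabs
# as much as it can, so the rightmost usable '00' pair is chosen.
greedy = [m.groups() for m in re.finditer(r'(.*)00(.*),(.*)00(.*)', string)]

# Lazy leading groups (an assumption for illustration): (.*?) takes as
# little as possible, so the leftmost usable '00' pair is chosen.
lazy = [m.groups() for m in re.finditer(r'(.*?)00(.*?),(.*?)00(.*)', string)]

print(greedy)  # [('000,001,1', '', '', '1')]
print(lazy)    # [('', '0', '', '1,100,001')]
```

Each pattern still produces only one match per pass; running both is what gives the two groupings.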
Comment: for the string '2295051,2238451,2235301,1950522,2238451,3530333'
... You see that the groups have the same number of digits before they occur, which is 2 digits, and after they occur, which is 1 digit...
string = '2295051,2238451,2235301,1950522,2238451,3530333'
_Step 1_
pattern = r'(\d+)'
Output: ('2295051',) ('2238451',) ('2235301',) ('1950522',) ('2238451',) ('3530333',)
_Step 2_
pattern = r'((\d\d)\d+)'
Output: ('2295051', '22') ('2238451', '22') ('2235301', '22')
('1950522', '19') ('2238451', '22') ('3530333', '35')
_Step 3_
pattern = r'((\d\d)\d+(\d))'
Output: ('2295051', '22', '1') ('2238451', '22', '1') ('2235301', '22', '1')
('1950522', '19', '2') ('2238451', '22', '1') ('3530333', '35', '3')
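The three steps above can be run in one go. A minimal sketch, where the `patterns` list and the `results` dict are names introduced here for illustration only:

```python
import re

string = '2295051,2238451,2235301,1950522,2238451,3530333'

# One pattern per step; raw strings keep the backslashes intact.
patterns = [
    r'(\d+)',             # Step 1: each whole number
    r'((\d\d)\d+)',       # Step 2: whole number plus its first two digits
    r'((\d\d)\d+(\d))',   # Step 3: whole number, first two digits, last digit
]

# Map each pattern to the list of group tuples that finditer yields.
results = {p: [m.groups() for m in re.finditer(p, string)] for p in patterns}

for p, groups in results.items():
    print(p)
    for g in groups:
        print('   ', g)
```

Comparing the second and third elements of the Step 3 tuples is then an ordinary tuple comparison, with no further regex work needed.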
Read about the meaning of '+' in the docs: re.html#module-re .
Comment: ...what I don't understand is how it does it and how I can make use of it...
The pattern '((\d\d)\d+(\d))' searches for a substring that starts with 2 digits (\d\d), followed by any number of digits (at least one), and ends with one digit (\d).
This pattern is generalized: it matches any substring of digits with a length of at least 4.
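To make the pieces of that pattern easier to see, here is a minimal sketch that writes it with re.VERBOSE and checks the minimum length of 4. The test strings '123' and '1234' are made up for illustration:

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
pattern = re.compile(r'''
    (            # group 1: the whole run of digits
      (\d\d)     # group 2: the first two digits
      \d+        # one or more digits in the middle
      (\d)       # group 3: the last digit
    )
''', re.VERBOSE)

short = pattern.search('123')    # only 3 digits: 2 + at least 1 + 1 cannot fit
exact = pattern.search('1234')   # 4 digits: the shortest string that matches

print(short)           # None
print(exact.groups())  # ('1234', '12', '4')
```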
Try this pattern:
import re

string = '000,001,100,001'
pattern = r'((\d)00|00(\d))'
for m in re.finditer(pattern, string):
    print(m.groups())
Output:
('000', '0', None)
('001', None, '1')
('100', '1', None)
('001', None, '1')
The first item, 000, has both: a digit before '00' and a digit after '00'. Only the "before" alternative is reported for it.
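Because the alternatives are tried left to right, an item like 000 shows up only through the first branch. A minimal sketch, assuming it is acceptable to scan the string twice, that collects the "before" and "after" digits separately (the names `before` and `after` are introduced here for illustration):

```python
import re

string = '000,001,100,001'

# Digits that appear immediately before a '00'.
before = re.findall(r'(\d)00', string)

# Digits that appear immediately after a '00'.
after = re.findall(r'00(\d)', string)

print(before)  # ['0', '1']
print(after)   # ['0', '1', '1']
```

Here 000 contributes to both lists, which the single alternation pattern cannot show in one match.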
Tested with Python:3.4.2 - re:2.2.1
Come back and flag your question as answered if this works for you, or comment why not.