
python regex search findall capturing groups

I just want to get "66664324", the content between ")" and "-". Why does the search method also return the ")" and "-" themselves?

import re
a="(021)66664324-01"
b1=re.findall(r'\)(.*)-',a)
>['66664324']

b2=re.search(r'\)(.*)-',a).group()
>')66664324-'

What is the difference between the two code snippets?

Try calling group(1) instead of group() on the result of re.search. group() returns the whole match, while group(1) returns only the text captured by group 1 (the characters matched inside the first pair of parentheses).

>>> a="(021)66664324-01"
>>> import re
>>> b2=re.search(r'\)(.*)-',a).group(1)
>>> b2
'66664324'
>>> b2=re.search(r'\)(.*)-',a).group()
>>> b2
')66664324-'
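As a minimal sketch of the same point, the snippet below keeps the Match object around and reads the whole match and the captured group from it. The variable names (m, whole, inner, groups) are only illustrative, and the None check is there because re.search returns None when nothing matches.

```python
import re

a = "(021)66664324-01"

# re.search returns a Match object, or None when the pattern is not found.
m = re.search(r"\)(.*)-", a)

whole = inner = groups = None
if m is not None:
    whole = m.group(0)   # same as m.group(): the entire matched text
    inner = m.group(1)   # only the text captured by the first group
    groups = m.groups()  # tuple holding every captured group

print(whole)   # ')' + digits + '-'
print(inner)   # digits only
print(groups)
```

Calling .group() directly on the return value of re.search, as in the question, raises AttributeError when there is no match, so the explicit check is usually the safer form.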

re.findall, by contrast, gives priority to groups over the whole match, and it returns its results as a list, which search does not. That is why b1=re.findall('\\)(.*)-',a) gives you the desired output. If the pattern contains a group, re.findall returns only the groups, not the whole match. Only when the pattern contains no groups does it return the whole match.

>>> b1=re.findall(r'\)(.*)-',a)
>>> b1
['66664324']
>>> b1=re.findall(r'\).*-',a)
>>> b1
[')66664324-']
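To make the findall rule concrete, here is a short sketch that runs four variants of the pattern against the same string. The two-group and non-capturing patterns are my own additions for illustration and are not from the question.

```python
import re

a = "(021)66664324-01"

# No capturing group: findall returns the whole match.
no_group = re.findall(r"\).*-", a)

# One capturing group: findall returns only that group's text.
one_group = re.findall(r"\)(.*)-", a)

# Two capturing groups: findall returns a list of tuples, one per match.
two_groups = re.findall(r"\((\d+)\)(\d+)-", a)

# A non-capturing group (?:...) does not count as a group,
# so findall falls back to returning the whole match.
non_capturing = re.findall(r"\)(?:.*)-", a)

print(no_group)
print(one_group)
print(two_groups)
print(non_capturing)
```

If you need both the grouping and the whole match from findall, wrapping part of the pattern in (?:...) instead of (...) is one way to keep the parentheses without changing what findall returns.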

The difference lies in group(), which is equivalent to group(0) and returns the whole match. According to the Python regex manual:

the search() method of patterns scans through the string, so the match may not start at zero in that case

So in your case the whole match does not start at the beginning of the string, and group(0) includes the ")" where the match begins. The text you want is in group 1. I tried your code with a slightly modified pattern, and the expected result is in group 1.

>>> import re
>>> a="(021)66664324-01"
>>> re.search('\\)([0-9]*)',a).group(1)
'66664324'
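Two different kinds of "index" are in play here: the position in the string where the match begins, and the number of the capturing group. The sketch below, using the same pattern as above, shows both. The variable names are only illustrative.

```python
import re

a = "(021)66664324-01"
m = re.search(r"\)([0-9]*)", a)

match_start = m.start()   # position in the string where the whole match begins
group_span = m.span(1)    # (start, end) positions of capturing group 1
digits = m.group(1)       # text of capturing group 1

# re.match only tries at position 0, where the string has "(" rather
# than ")", so it finds nothing and returns None.
anchored = re.match(r"\)([0-9]*)", a)

print(match_start, group_span, digits, anchored)
```

Here the match starts at string position 4 (the ")"), while the digits are group number 1. The manual's remark about search concerns the first of these, and group(1) concerns the second.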


 