Python regex multiple matches with grouping

Input String

<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>

What I want

(1111,2222)

If I use findall, this is what I get:

>>> import re
>>> print(re.findall(r"<(msgCode|errorId)>([0-9]+)</(msgCode|errorId)>", "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"))
[('msgCode', '1111', 'msgCode'), ('errorId', '2222', 'errorId')]

What I hope for is

[('1111','2222')]

Is there an easy way to do this with re alone, instead of post-processing the output?

Consider using XPath instead. The HTML parser lowercases tag names, which is why the expression tests for msgcode and errorid:

>>> from lxml import html
>>> root = html.fromstring('<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>')
>>> root.xpath('//*[self::msgcode or self::errorid]/text()')
['1111', '2222']
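
If lxml is not available, a similar parser-based approach can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration, not part of the original answer: it assumes the input is well-formed XML once wrapped in a single root element (the <root> wrapper and the names fragment, wanted and values are made up for the example). Unlike the HTML parser, an XML parser is case-sensitive, so the original spelling msgCode and errorId is matched.

```python
import xml.etree.ElementTree as ET

fragment = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# The fragment has no single root element, so wrap it in a hypothetical
# <root> tag to make it well-formed XML before parsing.
root = ET.fromstring("<root>" + fragment + "</root>")

# Walk the tree in document order and keep the text of the wanted tags.
wanted = {"msgCode", "errorId"}
values = [el.text for el in root.iter() if el.tag in wanted]
print(values)  # ['1111', '2222']
```

This breaks on input that is not well-formed XML, in which case a lenient HTML parser or a regex is the more practical choice.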

Use a non-capturing group for the tag names, (?:msgCode|errorId), so that only the digits are captured:

>>> import re
>>> subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"
>>> result = re.findall(r"<(?:msgCode|errorId)>([0-9]+)</(?:msgCode|errorId)>", subject)
>>> print(result)
['1111', '2222']
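
As a self-contained Python 3 sketch of the same idea: when a pattern has exactly one capturing group, findall returns a flat list of that group's matches, which is why the tag names drop out of the result. The conversion to a single tuple and the backreference variant are additions for illustration, not part of the original answer, and the variable names are arbitrary.

```python
import re

subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# Non-capturing groups (?:...) match the tag names without adding them to
# the result, so findall returns only the one captured group: the digits.
pattern = re.compile(r"<(?:msgCode|errorId)>([0-9]+)</(?:msgCode|errorId)>")
values = pattern.findall(subject)
print(values)           # ['1111', '2222']

# The question asked for a single tuple; converting the list is one call.
print([tuple(values)])  # [('1111', '2222')]

# Variant: capture the tag name and require the same name in the closing
# tag with the backreference \1, then take group 2 (the digits).
strict = re.compile(r"<(msgCode|errorId)>([0-9]+)</\1>")
strict_values = [m.group(2) for m in strict.finditer(subject)]
print(strict_values)    # ['1111', '2222']
```

The first pattern would also accept a mismatched pair such as <msgCode>1</errorId>. The backreference variant rejects it, at the cost of needing finditer (or indexing into the tuples) to pull out the digits.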
