Input String
<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>
What I want
(1111,2222)
If I use findall, this is what I get:
>>> import re
>>> print(re.findall("<(msgCode|errorId)>([0-9]+)</(msgCode|errorId)>", "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"))
[('msgCode', '1111', 'msgCode'), ('errorId', '2222', 'errorId')]
What I hope for is
[('1111','2222')]
Is there an easy way to do this with re alone, without post-processing the output?
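Consider using XPath instead. Note that lxml's HTML parser lowercases tag names, which is why the expression uses msgcode and errorid: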
>>> from lxml import html
>>> root = html.fromstring('<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>')
>>> root.xpath('//*[self::msgcode or self::errorid]/text()')
['1111', '2222']
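If lxml is not available, roughly the same idea can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration under two assumptions: the input is well-formed XML once wrapped, and the wrapper element name root is made up here because the fragment has two top-level elements plus loose text.

```python
import xml.etree.ElementTree as ET

subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# Wrap the fragment in a hypothetical <root> element so it parses as a
# single well-formed XML document.
root = ET.fromstring("<root>" + subject + "</root>")

# The XML parser keeps tag names case-sensitive, unlike lxml's HTML parser.
wanted = {"msgCode", "errorId"}

# Iterating over an element yields its direct children in document order.
values = [child.text for child in root if child.tag in wanted]
print(values)  # ['1111', '2222']
```

Unlike the regex approach, this fails loudly with a ParseError if the input is not well-formed.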
Use a non-capturing group for the tag names, (?:msgCode|errorId), so that only the digits are captured:
>>> import re
>>> subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"
>>> result = re.findall("<(?:msgCode|errorId)>([0-9]+)</(?:msgCode|errorId)>", subject)
>>> print(result)
['1111', '2222']
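The reason this works: re.findall returns a list of tuples when the pattern has more than one capturing group, and a flat list of strings when it has exactly one. A minimal Python 3 sketch of the difference follows. The helper name extract_codes is hypothetical, and the stricter named-group variant with a backreference is an optional extra, not part of the answer above.

```python
import re

subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# Three capturing groups: findall returns one 3-tuple per match.
capturing = re.findall(
    r"<(msgCode|errorId)>([0-9]+)</(msgCode|errorId)>", subject
)

# Only the digits are captured: findall returns a flat list of strings.
PATTERN = re.compile(r"<(?:msgCode|errorId)>([0-9]+)</(?:msgCode|errorId)>")


def extract_codes(text):
    """Return the numeric payloads of msgCode/errorId tags as a tuple."""
    return tuple(PATTERN.findall(text))


# Optional stricter variant: the backreference (?P=tag) requires the closing
# tag to match the opening one. finditer lets us read just the 'value' group.
STRICT = re.compile(r"<(?P<tag>msgCode|errorId)>(?P<value>[0-9]+)</(?P=tag)>")
strict_values = [m.group("value") for m in STRICT.finditer(subject)]

print(capturing)               # [('msgCode', '1111', 'msgCode'), ('errorId', '2222', 'errorId')]
print(extract_codes(subject))  # ('1111', '2222')
print(strict_values)           # ['1111', '2222']
```

Wrapping the result in tuple() gives the ('1111', '2222') shape asked for in the question. The non-capturing pattern would also accept a mismatched pair such as <msgCode>1</errorId>, which the backreference variant rejects.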