
How can I get part of a matched string with a regex in Python?

I'm writing a web spider in Python, and part of the program needs to extract strings like data-id="48859672" from a web page. I've successfully matched these strings using:

import re
pattern = re.compile(r'\bdata-id="\d+"')
m = pattern.search(html, start)
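A minimal sketch of what that pattern returns, assuming a made-up HTML fragment in place of the real page (the `html` string and the `start = 0` offset are illustrative, not taken from the original program). `Pattern.search(string, pos)` begins scanning at index `pos`, and `m.group()` gives back the entire match, quotes included:

```python
import re

# Hypothetical HTML fragment standing in for the downloaded page.
html = '<div data-id="48859672">first</div><div data-id="12345">second</div>'

# The original pattern: matches the whole attribute text.
pattern = re.compile(r'\bdata-id="\d+"')

start = 0  # illustrative offset; search() begins scanning at this index
m = pattern.search(html, start)
whole = m.group()  # entire match, quotes included
print(whole)       # data-id="48859672"

# Resume scanning just after the previous match to find the next one.
m2 = pattern.search(html, m.end())
print(m2.group())  # data-id="12345"
```

Passing `m.end()` as the next starting position is one way to step through the page match by match.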

But how can I get only the numeric part of these strings instead of the whole match?

Use a capturing group or lookarounds.

>>> import re
>>> pattern = re.compile(r'\bdata-id="(\d+)"')
>>> s = 'data-id="48859672"'
>>> pattern.search(s).group(1)
'48859672'
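A variation on the capturing-group approach, assuming you prefer names over group numbers: `(?P<name>...)` defines a named group, and checking for `None` avoids an `AttributeError` when a page has no `data-id` attribute. The group name `id` is an arbitrary choice for illustration.

```python
import re

# Named group "id" (an arbitrary name) captures just the digits.
pattern = re.compile(r'\bdata-id="(?P<id>\d+)"')
s = 'data-id="48859672"'

m = pattern.search(s)
if m is None:
    # search() returns None when there is no match; handle it explicitly.
    id_text = None
else:
    id_text = m.group("id")  # equivalent to m.group(1)

print(id_text)  # 48859672
```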

Or, with lookarounds:

>>> pattern = re.compile(r'(?<=\bdata-id=")\d+(?=")')
>>> s = 'data-id="48859672"'
>>> pattern.search(s).group()
'48859672'
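Since a spider usually needs every ID on a page, here is a sketch applying both patterns with `findall()` and `finditer()`, assuming a hypothetical `html` string containing three attributes. When a pattern has exactly one capturing group, `findall()` returns that group's text; with no groups, it returns the whole matches, which for the lookaround pattern are already just the digits:

```python
import re

# Hypothetical page content with several data-id attributes.
html = ('<li data-id="48859672">a</li>'
        '<li data-id="1001">b</li>'
        '<li data-id="77">c</li>')

group_pattern = re.compile(r'\bdata-id="(\d+)"')
lookaround_pattern = re.compile(r'(?<=\bdata-id=")\d+(?=")')

# One capturing group: findall() returns the captured digits.
ids_from_group = group_pattern.findall(html)
# No groups: findall() returns whole matches, here only the digits.
ids_from_lookaround = lookaround_pattern.findall(html)
print(ids_from_group)  # ['48859672', '1001', '77']

# finditer() yields Match objects, useful when positions are needed too.
positions = [(m.group(1), m.start(1)) for m in group_pattern.finditer(html)]
```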

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 