
How can I get part of a matched string with a regex in Python?

I'm writing a web spider in Python, and part of the program needs to extract strings such as data-id="48859672" from a web page. I've successfully matched these strings using:

import re
pattern = re.compile(r'\bdata-id="\d+"')
m = pattern.search(html, start)  # html: page source, start: offset to search from

But now I'm wondering how to get only the numeric part of the match, rather than the whole string.

Use a capturing group or lookarounds. With a capturing group, the parentheses mark the part to extract, and group(1) returns it:

>>> pattern=re.compile(r'\bdata-id="(\d+)"')
>>> s = 'data-id="48859672"'
>>> pattern.search(s).group(1)
'48859672'

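Or use lookarounds, which assert the surrounding text without including it in the match, so group() returns only the digits: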

>>> pattern=re.compile(r'(?<=\bdata-id=")\d+(?=")')
>>> s = 'data-id="48859672"'
>>> pattern.search(s).group()
'48859672'
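
Since the question searches a whole page from a given offset, the capturing-group pattern can also be applied to every occurrence at once. The following is a minimal sketch, not a definitive implementation: the sample `html` string and the helper name `extract_ids` are assumptions made for illustration, and it relies on the compiled pattern's `finditer(string, pos)` method.

```python
import re

# Hypothetical sample markup standing in for the downloaded page.
html = '<li data-id="48859672">a</li><li data-id="123">b</li><li data-id="x">c</li>'

# Same pattern as the capturing-group answer: group 1 holds only the digits.
pattern = re.compile(r'\bdata-id="(\d+)"')

def extract_ids(text, start=0):
    """Return every numeric data-id value found in text, searching from offset start."""
    return [m.group(1) for m in pattern.finditer(text, start)]

print(extract_ids(html))
```

Note that `pattern.search(...)` returns `None` when nothing matches, so calling `.group(1)` directly on its result raises `AttributeError` on a page without the attribute. Iterating with `finditer` avoids that case, because it simply yields no matches.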
