
Regular expression to find comma-separated numbers in Python

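I have some HTML in which I want to find strings containing comma-separated numbers, for example

871,174 Views (the number may contain commas, from 1 to n of them)

I have tried many things, for example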

'(\d+(,d+)*)\sViews'

but I can't get it to work, because when I run

re.findall(r'(\d+(,d+)*)\sViews', string)

it gives

[('174', '')]

What I actually want is the whole number (871,174).

Edit 1: this is the string I pass to the regex:

<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div>&nbsp;</div><div></div><span class="fcg">871,174 Views</span>

Unless it's a typo, you've omitted a backslash:

'(\d+)(,\d+)*\sViews'
#       ^ here

Test:

>>> html = """<span class="fcg">871,174 Views</span>"""
>>> import re
>>> pattern = re.compile(r'(\d+)(?:,(\d+))*\sViews')
>>> matches = re.findall(pattern, html)
>>> print(matches)
[('871', '174')]
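When a pattern has more than one capturing group, re.findall returns one tuple of groups per match, which is why the test above comes back as a tuple. A minimal sketch, assuming you want the full number as a single string: make the inner group non-capturing so only the outer group is returned. Note also that a repeated capturing group such as (?:,(\d+))* only remembers its last repetition, so numbers with several commas lose their middle groups.

```python
import re

html = '<span class="fcg">871,174 Views</span>'

# The asker's pattern with the missing backslash restored. It has two
# capturing groups, so findall returns (outer, last-inner) tuples.
two_groups = re.findall(r'(\d+(,\d+)*)\sViews', html)
print(two_groups)  # [('871,174', ',174')]

# Making the inner group non-capturing leaves a single group, so findall
# returns plain strings holding the full comma-separated number.
one_group = re.findall(r'(\d+(?:,\d+)*)\sViews', html)
print(one_group)  # ['871,174']

# A repeated capturing group keeps only its last repetition: '234' is lost.
print(re.findall(r'(\d+)(?:,(\d+))*\sViews', '1,234,567 Views'))  # [('1', '567')]
```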
(\d+(?:,\d+)*)

Try this. It should work for you.
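With the backslash in place, a pattern of the shape (\d+(?:,\d+)*) matches the whole comma-separated run as one string. If you then need an integer, strip the commas before calling int(). A minimal sketch; the helper name parse_view_counts is hypothetical, and the \sViews suffix is taken from the question.

```python
import re

# Outer group captures the full number; (?:...) is non-capturing,
# so findall yields one string per match.
VIEWS_RE = re.compile(r'(\d+(?:,\d+)*)\sViews')


def parse_view_counts(text):
    """Hypothetical helper: return every 'N Views' count in text as an int."""
    return [int(num.replace(',', '')) for num in VIEWS_RE.findall(text)]


print(parse_view_counts('<span class="fcg">871,174 Views</span>'))  # [871174]
print(parse_view_counts('7 Views, 1,234,567 Views'))  # [7, 1234567]
```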

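If you don't want to use BeautifulSoup to extract the text and prefer plain re, rsplit on the class instead of searching the whole string. If speed is a concern, this is faster: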

html = """<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div>&nbsp;</div><div></div><span class="fcg">871,174 Views</span>"""

import re
print(re.findall(r"\d+", html.rsplit('class="fcg">', 1)[1]))
# ['871', '174']

In [13]: timeit re.findall(("\d+"),html.rsplit('class="fcg">',1)[1])
100000 loops, best of 3: 3.21 µs per loop

In [14]: timeit matches = re.findall(pattern, html)
10000 loops, best of 3: 20.1 µs per loop
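Because \d+ stops at the comma, the rsplit approach returns the count in pieces. A minimal sketch that rejoins them, assuming the text after the last class="fcg"> marker contains no digits other than the view count (true for the sample HTML, not guaranteed in general). The shortened html string below is a stand-in for the question's markup.

```python
import re

# Shortened stand-in for the question's HTML.
html = ('<span class="mrs fcg">Shared with:</span>'
        '<span class="fcg">871,174 Views</span>')

# Keep only the text after the last marker.
# Raises IndexError if the marker is absent.
tail = html.rsplit('class="fcg">', 1)[1]   # '871,174 Views</span>'
pieces = re.findall(r'\d+', tail)          # ['871', '174']

# Joining the digit runs restores the full count. Any stray digits in
# the tail would be glued on silently, hence the assumption above.
views = int(''.join(pieces))
print(views)  # 871174
```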

This breaks about as easily as any regex approach, so you should really use BeautifulSoup.
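BeautifulSoup is a third-party package. As a stdlib-only stand-in (a swapped-in technique, not BeautifulSoup itself), html.parser.HTMLParser can collect the text of each span whose class includes fcg, so the regex only ever sees text rather than markup. The names FcgTextCollector and view_counts are hypothetical, and the html string is a shortened version of the question's markup.

```python
import re
from html.parser import HTMLParser

# Full number: digits followed by zero or more ",digits" groups.
VIEWS_RE = re.compile(r'(\d+(?:,\d+)*)\sViews')


class FcgTextCollector(HTMLParser):
    """Hypothetical helper: gather the text of every <span> whose
    class attribute contains the token 'fcg'."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a matching span
        self.texts = []   # one entry per matching span

    def handle_starttag(self, tag, attrs):
        if tag != 'span':
            return
        if self.depth:
            self.depth += 1  # span nested inside a matching span
        elif 'fcg' in (dict(attrs).get('class') or '').split():
            self.depth = 1
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'span' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data  # data may arrive in several chunks


def view_counts(html):
    """Return every 'N Views' count found inside fcg spans, as ints."""
    parser = FcgTextCollector()
    parser.feed(html)
    parser.close()
    return [int(num.replace(',', ''))
            for text in parser.texts
            for num in VIEWS_RE.findall(text)]


html = ('<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span>'
        '<span class="mrs fwn fcg">Shared with:</span>'
        '<span class="_29ef">Public</span>&nbsp;'
        '<span class="fcg">871,174 Views</span>')
print(view_counts(html))  # [871174]
```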

import re

html = """<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div>&nbsp;</div><div></div><span class="fcg">871,174 Views</span>"""

p = re.compile(r"[\d,]+(?=\sViews)")
print(p.findall(html))  # ['871,174']
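The lookahead (?=\sViews) checks for the suffix without consuming it, so findall returns the match itself with no capturing group. The character class in this pattern, however, accepts any run of digits and commas, including ",,5" or a leading comma. A minimal sketch comparing it with a stricter variant, assuming a well-formed number is digit runs separated by single commas; the sample string is made up for illustration.

```python
import re

loose = re.compile(r'[\d,]+(?=\sViews)')           # any digits/commas
strict = re.compile(r'\d+(?:,\d+)*(?=\sViews)')    # digits, single commas

sample = '<span class="fcg">871,174 Views</span> x,,5 Views'

print(loose.findall(sample))   # ['871,174', ',,5']
print(strict.findall(sample))  # ['871,174', '5']
```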


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.