Regular expression to find comma-separated numbers in Python
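I have some HTML in which I want to find strings that contain comma-separated numbers, for example
871,174 Views (the number may contain anywhere from 1 to n commas)
I have tried a lot with
'(\d+(,d+)*)\sViews'
but couldn't make it work, because when I run
re.findall(r'(\d+(,d+)*)\sViews', string)
it gives
[('174', '')]
What I actually want is that number (871,174).
Edit 1: This is the string I am passing to the regex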
<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div> </div><div></div><span class="fcg">871,174 Views</span>
Unless it is a typo, you have omitted a backslash:
'(\d+)(,\d+)*\sViews'
# here __^
Test:
>>> html = """<span class="fcg">871,174 Views</span>"""
>>> import re
>>> pattern = re.compile(r'(\d+)(?:,(\d+))*\sViews')
>>> matches = re.findall(pattern, html)
>>> print(matches)
[('871', '174')]
(\d+(?:,\d+)*)
Try this. It should work for you.
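To illustrate how the grouping style changes what re.findall returns, here is a minimal sketch. The sample string (with two commas) and the variable names are made up for illustration. Both patterns include the backslash before the d and are followed by \sViews, as in the question.

```python
import re

# Hypothetical sample text with more than one comma in the number.
text = '<span class="fcg">1,871,174 Views</span>'

# Repeated capturing group: only the LAST repetition is kept in group 2,
# so the middle part of the number is lost.
repeated = re.findall(r'(\d+)(?:,(\d+))*\sViews', text)

# One outer capturing group with a non-capturing inner group:
# findall returns the whole comma-separated number as a single string.
whole = re.findall(r'(\d+(?:,\d+)*)\sViews', text)

print(repeated)  # [('1', '174')]
print(whole)     # ['1,871,174']
```

With a single comma both forms recover every digit, but once the number has two or more commas only the single outer group returns it intact.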
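If you don't want to use BeautifulSoup to get the text and would rather use re, then instead of searching the whole string, rsplit on the class first; that will be faster if you are worried about speed: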
html = """<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div> </div><div></div><span class="fcg">871,174 Views</span>"""
import re
print(re.findall(r"\d+", html.rsplit('class="fcg">', 1)[1]))
['871', '174']
In [13]: timeit re.findall(("\d+"),html.rsplit('class="fcg">',1)[1])
100000 loops, best of 3: 3.21 µs per loop
In [14]: timeit matches = re.findall(pattern, html)
10000 loops, best of 3: 20.1 µs per loop
This is about as easy to break as any regex, though, so you should really use BeautifulSoup.
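As a rough sketch of the parser-based idea, the example below walks the markup with a real HTML parser and applies the regex only to the text of the matching spans. It uses the standard library's html.parser as a stand-in so that it runs without third-party packages; it is not BeautifulSoup itself. The class name FcgSpanText and the shortened sample HTML are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class FcgSpanText(HTMLParser):
    """Collect the text found inside <span> elements whose class list has 'fcg'."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a matching span (counts nested spans)
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "fcg" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts.append(data)


# Shortened, hypothetical version of the markup from the question.
html = ('<span class="mrs fsm fwn fcg">Shared with:</span>'
        '<div></div><span class="fcg">871,174 Views</span>')

parser = FcgSpanText()
parser.feed(html)
parser.close()

views = []
for text in parser.texts:
    match = re.search(r'(\d+(?:,\d+)*)\sViews', text)
    if match:
        views.append(match.group(1))

print(views)  # ['871,174']
```

The regex now only sees short text nodes, so changes to attributes or tag order elsewhere in the page are less likely to break it.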
import re
html = """<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div> </div><div></div><span class="fcg">871,174 Views</span>"""
p = re.compile(r"[\d\,]+(?=\sViews)")
print(p.findall(html))
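If the goal is a number rather than a string, the matched text can be converted once the commas are stripped. The sketch below is one way to do that under stated assumptions: the helper name view_counts is hypothetical, and the pattern is a slightly stricter variation of the lookahead above (it requires digits on both sides of every comma, so a stray "," on its own cannot match).

```python
import re

# Digits, optionally followed by ",digits" groups, and only when the
# next characters are whitespace plus "Views" (checked by the lookahead).
VIEWS_RE = re.compile(r'\d+(?:,\d+)*(?=\sViews)')


def view_counts(text):
    """Return every 'N Views' count in text as an int, with commas removed."""
    return [int(m.replace(',', '')) for m in VIEWS_RE.findall(text)]


print(view_counts('<span class="fcg">871,174 Views</span>'))  # [871174]
```

Because the pattern has no capturing groups, re.findall returns the full match for each hit, which is what makes the simple replace-and-int conversion possible.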