Regex to find comma-separated numbers in Python

Question

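I have some HTML in which I want to find strings that contain comma-separated numbers, for example

871,174 Views (the number can contain anywhere from 1 to n commas)

I have tried many patterns, such as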

'(\d+(,d+)*)\sViews'

but I cannot get it to work, because when I run

re.findall(r'(\d+(,d+)*)\sViews', string)

it gives

[('174', '')]

What I actually want is the whole number.

Edit 1: This is the string I am passing to the regex

<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div>&nbsp;</div><div></div><span class="fcg">871,174 Views</span>

Answer 1

Unless it's a typo, you have omitted a backslash:

  '(\d+)(,\d+)*\sViews'
# here ___^

Test:

>>> html = """<span class="fcg">871,174 Views</span>"""
>>> import re
>>> pattern = re.compile(r'(\d+)(?:,(\d+))*\sViews')
>>> matches = re.findall(pattern, html)
>>> print(matches)
[('871', '174')]
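
To make the effect of the missing backslash concrete, here is a minimal sketch. It first reproduces the result from the question, then shows the corrected pattern, and finally illustrates a caveat of the test pattern above: a capture group inside a repetition keeps only its last match. The two-comma sample `1,871,174 Views` is an assumption added for illustration; it does not appear in the question.

```python
import re

text = "871,174 Views"

# Original pattern: ",d+" requires a literal letter "d", so the comma part
# never matches and the only possible match starts at "174".
broken = re.findall(r'(\d+(,d+)*)\sViews', text)
print(broken)  # [('174', '')]

# With the backslash restored, group 1 holds the whole number.
fixed = re.findall(r'(\d+(,\d+)*)\sViews', text)
print(fixed)  # [('871,174', ',174')]

# Caveat: a capture group inside "(...)*" only remembers its last repetition.
# Hypothetical input with two commas:
last_only = re.findall(r'(\d+)(?:,(\d+))*\sViews', "1,871,174 Views")
print(last_only)  # [('1', '174')]
```

If the number may contain more than one comma, capturing the whole number in a single outer group is the safer choice.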

Answer 2

(\d+(?:,\d+)*)

Try this. It should work for you.
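
As a rough sketch of how this pattern might be used, written with `\d` in the inner group and with `\sViews` appended (an assumption, so that only view counts match): because the inner group is non-capturing, `re.findall` returns plain strings rather than tuples. The conversion to `int` is an extra step that the answer does not mention.

```python
import re

html = '<span class="fcg">871,174 Views</span>'

# One capturing group around the whole number; the inner group is
# non-capturing, so findall returns strings instead of tuples.
pattern = re.compile(r'(\d+(?:,\d+)*)\sViews')

raw = pattern.findall(html)
print(raw)  # ['871,174']

# Strip the thousands separators to get integers.
views = [int(s.replace(',', '')) for s in raw]
print(views)  # [871174]
```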

Answer 3

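If you would rather not use BeautifulSoup to get the text and want to use re, then instead of searching the whole string, rsplit on the class. That is faster, if speed is a concern: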

html = """<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div>&nbsp;</div><div></div><span class="fcg">871,174 Views</span>"""

import re
print(re.findall(r"\d+", html.rsplit('class="fcg">', 1)[1]))
['871', '174']

In [13]: timeit re.findall(r"\d+", html.rsplit('class="fcg">', 1)[1])
100000 loops, best of 3: 3.21 µs per loop

In [14]: timeit matches = re.findall(pattern, html)
10000 loops, best of 3: 20.1 µs per loop

This is about as likely to break as any regex, so you should really use BeautifulSoup.
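
To illustrate the parser-based route, here is a minimal sketch that parses the markup first and applies the regex only to the extracted text. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without third-party packages; it is not the BeautifulSoup API. The class name `FcgTextCollector` and the shortened `html` sample are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class FcgTextCollector(HTMLParser):
    """Collects text found directly inside <span> elements with class 'fcg'."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open <span>: does it carry class "fcg"?
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self.stack.append("fcg" in classes)

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text only when the innermost open span is an "fcg" span.
        if self.stack and self.stack[-1]:
            self.texts.append(data)


# Shortened version of the markup from the question (assumption).
html = ('<span class="mrs fsm fwn fcg">Shared with:</span><div></div>'
        '<span class="fcg">871,174 Views</span>')

parser = FcgTextCollector()
parser.feed(html)
parser.close()

number = re.compile(r'(\d+(?:,\d+)*)\sViews')
counts = [m.group(1) for m in map(number.search, parser.texts) if m]
print(counts)  # ['871,174']
```

Here the regex only ever sees short pieces of text, so changes to the surrounding attributes or tag order are less likely to break it.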

Answer 4

import re

html = """<span class="fcg"><span id="fbPhotoPageCreatorInfo"></span></span><div class="mbs fbPhotosAudienceContainerNotEditable" id="fbPhotoPageAudienceSelector"><span class="mrs fbPhotosAudienceNotEditable fsm fwn fcg">Shared with:</span><div class="_6a _29ee _3iio _20nn _43_1" data-hover="tooltip" aria-label="Public" data-tooltip-alignh="center"><i class="img sp_e0NUBoHLxu_ sx_9486cc"></i><span class="_29ef">Public</span></div>&nbsp;</div><div></div><span class="fcg">871,174 Views</span>"""

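# Match a run of digits and commas, but only when followed by " Views"
# (the lookahead keeps " Views" out of the match itself).
p = re.compile(r"[\d,]+(?=\sViews)")
print(p.findall(html))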
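
One thing to keep in mind with a character class such as `[\d,]+` is that it also accepts runs made up only of commas. The sketch below compares it with a stricter variant that requires digits around each comma. The names `loose` and `strict` and the edge-case string are assumptions introduced for illustration.

```python
import re

loose = re.compile(r"[\d,]+(?=\sViews)")          # pattern from this answer
strict = re.compile(r"\d+(?:,\d+)*(?=\sViews)")   # digits required around commas

sample = "871,174 Views"
loose_hit = loose.findall(sample)
strict_hit = strict.findall(sample)
print(loose_hit)   # ['871,174']
print(strict_hit)  # ['871,174']

# Hypothetical edge case: a stray comma right before " Views".
edge = "Photos, Views"
loose_edge = loose.findall(edge)
strict_edge = strict.findall(edge)
print(loose_edge)   # [',']
print(strict_edge)  # []
```

For the string in the question both patterns give the same result; the difference only matters if the surrounding text can contain commas next to the word "Views".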

4 answers

Answer 1
2 2015-01-23 10:23:16

Answer 2
0 2015-01-23 10:09:19

Answer 3
0 2015-01-23 10:38:07

Answer 4
0 2015-01-25 04:05:19
