Python BeautifulSoup: match text after a string with a regular expression

Question

I am scraping a web page with BeautifulSoup and Python. I have a BS element,

a = soup.find('div', class_='section lot-details')

which returns a series of list items like the following:

<li><strong>Location:</strong> WA - 222 Welshpool Road, Welshpool</li>
<li><strong>Deliver to:</strong> Pickup Only WA</li>

I want to return the text that follows each strong tag:

WA - 222 Welshpool Road, Welshpool
Pickup Only WA

How do I extract this from the BS object? I'm not sure what regular expression to use, or how regular expressions interact with BeautifulSoup.

Answer 1

The pattern (?:</strong>)(.*)(?:</li>) will do the job; the text you want ends up in capture group \1, the (.*) part.

Python code example:

In [1]: import re
In [2]: test = re.compile(r'(?:</strong>)(.*)(?:</li>)')
In [3]: test.findall(input_string)
Out[3]: [' WA - 222 Welshpool Road, Welshpool', ' Pickup Only WA']

You can check it here: https://regex101.com/r/fD0fZ9/1
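The session above leaves input_string undefined. A minimal self-contained sketch follows, assuming input_string holds the HTML of the list items (in practice it would presumably be something like str(a)). Note that it swaps in a non-greedy (.*?) and drops the non-capturing groups, which is a small variation on the pattern in this answer, not the answer's exact regex.

```python
import re

# Assumption: the HTML is available as a plain string, e.g. from str(a).
input_string = (
    "<li><strong>Location:</strong> WA - 222 Welshpool Road, Welshpool</li>\n"
    "<li><strong>Deliver to:</strong> Pickup Only WA</li>"
)

# Capture everything between </strong> and the next </li>.
# The non-greedy (.*?) stops at the first </li>, which also keeps the
# matches separate if several <li> elements share one line.
pattern = re.compile(r"</strong>(.*?)</li>")

# Strip the leading space left after the closing </strong> tag.
matches = [text.strip() for text in pattern.findall(input_string)]
print(matches)
```

Because . does not match a newline by default, the pattern only finds values that sit on a single line; text wrapped across lines would need re.DOTALL.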

Answer 2

You really don't need a regular expression here. If your li tags are in a list:

>>> for li in li_elems:
...     print(li.find('strong').next_sibling.strip())

WA - 222 Welshpool Road, Welshpool
Pickup Only WA

This assumes each li contains exactly one strong element, followed directly by the text.
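For completeness, here is a hedged end-to-end sketch of the next_sibling approach. The surrounding div and ul markup is an assumption reconstructed from the question, and li_elems is built with find_all('li'), which the answer does not show.

```python
from bs4 import BeautifulSoup

# Assumed markup: only the two <li> lines come from the question;
# the wrapping <div> and <ul> are guesses at the page structure.
html = """
<div class="section lot-details">
  <ul>
    <li><strong>Location:</strong> WA - 222 Welshpool Road, Welshpool</li>
    <li><strong>Deliver to:</strong> Pickup Only WA</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
a = soup.find("div", class_="section lot-details")

# Collect the <li> tags inside the element from the question.
li_elems = a.find_all("li")

# The text node right after <strong> is its next sibling.
values = [li.find("strong").next_sibling.strip() for li in li_elems]
print(values)
```

If an li has no strong tag, find returns None and the attribute access fails, so real pages may need a guard for that case.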

Alternatively:

>>> for li in li_elems:
...     print(li.contents[1].strip())

WA - 222 Welshpool Road, Welshpool
Pickup Only WA
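To see why index 1 works, it may help to inspect contents for a single item. This is only an illustrative sketch using one li copied from the question; the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# One list item from the question, parsed on its own.
li = BeautifulSoup(
    "<li><strong>Location:</strong> WA - 222 Welshpool Road, Welshpool</li>",
    "html.parser",
).li

# contents holds the direct children in document order:
# index 0 is the <strong> tag, index 1 is the text node after it.
child_count = len(li.contents)
first_child_name = li.contents[0].name
trailing_text = li.contents[1].strip()

print(child_count, first_child_name, trailing_text)
```

The index depends on the exact markup: any extra whitespace or tag before strong would shift the text to a different position, which is why the next_sibling version is somewhat more robust.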

Answer 1: 1, accepted, 2016-05-19 13:29:34
Answer 2: 1, 2016-05-19 13:47:57