Scraping data between </b> and <br> tags using BeautifulSoup

Question

The HTML looks like this:

<td>
        <font face="Arial, sans-serif" size="-1">

                    <b>Home Phone: </b>507-383-1070<br>

                    <b>Cell Phone: </b>507-383-1070<br>

                    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>

        </font>
</td>

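I only want to scrape the Home Phone and Cell Phone values, e.g. 507-383-1070. Can you help me with this? How would I solve it with BeautifulSoup? I have tried several approaches, but none of them worked.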

Answer 1

You can use soup.find_all with a regular expression:

>>> import re
>>> soup.find_all(string=re.compile(r'\d+(-\d+){2}'))
['507-383-1070', '507-383-1070']

You may need to adjust the regular expression depending on the format of the phone numbers you want to extract.
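A self-contained version of this approach, run against the HTML from the question, might look like the sketch below. It assumes BeautifulSoup 4.4 or later (where `string=` is the preferred name for the older `text=` argument) and numbers in the `NNN-NNN-NNNN` format; the names `PHONE_RE` and `phones` are just illustrative.

```python
import re

from bs4 import BeautifulSoup

# The HTML snippet from the question.
html = """<td>
        <font face="Arial, sans-serif" size="-1">
                    <b>Home Phone: </b>507-383-1070<br>
                    <b>Cell Phone: </b>507-383-1070<br>
                    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>
        </font>
</td>"""

# Assumed format: NNN-NNN-NNNN. The ^...$ anchors make the pattern match whole
# text nodes only, since find_all applies the regex with re.search.
PHONE_RE = re.compile(r"^\d{3}-\d{3}-\d{4}$")

soup = BeautifulSoup(html, "html.parser")

# string= matches text nodes (NavigableString) rather than tags;
# str() turns each match into a plain Python string.
phones = [str(s) for s in soup.find_all(string=PHONE_RE)]

print(phones)  # ['507-383-1070', '507-383-1070']
```

Because this matches on the text alone, it cannot tell the home number from the cell number; if you need that distinction, key on the `<b>` labels instead.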

Answer 2

For the HTML you provided, you can extract them as follows:

from bs4 import BeautifulSoup

html = """<td>
        <font face="Arial, sans-serif" size="-1">
                    <b>Home Phone: </b>507-383-1070<br>
                    <b>Cell Phone: </b>507-383-1070<br>
                    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>
        </font>
</td>"""

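soup = BeautifulSoup(html, "html.parser")

# For each <b> label, b.next_element is the label text ("Home Phone: ") and the
# element after that is the text node following </b>, i.e. the phone number.
# [:2] keeps only the first two labels (Home Phone and Cell Phone).
entries = [b.next_element.next_element for b in soup.find_all('b')][:2]

print(entries)

This gives you:

['507-383-1070', '507-383-1070']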
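Slicing with `[:2]` relies on the phone labels being the first two `<b>` tags. A sketch that instead looks each value up by its label might look like this; it assumes every label sits in a `<b>` tag immediately followed by its value, as in the question's HTML, and the `contact` dict is just an illustrative name.

```python
from bs4 import BeautifulSoup, NavigableString

# The HTML snippet from the question.
html = """<td>
        <font face="Arial, sans-serif" size="-1">
                    <b>Home Phone: </b>507-383-1070<br>
                    <b>Cell Phone: </b>507-383-1070<br>
                    <b>E-Mail: </b><a href=mailto:macehrhardt@gmail.com>macehrhardt@gmail.com</a><br>
        </font>
</td>"""

soup = BeautifulSoup(html, "html.parser")

contact = {}
for b in soup.find_all("b"):
    # "Home Phone: " -> "Home Phone"
    label = b.get_text(strip=True).rstrip(":")
    # Assumption: the value is whatever directly follows </b>, either a bare
    # text node (the phone numbers) or a tag such as <a> (the e-mail address).
    value = b.next_sibling
    if isinstance(value, NavigableString):
        contact[label] = value.strip()
    else:
        contact[label] = value.get_text(strip=True)

print(contact["Home Phone"], contact["Cell Phone"])
```

Looking values up by label keeps working if the fields are reordered, at the cost of depending on the `<b>label</b>value` layout shown in the question.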

Answer 1: score 0, accepted, 2017-10-08 19:48:55
Answer 2: score 0, 2017-10-08 20:00:08