How to skip over identical tags in BeautifulSoup - Python

Question

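I'm currently writing code for scrapers and am becoming more and more of a fan of Python, and of BeautifulSoup in particular.

Still, while parsing HTML I ran into a tricky part that I could only handle in a not-so-pretty way.

I want to scrape some HTML, in particular the following snippet: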

<div class="title-box">
    <h2>
        <span class="result-desc">
            Search results <strong>1</strong>-<strong>10</strong> out of <strong>10,009</strong> about <strong>paul mccartney</strong><a href="alert/settings" class="title-email-alert-promo x-title-alerts-promo">Create email Alert</a>
        </span>
    </h2>
</div>

So what I do is identify the div using:

comment = TopsySoup.find('div', attrs={'class' : 'title-box'})

Then comes the ugly part. To capture the number I want, 10,009, I use:

catcher = comment.strong.next.next.next.next.next.next.next

Can anyone tell me whether there is a better way?

Answer 1

How about comment.find_all('strong')[2].text?

In fact, you can shorten that to comment('strong')[2].text, because calling a Tag object like a function is the same as calling find_all on it.

>>> comment('strong')[2].text
u'10,009'
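
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. It assumes the bs4 package (BeautifulSoup 4) with the built-in html.parser, and it embeds the snippet from the question as a string. The names html, soup, strongs, total_text, shortcut_text and total are only illustrative and do not come from the original post.

```python
from bs4 import BeautifulSoup

# The HTML snippet from the question, embedded as a string for the demo.
html = """
<div class="title-box">
    <h2>
        <span class="result-desc">
            Search results <strong>1</strong>-<strong>10</strong> out of <strong>10,009</strong> about <strong>paul mccartney</strong><a href="alert/settings" class="title-email-alert-promo x-title-alerts-promo">Create email Alert</a>
        </span>
    </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
comment = soup.find("div", attrs={"class": "title-box"})

# find_all returns every <strong> inside the div, in document order,
# so the third one (index 2) holds the total number of results.
strongs = comment.find_all("strong")
total_text = strongs[2].get_text()

# Calling the Tag directly is shorthand for find_all.
shortcut_text = comment("strong")[2].get_text()

# Optional extra step: drop the thousands separator to get an integer.
total = int(total_text.replace(",", ""))

print(total_text, total)
```

Indexing by position still depends on the page keeping its <strong> tags in the same order, but it avoids counting every text node in between the way the chain of .next calls does.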

Solution 1
3 2013-05-23 13:16:05
