How to go down multiple levels in Beautiful Soup (find_all error)

Question

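I'm trying to drill down two levels in this Python script. All the examples I've seen use find_all to drill down one level, which I can get working, but I can't get down to the third level. Here is my code snippet: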

main_table = soup.find("div",attrs={'class':'block-content'})
label_item_contents = main_table.find_all("div", attrs={'class':'label-item-description'})
links = label_item_contents.find_all("a")
print(links)

Doing this gives me the error "AttributeError: ResultSet object has no attribute 'find_all'".

If I comment that line out and change the print, like this:

main_table = soup.find("div",attrs={'class':'block-content'})
label_item_contents = main_table.find_all("div", attrs={'class':'label-item-description'})
print(label_item_contents)

then I see all of the scraped data. I read that the problem might be that label_item_contents has become an array, so I tried this:

links = label_item_contents[].find_all("a")

But then I get "SyntaxError: invalid syntax".

Any help is appreciated!

Edit: this is part of the HTML returned in the second example, when I use print(label_item_contents):

<div class="label-item-description">
    <div>
        <a href="/label/example.com"><strong>Example</strong></a>
    </div>
    <small>
        <i class="fa fa-facebook-official"></i> 342.4K
        <i class="fa fa-soundcloud"></i> 233.4K
    </small>
    <br />
    <small class="text-muted">
        Stockholm, Sweden
    </small>
    <br />
    <small class="text-muted">
        <b>Techno, Tech House</b>
    </small>
</div>, <div class="label-item-description">

I'm trying to access only <a href="/label/example.com">

Answer 1

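You might want to try CSS selectors. I find them more familiar and, importantly, I find they cause fewer AttributeError problems.

For example, with the HTML above you can select the first anchor tag like this: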

link = soup.select("div.label-item-description > div > a")
print(link[0]) # <a href="/label/example.com"><strong>Example</strong></a>

See the documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
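
As a self-contained sketch of the selector approach, the snippet below runs against a trimmed-down copy of the HTML from the question. The second entry (`/label/other.com`) is made up for illustration. It shows that `select` returns a list of every match, while `select_one` returns just the first match (or `None` when nothing matches).

```python
from bs4 import BeautifulSoup

# Trimmed-down HTML modelled on the question; the second entry is hypothetical.
html = """
<div class="block-content">
    <div class="label-item-description">
        <div><a href="/label/example.com"><strong>Example</strong></a></div>
    </div>
    <div class="label-item-description">
        <div><a href="/label/other.com"><strong>Other</strong></a></div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of all matching tags, so index or loop over it.
anchors = soup.select("div.label-item-description > div > a")
hrefs = [a["href"] for a in anchors]

# select_one() returns the first matching tag directly, or None.
first = soup.select_one("div.label-item-description > div > a")

print(hrefs)
print(first["href"])
```

Because the selector walks all three levels in one expression, there is no intermediate result set to call `find_all` on, which is why this style tends to avoid the AttributeError from the question.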

Answer 2

I'm not sure I've understood the question correctly, but why not do it this way? You can chain find calls down to the element you need:

html= """
<div class="block-content">
    <div class="label-item-description">
        <div>
            <a href="/label/example.com"><strong>Example</strong></a>
        </div>
        <small>
            <i class="fa fa-facebook-official"></i> 342.4K
            <i class="fa fa-soundcloud"></i> 233.4K
        </small>
        <br />
        <small class="text-muted">
            Stockholm, Sweden
        </small>
        <br />
        <small class="text-muted">
            <b>Techno, Tech House</b>
        </small>
    </div>, <div class="label-item-description"></div>
</div>  """

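from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")
print(soup.find('div', {'class': 'block-content'}).find('div', {'class': "label-item-description"}).find('a'))

Output: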

<a href="/label/example.com"><strong>Example</strong></a>
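
Chaining `find` only ever returns the first match at each level. To collect the links from every `label-item-description` block, one option is to loop over the `ResultSet` that `find_all` returns, since `find_all` is available on individual tags but not on the result set itself. A minimal sketch, assuming sample HTML in which the second entry is invented for illustration:

```python
from bs4 import BeautifulSoup

# Sample HTML modelled on the question; the second entry is hypothetical.
html = """
<div class="block-content">
    <div class="label-item-description">
        <div><a href="/label/example.com"><strong>Example</strong></a></div>
    </div>
    <div class="label-item-description">
        <div><a href="/label/other.com"><strong>Other</strong></a></div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
main_table = soup.find("div", attrs={"class": "block-content"})
label_item_contents = main_table.find_all("div", attrs={"class": "label-item-description"})

# label_item_contents is a ResultSet (a list of tags), so call find_all
# on each tag inside it rather than on the ResultSet itself.
links = []
for item in label_item_contents:
    links.extend(item.find_all("a"))

hrefs = [a["href"] for a in links]
print(hrefs)
```

If only one block is of interest, indexing with a concrete position such as `label_item_contents[0].find_all("a")` also works. The empty brackets in the question's attempt are what caused the SyntaxError.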

Answer 3

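Sometimes anchor tags are used without an href attribute.

You can try the find_all function, which always returns a list-like ResultSet. Passing href=True along with the anchor tag name gives you only the links that actually have an href attribute.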

main_table = soup.find("div",{'class':'label-item-description'})
links = main_table.find_all("a",href=True)
print(links)
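
To make the effect of `href=True` concrete, here is a small sketch against made-up HTML (the `<a name="top">` anchor is an assumption added for illustration, not part of the question's page). It compares a plain `find_all("a")` with the filtered version.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one anchor without href, one with.
html = """
<div class="label-item-description">
    <a name="top">No link here</a>
    <a href="/label/example.com"><strong>Example</strong></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
block = soup.find("div", {"class": "label-item-description"})

# Every <a> tag, whether or not it carries an href.
all_anchors = block.find_all("a")

# Only <a> tags that have an href attribute.
with_href = block.find_all("a", href=True)

# Safe to index with ["href"] here, because the filter guarantees it exists.
hrefs = [a["href"] for a in with_href]
print(len(all_anchors), len(with_href), hrefs)
```

Without the filter, `a["href"]` would raise a KeyError on the first anchor, so filtering up front keeps the list comprehension simple.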

Answer 4

You could probably use either of these two other options:

links = [item['href'] for item in soup.select('.label-item-description a')]
links2 = [item['href'] for item in soup.select('.label-item-description [href^="/label/"]')]
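
The two selectors differ in how strict they are: the first matches every anchor inside a `label-item-description` block, while the second matches only elements whose `href` starts with `/label/`. A minimal sketch of the difference, assuming sample HTML in which the external `https://example.com/about` link is invented for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one internal /label/ link and one external link.
html = """
<div class="label-item-description">
    <div><a href="/label/example.com"><strong>Example</strong></a></div>
    <small><a href="https://example.com/about">About</a></small>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: every <a> anywhere inside the block.
links = [item["href"] for item in soup.select(".label-item-description a")]

# Attribute prefix selector: only elements whose href begins with "/label/".
links2 = [item["href"] for item in soup.select('.label-item-description [href^="/label/"]')]

print(links)
print(links2)
```

The second form also sidesteps anchors that have no `href` at all, because the attribute selector cannot match them.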

4 answers

Answer 1: score 2, accepted, 2019-06-14 23:41:23

Answer 2: score 0, 2019-06-14 23:28:54

Answer 3: score 0, 2019-06-15 04:53:12

Answer 4: score 0, 2019-06-15 05:05:19
