
BeautifulSoup “Scraping” using their name and their id

I'm using BeautifulSoup, but I'm not sure how to use find, find_all, and the other functions correctly...

If I have:

<div class="hey"></div>

Using: soup.find_all("div", class_="hey")

finds the div in question correctly, but I don't know how to do the same for the following cases:

<h3 id="me"></h3>  # Find this one via "h3" and "id"

<li id="test1"></li>  # Find this one via "li" and "id"

<li custom="test2321"></li>  # Find this one via "li" and "custom"

<li id="test1" class="tester"></li>  # Find this one via "li" and "class"

<ul class="here"></ul>  # Find this one via "ul" and "class"

Any ideas would be greatly appreciated :)

Take a look at the following code:

from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

# Name the parser explicitly so results do not depend on what is installed
soup = BeautifulSoup(html, "html.parser")

# Look at all h3 tags and keep the ones whose id is "me".
# IDs are supposed to be unique, so soup.find_all(id="me") alone
# would be enough here.
one = soup.find_all("h3", {"id": "me"})
print(one)

# Same as above: if the element has an id, the id alone is enough.
# (This HTML breaks the rule by using id="test1" twice, so two tags match.)
two = soup.find_all("li", {"id": "test1"})
print(two)

# Look at all li tags and find the one with the custom attribute
three = soup.find_all("li", {"custom": "test2321"})
print(three)

# Both the id and the class must match; with unique ids, the id alone would do
four = soup.find_all("li", {"id": "test1", "class": "tester"})
print(four)

# Look at ul tags and find the one whose class attribute is "here"
five = soup.find_all("ul", {"class": "here"})
print(five)

Output:

[<h3 id="me"></h3>]
[<li id="test1"></li>, <li class="tester" id="test1"></li>]
[<li custom="test2321"></li>]
[<li class="tester" id="test1"></li>]
[<ul class="here"></ul>]

The documentation should cover whatever else you need.

From the help:
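Since the question also asks how find differs from find_all, here is a minimal sketch using the same HTML. It assumes the standard-library "html.parser" backend; the variable names (heading, missing, items) are only illustrative. The point is that find returns a single tag or None, while find_all always returns a list.

```python
from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
# An id is meant to be unique, so the tag name can be left out.
heading = soup.find(id="me")
missing = soup.find("h3", id="nope")

# find_all() always returns a list-like result, which may be empty.
items = soup.find_all("li")

print(heading)
print(missing)
print(len(items))
```

Because find can return None, it is worth checking the result before reading attributes from it.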

In [30]: soup.find_all?
Type:       instancemethod
String Form:
<bound method BeautifulSoup.find_all 
File:       /usr/lib/python2.7/site-packages/bs4/element.py
Definition: soup.find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)
Docstring:
Extracts a list of Tag objects that match the given
criteria.  You can specify the name of the Tag and any
attributes you want the Tag to have.

The value of a key-value pair in the 'attrs' map can be a
string, a list of strings, a regular expression object, or a
callable that takes a string and returns whether or not the
string matches for some custom definition of 'matches'. The
same is true of the tag name.

So you can pass the attributes either as a dictionary or as named arguments:

In [31]: soup.find_all("li", custom="test2321")
Out[31]: [<li custom="test2321"></li>]

In [32]: soup.find_all("li", {"id": "test1", "class": ""})
Out[32]: [<li id="test1"></li>]
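The docstring above also mentions lists, regular expressions, and callables as filter values. The following is a rough sketch of each, assuming the same HTML as in the question and the "html.parser" backend; the variable names are illustrative only. Note that class is a Python keyword, so the keyword form is class_.

```python
import re
from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword form and dictionary form select the same tags.
by_kwarg = soup.find_all("li", custom="test2321")
by_dict = soup.find_all("li", attrs={"custom": "test2321"})

# "class" is reserved in Python, so the keyword argument is class_.
testers = soup.find_all("li", class_="tester")

# A compiled regular expression is matched against the attribute value.
numbered = soup.find_all(id=re.compile(r"^test\d+$"))

# A list of tag names matches any of them.
either = soup.find_all(["h3", "ul"])

# A callable receives the attribute value (None when the attribute is absent).
long_custom = soup.find_all("li", custom=lambda v: v is not None and len(v) > 4)

print(by_kwarg == by_dict)
print(len(testers), len(numbered), len(long_custom))
print([tag.name for tag in either])
```

The dictionary form remains useful for attribute names that are not valid Python identifiers, such as data-* attributes.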

