Get the value of an attribute using CSS selectors with BeautifulSoup

Question

I am doing web scraping in Python, using the BeautifulSoup library.

I have HTML markup like this:

<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>

I want to get the data-url or href value from every <tr>. Getting the href value would be preferable.

Here is a short snippet of my relevant code:

main_url =  "http://localhost/test.htm"
page  = requests.get(main_url).text
soup_expatistan = BeautifulSoup(page)

print (soup_expatistan.select("tr.deals").data-url)
# or  print (soup_expatistan.select("tr.deals").["data-url"])

Answer 1

You can use the CSS selector tr.deals span.hotel-name a to reach the link:

from bs4 import BeautifulSoup

data = """
<tr class="deals" data-url="www.example.com">
<span class="hotel-name">
<a href="wwwexample2.com"></a>
</span>
</tr>
"""

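# Name the parser explicitly; omitting it makes bs4 guess and emit a warning.
soup = BeautifulSoup(data, "html.parser")
# select() returns a list of matches: take the first one, then read its href.
print(soup.select('tr.deals span.hotel-name a')[0]['href'])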

Prints:

wwwexample2.com
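
The attempt in the question fails because select() returns a list of tags, not a single tag, so there is no attribute to read until one element is picked. The sketch below shows one way to handle that with select_one(), which returns the first match or None. The variable names and the choice of "html.parser" are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<tr class="deals" data-url="www.example.com">
<span class="hotel-name">
<a href="wwwexample2.com"></a>
</span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, even when only one element matches.
matches = soup.select("tr.deals span.hotel-name a")

# select_one() returns the first matching tag, or None when nothing matches.
first_link = soup.select_one("tr.deals span.hotel-name a")
first_href = first_link.get("href") if first_link is not None else None

# A selector that matches nothing yields None rather than raising an error.
missing = soup.select_one("tr.deals a.no-such-class")

print(first_href)
print(missing)
```

Checking for None before reading the attribute avoids the IndexError that [0] would raise on an empty result.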

If you have multiple links, iterate over them:

for link in soup.select('tr.deals span.hotel-name a'):
    print(link['href'])
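
The question also asks about the data-url attribute on the rows themselves. A minimal sketch, assuming the markup from the question and the built-in "html.parser": the attribute selectors [data-url] and [href] limit the matches to elements that actually carry the attribute, so the subscript lookup should not raise KeyError. The names data_urls and hrefs are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<tr class="deals" data-url="www.example2.com">
<span class="hotel-name">
<a href="www.example2.com"></a>
</span>
</tr>
<tr class="deals" data-url="www.example3.com">
<span class="hotel-name">
<a href="www.example3.com"></a>
</span>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# Read data-url straight from each matching row.
data_urls = [row["data-url"] for row in soup.select("tr.deals[data-url]")]

# Read href from each link nested inside a matching row.
hrefs = [link["href"] for link in soup.select("tr.deals span.hotel-name a[href]")]

print(data_urls)
print(hrefs)
```

If some rows may lack the attribute and should still be reported, dropping the attribute selector and using row.get("data-url") returns None for those rows instead.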

Solution 1
3, accepted, 2014-11-07 14:22:36