Web Scraping with Beautiful Soup for Content of Specific Text within a Specific Tag
Beautiful Soup extraction of text within span tag
I am trying to extract the text Weight: 16.5 pounds from the following HTML:
<div class="product__description__text">.........
<p dir="ltr"><span><strong>Dimensions:</strong> 39 x 17.3 x 32.2 inches</span></p><p dir="ltr"><span><strong>Weight:</strong> 16.5 pounds</span></p><p dir="ltr"><span><strong>Weight limit:</strong> 35 pounds</span></p><p dir="ltr"><span><strong>Height limit:</strong> 32 inches</span></p></div>
Here is what I have tried so far:
results = soup.find_all('div', attrs={'class':'product'})
Weight_L = []
for result in results:
    if result.find('p', attrs={'dir':'ltr'}) is not None:
        weight = result.span.text
        Weight_L.append(weight)
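If you are only looking for the weight, I suggest simply checking whether the keyword "Weight" appears in the p tag. Also, find returns only the first match, so if the first p tag is not the "Weight" one, you will never reach it. Finally, since your class name is product__description__text, the class name you search for should also be product__description__text: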
results = soup.find_all('div', attrs={'class':'product__description__text'})
Weight_L = []
for result in results:
    p_tags = result.find_all('p', attrs={'dir':'ltr'})
    for tag in p_tags:
        if "Weight:" in tag.text:
            weight = tag.text
            Weight_L.append(weight)
If the HTML you posted above is what soup contains, Weight_L will be: ['Weight: 16.5 pounds']
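To check this end to end, here is a minimal self-contained sketch. It assumes the `bs4` package is installed and parses with the standard-library `html.parser`. The `html` string is the snippet from the question, and the names `no_match`, `first_p`, and `weights` are only illustrative.

```python
from bs4 import BeautifulSoup

# HTML snippet copied from the question
html = (
    '<div class="product__description__text">.........'
    '<p dir="ltr"><span><strong>Dimensions:</strong> 39 x 17.3 x 32.2 inches</span></p>'
    '<p dir="ltr"><span><strong>Weight:</strong> 16.5 pounds</span></p>'
    '<p dir="ltr"><span><strong>Weight limit:</strong> 35 pounds</span></p>'
    '<p dir="ltr"><span><strong>Height limit:</strong> 32 inches</span></p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# The class "product" is not the same class as "product__description__text",
# so the original search finds no div at all.
no_match = soup.find_all("div", attrs={"class": "product"})

div = soup.find("div", attrs={"class": "product__description__text"})

# find() returns only the first matching <p>, which is the Dimensions line.
first_p = div.find("p", attrs={"dir": "ltr"})

# find_all() returns every <p>, so each one can be checked for the keyword.
weights = [
    p.text
    for p in div.find_all("p", attrs={"dir": "ltr"})
    if "Weight:" in p.text
]

print(no_match)
print(first_p.text)
print(weights)
```

Note that "Weight limit: 35 pounds" is not picked up, because the substring being tested includes the colon.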
Weight: 16.5 pounds is in the second p tag under the parent class .product__description__text, so you can use the selector p:nth-child(2) to get that second p:
results = soup.select(".product__description__text p:nth-child(2)")
Weight_L = []
for result in results:
    Weight_L.append(result.text)
    print(result.text)
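The selector approach can be tried on the same snippet with the sketch below. It again assumes `bs4` with `html.parser`. One caveat: the position-based selector only works if the content elided as "........." contains no element children before the p tags, which is an assumption here. The label-based lookup at the end is a hypothetical alternative, not part of the original answer, and the names `second_p`, `label`, and `by_label` are illustrative.

```python
from bs4 import BeautifulSoup

# HTML snippet copied from the question
html = (
    '<div class="product__description__text">.........'
    '<p dir="ltr"><span><strong>Dimensions:</strong> 39 x 17.3 x 32.2 inches</span></p>'
    '<p dir="ltr"><span><strong>Weight:</strong> 16.5 pounds</span></p>'
    '<p dir="ltr"><span><strong>Weight limit:</strong> 35 pounds</span></p>'
    '<p dir="ltr"><span><strong>Height limit:</strong> 32 inches</span></p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# :nth-child counts element children only, so the leading text node is
# ignored and the second <p> is the second child of the div.
second_p = soup.select_one(".product__description__text p:nth-child(2)")
print(second_p.text)

# Hypothetical alternative that does not depend on position: find the
# <strong> whose text is exactly "Weight:", then read its parent <span>.
label = soup.find("strong", string="Weight:")
by_label = label.parent.get_text() if label is not None else None
print(by_label)
```

The label-based lookup keeps working if the order of the paragraphs changes, whereas `p:nth-child(2)` would then return a different line.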