BeautifulSoup: getting all <li> tags after a specific <br> tag
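I'm trying to get all the key ingredients for every product in a category. This web page shows how a product's ingredients are listed. As the page source shows, all the ingredients come after the br tag that follows "Key Ingredients:". Below is my code. I can get all the text, but how can I get all the <li> items after that <br> tag?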
Expected output:
Glycerin Sodium Palmate Sodium Palm Kemelate Cymbopogon Flexuosus Oil Linalool Coumarin Benzyl Salicylate Citral
Code:
from os.path import basename
import requests
from bs4 import BeautifulSoup

baseurl = "https://www.1mg.com/otc/dettol-original-bathing-soap-bar-125gm-each-buy-4-get-1-free-otc587797"
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/74.0.3729.169 Safari/537.36 '
}
r = requests.get(baseurl, headers=header)
# soup = BeautifulSoup(r, 'lxml')
soup = BeautifulSoup(r.content, "html.parser")
job_element = soup.find("div", class_="otc-container")
categories = job_element.findAll("a", class_="button-text Breadcrumbs__breadcrumb___XuCvk", href=True)
# print(categories)
description = job_element.find("div", class_="ProductDescription__description-content___A_qCZ")
print(description.text)
One approach is to isolate the Key Ingredients tag, move to the next tag (<br>), and then process all subsequent tags until you reach the next <br> tag.
<strong>Key Ingredients:</strong>
<br>
<ul>
...
...
</ul>
<br>
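The three steps above can be wrapped in a small reusable function. This is a sketch under stated assumptions: the sample HTML is a hypothetical, trimmed stand-in for the real page (so it runs without a network request), the function name `items_after_label` is made up for illustration, and it relies on `html.parser` treating `<br>` as a void element so the walk does not descend into it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the structure above; the real page's markup may differ.
SAMPLE_HTML = """
<div class="ProductDescription__description-content___A_qCZ">
<strong>Key Ingredients:</strong>
<br>
<ul>
<li>Glycerin</li>
<li>Sodium Palmate</li>
</ul>
<br>
<strong>Other:</strong>
<ul><li>Not an ingredient</li></ul>
</div>
"""


def items_after_label(container, label):
    """Hypothetical helper: collect <li> text between the <br> after `label` and the next <br>."""
    anchor = container.find(string=label)          # step 1: the label's text node
    start = anchor.find_next("br") if anchor is not None else None  # step 2: the <br> after it
    if start is None:
        return []
    items = []
    for el in start.next_elements:                 # step 3: walk everything that follows
        if el.name == "br":                        # reached the closing <br>: stop
            break
        if el.name == "li":                        # text nodes have name None and are skipped
            items.append(el.get_text(strip=True))
    return items


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
description = soup.find("div", class_="ProductDescription__description-content___A_qCZ")
print(items_after_label(description, "Key Ingredients:"))
```

Because the loop stops at the first `<br>` after the list, the `<li>` under "Other:" is not collected. If the page ever dropped that closing `<br>`, the walk would continue to the end of the document and pick up every later `<li>`.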
Code:
>>> for tag in description.find(string='Key Ingredients:').find_next('br').next_elements:
...     if tag.name == 'br': break
...     if tag.name == 'li': tag.get_text()
...
'Glycerin'
'Sodium Palmate'
'Sodium Palm Kemelate'
'Cymbopogon Flexuosus Oil'
'Linalool'
'Coumarin'
'Benzyl Salicylate'
'Citral'
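A different technique, not the one used in the answer: if all the ingredients sit inside a single `<ul>`, as in the structure shown earlier, you can jump straight to the first `<ul>` after the label, read its `<li>` items, and join them with spaces to get the one-line format from the expected output. This is a sketch assuming that layout; the sample markup is hypothetical and trimmed, and on the real page you would call `find` on the `description` element from the question's code instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; assumes the ingredients are the first <ul> after the label.
SAMPLE_HTML = """
<strong>Key Ingredients:</strong>
<br>
<ul>
<li>Glycerin</li>
<li>Sodium Palmate</li>
<li>Citral</li>
</ul>
<br>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
label = soup.find(string="Key Ingredients:")
ingredient_list = label.find_next("ul") if label is not None else None

names = []
if ingredient_list is not None:
    # find_all is recursive, so nested sub-lists would be included as well
    names = [li.get_text(strip=True) for li in ingredient_list.find_all("li")]

print(" ".join(names))  # single line, space-separated, like the expected output
```

Passing `recursive=False` to `find_all` would restrict the search to direct children of the `<ul>` if the real list ever contains nested lists.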