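How to loop through a div class to get access to the li class within?
I am scraping a page and found that, with my XPath and regex approach, I can't seem to get a set of values that sit inside a div class.
I have tried the method described on this page, "How to get all li tags within a div tag", and my current logic is shown below.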
#PRODUCT ATTRIBUTES (STYLE, SKU, BRAND) need to figure out how to loop thru a class and pull out the 2 list tags
prodattr = re.compile(r'<div class=\"pdp-desc-attr spec-prod-attr\">([^<]+)</div>', re.IGNORECASE)
prodattrmatches = re.findall(prodattr, html)
for m in prodattrmatches:
    m = re.compile(r'<li class=\"last last-item\">([^<]+)</li>', re.IGNORECASE)
    stymatches = re.findall(m, html)
#STYLE
sty = re.compile(r'<li class=\"last last-item\">([^<]+)</li>', re.IGNORECASE)
stymatches = re.findall(sty, html)
#BRAND
brd = re.compile(r'<li class=\"first first-item\">([^<]+)</li>', re.IGNORECASE)
brdmatches = re.findall(brd, html)
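The above is the current code, which does not work as intended. For testing purposes, I simply write the data (if any) to a print command so that I can see it in the console.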
itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']
In the console this is what I get, which is what I expect; the generic messages are just placeholders until I figure out this logic.
SKUE GOES HERE,adidas Women's Essentials Tricot Track Jacket,34.97, BRAND GOES HERE
<div class="pdp-desc-attr spec-prod-attr">
<ul class="prod-attr-list">
<li class="first first-item">Brand: adidas</li>
<li>Country of Origin: Imported</li>
<li class="last last-item">Style: F18AAW400D</li>
</ul>
</div>
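For reference, here is a small check of the two kinds of pattern from the question against the sample HTML above. It is only a sketch using the standard `re` module: the outer pattern returns nothing because `[^<]+` cannot cross the nested `<ul>` tag, while the `<li>` pattern matches because that element contains plain text only. The variable names `div_pattern` and `style_pattern` are made up for this illustration.

```python
import re

html = (
    '<div class="pdp-desc-attr spec-prod-attr">\n'
    '<ul class="prod-attr-list">\n'
    '<li class="first first-item">Brand: adidas</li>\n'
    '<li>Country of Origin: Imported</li>\n'
    '<li class="last last-item">Style: F18AAW400D</li>\n'
    '</ul>\n'
    '</div>'
)

# Outer pattern from the question: [^<]+ stops at the first "<",
# so it can never reach </div> past the nested <ul>.
div_pattern = re.compile(
    r'<div class="pdp-desc-attr spec-prod-attr">([^<]+)</div>', re.IGNORECASE)
div_matches = div_pattern.findall(html)

# Inner pattern from the question: the <li> holds text only, so it matches.
style_pattern = re.compile(
    r'<li class="last last-item">([^<]+)</li>', re.IGNORECASE)
style_matches = style_pattern.findall(html)

print(div_matches)    # empty list
print(style_matches)  # one match with the style text
```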
There is a better, safer way to do this.
See the following code, which uses Parsel and BeautifulSoup to extract the li tags from your sample HTML:
from parsel import Selector
from bs4 import BeautifulSoup
html = ('<div class="pdp-desc-attr spec-prod-attr">'
'<ul class="prod-attr-list">'
'<li class="first first-item">Brand: adidas</li>'
'<li>Country of Origin: Imported</li>'
'<li class="last last-item">Style: F18AAW400D</li>'
'</ul>'
'</div>')
# Using parsel
sel = Selector(text=html)
for li in sel.xpath('//li'):
    print(li.xpath('./text()').get())
# Using BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for li in soup.find_all('li'):
    print(li.text)
Output:
Brand: adidas
Country of Origin: Imported
Style: F18AAW400D
Brand: adidas
Country of Origin: Imported
Style: F18AAW400D
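Since the question wants the brand and style as separate values, one possible follow-up is to split each `li` text on its first colon and collect the pairs in a dictionary. This is a minimal sketch, assuming every attribute has the form `Label: value` as in the sample; the `attrs` dictionary and the `attr_div` name are hypothetical and not part of the original code.

```python
from bs4 import BeautifulSoup

html = ('<div class="pdp-desc-attr spec-prod-attr">'
        '<ul class="prod-attr-list">'
        '<li class="first first-item">Brand: adidas</li>'
        '<li>Country of Origin: Imported</li>'
        '<li class="last last-item">Style: F18AAW400D</li>'
        '</ul>'
        '</div>')

soup = BeautifulSoup(html, "html.parser")

# Restrict the search to the product-attribute div from the question.
attr_div = soup.find("div", class_="spec-prod-attr")

attrs = {}
if attr_div is not None:
    for li in attr_div.find_all("li"):
        # partition() splits on the first ":" only (assumed "Label: value" form).
        label, sep, value = li.get_text(strip=True).partition(":")
        if sep:
            attrs[label.strip()] = value.strip()

print(attrs.get("Brand"))
print(attrs.get("Style"))
```

The values could then replace the placeholders in the output line, for example `attrs.get("Brand", "")` where the brand is expected.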
I would use an HTML parser and look up the class of the ul. Using bs4 4.7.1:
from bs4 import BeautifulSoup as bs
html = '''
<div class="pdp-desc-attr spec-prod-attr">
<ul class="prod-attr-list">
<li class="first first-item">Brand: adidas</li>
<li>Country of Origin: Imported</li>
<li class="last last-item">Style: F18AAW400D</li>
</ul>
</div>
'''
soup = bs(html, 'lxml')
for item in soup.select('.prod-attr-list:has(> li)'):
    print([sub_item.text for sub_item in item.select('li')])
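If only the brand and style items are needed, the `li` classes from the sample can also be targeted directly with CSS selectors. The sketch below assumes the `first-item` and `last-item` classes always mark the brand and style entries, which holds for this sample but may not hold for every product page. It uses `html.parser` instead of `lxml` only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

html = '''
<div class="pdp-desc-attr spec-prod-attr">
<ul class="prod-attr-list">
<li class="first first-item">Brand: adidas</li>
<li>Country of Origin: Imported</li>
<li class="last last-item">Style: F18AAW400D</li>
</ul>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching element, or None if nothing matches.
brand_li = soup.select_one("div.spec-prod-attr li.first-item")
style_li = soup.select_one("div.spec-prod-attr li.last-item")

# Guard against missing elements so a page without them does not raise.
brand = brand_li.get_text(strip=True) if brand_li else None
style = style_li.get_text(strip=True) if style_li else None

print(brand)
print(style)
```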