How do I loop through a div class to access the li classes?

Question

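I scraped a page and found that, with my XPath and regex approach, I can't seem to get a set of values that sit inside a div class.

I have already tried the approach described in "How to get all li tags within div tag". My current logic is shown below.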

# PRODUCT ATTRIBUTES (STYLE, SKU, BRAND): need to figure out how to loop through a class and pull out the 2 list tags
prodattr = re.compile(r'<div class=\"pdp-desc-attr spec-prod-attr\">([^<]+)</div>', re.IGNORECASE)
prodattrmatches = re.findall(prodattr, html)
for m in prodattrmatches:
    m = re.compile(r'<li class=\"last last-item\">([^<]+)</li>', re.IGNORECASE)
    stymatches = re.findall(m, html)

#STYLE
sty = re.compile(r'<li class=\"last last-item\">([^<]+)</li>', re.IGNORECASE)
stymatches = re.findall(sty, html)

#BRAND
brd = re.compile(r'<li class=\"first first-item\">([^<]+)</li>', re.IGNORECASE)   
brdmatches = re.findall(brd, html)

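The code above is what I have now, and it does not work. For testing purposes I just send the data (if there is any) to a print statement so I can see it in the console.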

    itmDetails2 = dets['sku'] +","+ dets['description']+","+ dets['price']+","+ dets['brand']

This is what I get in the console. It is what I expect for now: the generic messages are only placeholders until I figure out this logic.

SKUE GOES HERE,adidas Women's Essentials Tricot Track Jacket,34.97, BRAND GOES HERE

<div class="pdp-desc-attr spec-prod-attr">
    <ul class="prod-attr-list">
        <li class="first first-item">Brand: adidas</li>
        <li>Country of Origin: Imported</li>
        <li class="last last-item">Style: F18AAW400D</li>   
    </ul>
</div>

Answer 1

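Don't use regex to parse HTML.

There are better and safer ways to do this.

Take a look at the following code, which uses Parsel and BeautifulSoup to extract the li tags from your sample markup: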
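To make the problem concrete, here is a minimal sketch of why the outer pattern from the question comes back empty. It assumes the page contains markup like the sample in the question. The group `([^<]+)` cannot cross a `<`, so it stops at `<ul` and never reaches `</div>`, which means the `for` loop over the matches never runs.

```python
import re

# Sample markup copied from the question.
html = '''<div class="pdp-desc-attr spec-prod-attr">
    <ul class="prod-attr-list">
        <li class="first first-item">Brand: adidas</li>
        <li>Country of Origin: Imported</li>
        <li class="last last-item">Style: F18AAW400D</li>
    </ul>
</div>'''

# The outer pattern from the question: [^<]+ cannot span the nested <ul>.
div_pattern = re.compile(
    r'<div class="pdp-desc-attr spec-prod-attr">([^<]+)</div>', re.IGNORECASE)
div_matches = div_pattern.findall(html)
print(div_matches)  # [] because the div contains child tags

# The inner pattern works here only because this <li> holds plain text.
li_pattern = re.compile(
    r'<li class="last last-item">([^<]+)</li>', re.IGNORECASE)
li_matches = li_pattern.findall(html)
print(li_matches)  # ['Style: F18AAW400D']
```

Even the inner pattern is fragile: it breaks as soon as the attribute order, quoting, or nesting in the page changes.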

from parsel import Selector
from bs4 import BeautifulSoup

html = ('<div class="pdp-desc-attr spec-prod-attr">'
           '<ul class="prod-attr-list">'
             '<li class="first first-item">Brand: adidas</li>'
             '<li>Country of Origin: Imported</li>'
             '<li class="last last-item">Style: F18AAW400D</li>'
           '</ul>'
         '</div>')

# Using parsel
sel = Selector(text=html)

for li in sel.xpath('//li'):
    print(li.xpath('./text()').get())

# Using BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

for li in soup.find_all('li'):
    print(li.text)

Output:

Brand: adidas
Country of Origin: Imported
Style: F18AAW400D
Brand: adidas
Country of Origin: Imported
Style: F18AAW400D
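If the page has other li tags outside this block, it may help to loop over the div first and only then over its li children. The following is a rough sketch with BeautifulSoup, not a definitive implementation. It assumes every item has the form "Label: value"; the `products` list and `attrs` dict are names made up for this illustration.

```python
from bs4 import BeautifulSoup

html = '''<div class="pdp-desc-attr spec-prod-attr">
    <ul class="prod-attr-list">
        <li class="first first-item">Brand: adidas</li>
        <li>Country of Origin: Imported</li>
        <li class="last last-item">Style: F18AAW400D</li>
    </ul>
</div>'''

soup = BeautifulSoup(html, "html.parser")

products = []  # hypothetical container: one dict per matching div
# class_ matches a tag that carries this class among its classes.
for div in soup.find_all("div", class_="pdp-desc-attr"):
    attrs = {}
    # Only the <li> tags inside this particular div are visited.
    for li in div.find_all("li"):
        # Assumption: each item looks like "Label: value".
        label, _, value = li.get_text(strip=True).partition(":")
        attrs[label.strip()] = value.strip()
    products.append(attrs)

print(products)
# [{'Brand': 'adidas', 'Country of Origin': 'Imported', 'Style': 'F18AAW400D'}]
```

From there, `products[0]['Brand']` and `products[0]['Style']` could replace the placeholder strings in the question's output line.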

Answer 2

I would use an HTML parser and select on the class of the ul. This uses bs4 4.7.1.

from bs4 import BeautifulSoup as bs

html = '''
<div class="pdp-desc-attr spec-prod-attr">
    <ul class="prod-attr-list">
        <li class="first first-item">Brand: adidas</li>
        <li>Country of Origin: Imported</li>
        <li class="last last-item">Style: F18AAW400D</li>   
    </ul>
</div>
'''

soup = bs(html, 'lxml')

for item in soup.select('.prod-attr-list:has(> li)'):
    print([sub_item.text for sub_item in item.select('li')])
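If you only need the brand and the style, the same CSS-selector approach can target the two li classes directly. This is a minimal sketch under a few assumptions: `attr_value` is a hypothetical helper written for this example, the text after the colon is the value you want, and the built-in `html.parser` is used instead of `lxml` so that nothing extra has to be installed.

```python
from bs4 import BeautifulSoup

html = '''
<div class="pdp-desc-attr spec-prod-attr">
    <ul class="prod-attr-list">
        <li class="first first-item">Brand: adidas</li>
        <li>Country of Origin: Imported</li>
        <li class="last last-item">Style: F18AAW400D</li>
    </ul>
</div>
'''

soup = BeautifulSoup(html, "html.parser")


def attr_value(soup, selector):
    """Hypothetical helper: return the text after the colon, or '' if absent."""
    node = soup.select_one(selector)
    if node is None:
        return ""
    return node.get_text(strip=True).partition(":")[2].strip()


# Child combinator keeps the match inside the attribute list.
brand = attr_value(soup, ".prod-attr-list > li.first-item")
style = attr_value(soup, ".prod-attr-list > li.last-item")

print(brand + "," + style)  # adidas,F18AAW400D
```

Returning an empty string when the selector finds nothing keeps the string concatenation from the question working on pages where the attribute list is missing.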
