
Scraping nested element on e-commerce website

I'm trying to use Selenium to scrape the product image URL from a specific product page on Target's website, but nothing is returned.

Here is the relevant part of my code:

import time

from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

# ADD THE IMAGE URL
# (`driver` is an open Selenium WebDriver on the product page; `line` is the CSV row being built)
j = 0
found = False
while j < 5 and not found:
    try:
        img_panel = driver.find_element(By.CLASS_NAME, 'slideDeckPicture')
        img = img_panel.find_element(By.TAG_NAME, 'img')
        img_name = img.get_attribute('alt')
        img_url = img.get_attribute('src')

        # img_urls.append(img_url)
        line += ',"' + img_url + '"'
        found = True
    # if it can't find the image, it probably hasn't loaded. wait and try again.
    except NoSuchElementException:
        j += 1
        time.sleep(4)
        # img_urls.append('NO URL')
# if we've tried 5 times, add no url
if not found:
    line += ',NO IMG URL'
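The wait-and-retry loop above can also be written as a small reusable helper. This is a minimal sketch, not code from the original post: retry_find and its attempts, delay and exceptions parameters are hypothetical names, and it assumes the lookup raises an exception (as Selenium's find_element raises NoSuchElementException) while the element has not loaded yet.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_find(
    lookup: Callable[[], T],
    attempts: int = 5,
    delay: float = 4.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> Optional[T]:
    """Call lookup() up to `attempts` times and return its first successful result.

    If lookup() raises one of `exceptions`, wait `delay` seconds (the element
    has probably not loaded yet) and try again. Returns None if all attempts fail.
    """
    for attempt in range(attempts):
        try:
            return lookup()
        except exceptions:
            if attempt < attempts - 1:  # no point sleeping after the last try
                time.sleep(delay)
    return None


if __name__ == "__main__":
    # Simulated page where the image only "appears" on the third lookup.
    calls = {"n": 0}

    def fake_lookup() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise LookupError("image not loaded yet")
        return "https://example.com/product.jpg"

    img_url = retry_find(fake_lookup, attempts=5, delay=0)
    print(img_url if img_url is not None else "NO IMG URL")  # prints https://example.com/product.jpg
```

Against a live page, the lookup might be something like lambda: driver.find_element(By.CSS_SELECTOR, ".slideDeckPicture img").get_attribute("src"), passed with exceptions=(NoSuchElementException,). Selenium's own WebDriverWait implements the same idea as an explicit wait and is probably worth trying before a hand-rolled loop.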

HTML screenshot: (image not available)

Link to example product

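You can do this without Selenium by fetching the page with requests and parsing it with BeautifulSoup. The urls list contains the image URLs you are looking for: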

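import requests as rq
from bs4 import BeautifulSoup as bs

url = "https://www.target.com/p/revolution-beauty-conceal-define-concealer-0-11-fl-oz/-/A-82003638?preselect=81551727#lnk=sametab"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
resp = rq.get(url, headers=headers)
soup = bs(resp.content, "html.parser")

# the product images are nested in <div data-test="product-image">; take the first one
divs_img = soup.find_all("div", attrs={"data-test": "product-image"})[0]
# keep only <img> tags whose src is an absolute https URL
urls = [i["src"] for i in divs_img.find_all("img") if i.get("src", "").startswith("https")]
print(urls)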
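For comparison, the same nested lookup (find the div marked data-test="product-image", then collect the https src of every img inside it) can be sketched with only Python's standard-library html.parser, for cases where BeautifulSoup is not available. This is an illustrative sketch: ProductImageParser and product_image_urls are hypothetical names, and the HTML sample is made up, not Target's real markup.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ProductImageParser(HTMLParser):
    """Collect https <img> src values nested in the first <div data-test="product-image">."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0      # > 0 while inside the gallery div; counts nested <div>s
        self.done = False   # set once the first gallery div has closed
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if self.done:
            return
        attr = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1  # a div nested inside the gallery
            elif attr.get("data-test") == "product-image":
                self.depth = 1   # entering the gallery div
        elif tag == "img" and self.depth > 0:
            src = attr.get("src") or ""
            if src.startswith("https"):  # same filter as the answer
                self.urls.append(src)

    def handle_endtag(self, tag: str) -> None:
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # mimic find_all(...)[0]: only the first gallery


def product_image_urls(html: str) -> List[str]:
    parser = ProductImageParser()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    sample = """
    <div data-test="product-image">
      <div class="slide">
        <img alt="front" src="https://example.com/front.jpg">
        <img alt="placeholder" src="data:image/gif;base64,R0lGOD">
      </div>
      <img alt="side" src="https://example.com/side.jpg"/>
    </div>
    <img alt="logo" src="https://example.com/logo.png">
    """
    print(product_image_urls(sample))
    # ['https://example.com/front.jpg', 'https://example.com/side.jpg']
```

One design difference: the answer's find_all(...)[0] raises IndexError when no such div exists, whereas this sketch simply returns an empty list.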


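Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.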

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM