Python / Selenium / Beautiful Soup not scraping desired elements

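I'm struggling to get this code to extract the desired information from a single page.

I've tried all the usual Selenium tactics and added a time delay. Hopefully it's something simple. I'm not getting any error messages.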

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as bs
from time import sleep

options = Options()
options.add_argument("--headless")
options.add_argument("window-size=1400,600")
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36"
options.add_argument(f'user-agent={user_agent}')
capabilities = { 'chromeOptions':  { 'useAutomationExtension': False},'args': ['--disable-extensions']}
browser = webdriver.Chrome(executable_path=r'/usr/local/bin/chromedriver',desired_capabilities = capabilities,options=options)

url='https://groceries.asda.com/product/celery-spring-onions/asda-growers-selection-trimmed-spring-onions/41676'

browser.get(url)
sleep(3)
source_data = browser.page_source
bs_data = bs(source_data,"html.parser")

#product id
try:
    product_id = bs_data.findfindAll('span', {'class': 'pdp-main-details__product-code'})       
    product_id = product_id.replace('Product code:','').strip()
except:
    product_id = "n/a"

#image address 
try:
    for image in bs_data.find("div", {"class":"s7staticimage"}):
        image_url = image.find('img')['src']
except:
       image_url = "n/a"   

#product description
try:
    product_desc = bs_data.find('class',{'pdp-main-pdp-main-details__title'})
    product_desc = product_desc.get_text().strip()
except:
    product_desc = "n/a"

#product price
try:
    product_price = bs_data.find('class',{'co-product__price pdp-main-details__price'})
    product_price = product_price.get_text().strip()
except:
    product_price = "n/a"

print (url,'|',image_url,'|',product_id,'|',product_desc,'|',product_price)        


browser.quit()

Any help is greatly appreciated.

Thanks
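
On the missing error messages: every lookup in the code above sits inside a bare `except:`, which swallows whatever goes wrong. `findfindAll` is not a BeautifulSoup method, and `find('class', ...)` searches for a tag named `class` rather than for a class attribute, so those blocks most likely fail and quietly fall back to "n/a". A minimal sketch of a way to surface the failures follows; the helper name `extract` is hypothetical and not part of the post, and `None` stands in for a lookup that matched nothing, which is what BeautifulSoup's `find()` returns in that case.

```python
def extract(label, getter, default="n/a"):
    """Run getter(); on a lookup failure, report the reason and return default."""
    try:
        return getter()
    except (AttributeError, TypeError, KeyError) as exc:
        # Print the real cause instead of hiding it behind a bare except
        print(f"{label} failed: {type(exc).__name__}: {exc}")
        return default


# Stand-in for a find() call that matched nothing
missing = None
product_desc = extract("product description", lambda: missing.get_text().strip())
print(product_desc)

# With the soup from the question, a lookup could be written as:
# extract("product id",
#         lambda: bs_data.find("span", class_="pdp-main-details__product-code").get_text())
```

Running this prints the reason for the failure before the "n/a" fallback, which makes it easier to tell a typo in the lookup from an element that really is absent.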

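Because the content is generated dynamically, there is nothing in your soup to find. Selenium alone is enough. I'm not sure why you are treating the elements as lists, since there is only one of each element on this page.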

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
capabilities = { 'chromeOptions':  { 'useAutomationExtension': False},'args': ['--disable-extensions']}
browser = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe',desired_capabilities = capabilities,options=options)
url='https://groceries.asda.com/product/celery-spring-onions/asda-growers-selection-trimmed-spring-onions/41676'

# Let element lookups retry for up to 15 seconds while the page renders
browser.implicitly_wait(15)
browser.get(url)

# Product code
product_id = browser.find_element_by_class_name('pdp-main-details__product-code')
print(product_id.text)

# Image address
image = browser.find_element_by_xpath('//*[@id="s7viewer_flyout"]/div[1]/img[1]')
image_url = image.get_attribute('src')
print(image_url)

browser.quit()

Output:

Product code: 410212
https://ui.assets-asda.com/dm/asdagroceries/5050854288142_T1?defaultImage=asdagroceries/noImage&resMode=sharp2&id=PqaST3&fmt=jpg&fit=constrain,1&wid=188&hei=188
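
For readers on Selenium 4, where the `find_element_by_*` helpers were deprecated in favor of `find_element(By..., ...)`, the same lookup can be sketched with an explicit wait. This is a variation under stated assumptions, not the answerer's code: the function names `clean_product_code` and `fetch_product_code` are hypothetical, the class name is the one used in the answer, and chromedriver is assumed to be discoverable by Selenium without an explicit path.

```python
def clean_product_code(text):
    """Strip the 'Product code:' label seen in the output above."""
    return text.replace("Product code:", "").strip()


def fetch_product_code(url, timeout=15):
    """Open the page headlessly and return the bare product code."""
    # Imported here so clean_product_code() works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    browser = webdriver.Chrome(options=options)  # assumes chromedriver can be located
    try:
        browser.get(url)
        # Block until the dynamically rendered element is visible, or time out
        element = WebDriverWait(browser, timeout).until(
            EC.visibility_of_element_located(
                (By.CLASS_NAME, "pdp-main-details__product-code")
            )
        )
        return clean_product_code(element.text)
    finally:
        # Always release the browser, even if the wait times out
        browser.quit()


print(clean_product_code("Product code: 410212"))  # prints 410212
```

Calling `fetch_product_code(url)` with the URL from the question would return only the numeric code. The explicit wait targets one element, so a timeout points directly at the selector that failed rather than at a fixed `sleep`.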
