Python / Selenium / Beautiful Soup not scraping desired elements

Question

I'm struggling to get this code to extract the desired information from a single page.

I've tried all the usual Selenium tactics and added a time delay. Hopefully it's something simple. I'm not getting any error messages.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as bs
from time import sleep

options = Options()
options.add_argument("--headless")
options.add_argument("window-size=1400,600")
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36"
options.add_argument(f'user-agent={user_agent}')
capabilities = { 'chromeOptions':  { 'useAutomationExtension': False},'args': ['--disable-extensions']}
browser = webdriver.Chrome(executable_path=r'/usr/local/bin/chromedriver',desired_capabilities = capabilities,options=options)

url='https://groceries.asda.com/product/celery-spring-onions/asda-growers-selection-trimmed-spring-onions/41676'

browser.get(url)
sleep(3)
source_data = browser.page_source
bs_data = bs(source_data,"html.parser")

#product id
try:
    product_id = bs_data.findAll('span', {'class': 'pdp-main-details__product-code'})
    product_id = product_id.replace('Product code:','').strip()
except:
    product_id = "n/a"

#image address 
try:
    for image in bs_data.find("div", {"class":"s7staticimage"}):
        image_url = image.find('img')['src']
except:
       image_url = "n/a"   

#product description
try:
    product_desc = bs_data.find('class',{'pdp-main-pdp-main-details__title'})
    product_desc = product_desc.get_text().strip()
except:
    product_desc = "n/a"

#product price
try:
    product_price = bs_data.find('class',{'co-product__price pdp-main-details__price'})
    product_price = product_price.get_text().strip()
except:
    product_price = "n/a"

print (url,'|',image_url,'|',product_id,'|',product_desc,'|',product_price)        


browser.quit()

Any assistance is greatly appreciated.

Thanks

Answer 1

Since the content is dynamically generated, your soup has nothing in it to find. Selenium on its own is enough here; you don't need Beautiful Soup. I'm not sure why you treated the elements as lists (findAll and the for loop), because there is only one of each on this page.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless')
capabilities = { 'chromeOptions':  { 'useAutomationExtension': False},'args': ['--disable-extensions']}
browser = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe',desired_capabilities = capabilities,options=options)
url='https://groceries.asda.com/product/celery-spring-onions/asda-growers-selection-trimmed-spring-onions/41676'

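# Implicit wait: every find_element call polls for up to 15 seconds before failing.
browser.implicitly_wait(15)
browser.get(url)

# Product code element; its text reads "Product code: ..."
product_id = browser.find_element_by_class_name('pdp-main-details__product-code')
print(product_id.text)

# First image inside the s7viewer flyout container
image = browser.find_element_by_xpath('//*[@id="s7viewer_flyout"]/div[1]/img[1]')
image_url = image.get_attribute('src')
print(image_url)
browser.quit()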

Output:

Product code: 410212
https://ui.assets-asda.com/dm/asdagroceries/5050854288142_T1?defaultImage=asdagroceries/noImage&resMode=sharp2&id=PqaST3&fmt=jpg&fit=constrain,1&wid=188&hei=188
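
Separately from the rendering issue, the Beautiful Soup lookups in the question would return "n/a" even on fully rendered HTML, and the bare except hides the reason. The sketch below illustrates this on a small hand-written HTML string. That markup is an assumption made for illustration, not the real page source, and the title class name 'pdp-main-details__title' is a guess at what the question intended.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in markup (an assumption, not the real ASDA page source).
html = """
<div>
  <h1 class="pdp-main-details__title">Trimmed Spring Onions</h1>
  <span class="pdp-main-details__product-code">Product code: 410212</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find()'s first positional argument is a tag name, so this searches for a
# <class> element, which does not exist, and returns None.
wrong = soup.find("class", {"class": "pdp-main-details__title"})
print(wrong)  # None

# find_all() returns a list-like ResultSet, which has no .replace() method.
matches = soup.find_all("span", {"class": "pdp-main-details__product-code"})
try:
    matches.replace("Product code:", "")
except AttributeError as exc:
    # Catching a specific exception and printing it keeps the failure visible.
    print("lookup failed:", exc)

# Working version: name the tag, filter on class_, and use find() for one element.
code_tag = soup.find("span", class_="pdp-main-details__product-code")
product_id = code_tag.get_text().replace("Product code:", "").strip() if code_tag else "n/a"

title_tag = soup.find(class_="pdp-main-details__title")
product_desc = title_tag.get_text().strip() if title_tag else "n/a"

print(product_id, "|", product_desc)  # 410212 | Trimmed Spring Onions
```

Replacing each bare except with a specific exception, or with an explicit None check as above, means a failed lookup shows up as a message instead of silently becoming "n/a".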
