Unable to parse two fields from a webpage using the requests module

Question

I'm trying to scrape two fields, product_title and item_code, from this webpage using the requests module. When I execute the script below, it always raises an AttributeError instead of printing the result, because the data I'm after is not in the page source.

However, I've come across several solutions here that fetch data from JavaScript-rendered sites even when that data is not in the page source, so I suppose there should be some way to grab these two fields from the webpage using requests.

import requests
from bs4 import BeautifulSoup

link = 'https://www.sainsburys.co.uk/gol-ui/Product/persil-small---mighty-non-bio-laundry-liquid-21l-60-washes'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link)
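    soup = BeautifulSoup(res.text, "lxml")
    # Both elements are missing from the raw HTML, so select_one() returns None
    # and calling .get_text() on None raises AttributeError.
    product_title = soup.select_one("h1[data-test-id='pd-product-title']").get_text(strip=True)
    item_code = soup.select_one("span#productSKU").get_text(strip=True)
    print(product_title, item_code)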

Expected output:

Persil Non-Bio Laundry Liquid 1.43L
Item code: 7637944

How can I fetch the two fields from that site using requests?

Answer 1

The website actually loads this data from an API, so you can call that API directly and get the data as JSON.
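The API URL used in this answer appears to be derived from the product page URL: the last path segment of the page link (the product slug) reappears, prefixed with gb/groceries/ and percent-encoded, in the filter[product_seo_url] parameter. Assuming that pattern holds for other products (it is inferred from this single example, not confirmed by any documentation), a hypothetical helper product_api_url could build the endpoint from any product page link:

```python
from urllib.parse import quote, urlparse

# Endpoint taken from the answer's requests.get() call.
API_ENDPOINT = "https://www.sainsburys.co.uk/groceries-api/gol-services/product/v1/product"


def product_api_url(page_url: str) -> str:
    """Build the product API URL for a product page link.

    ASSUMPTION (inferred from one example): the API's product_seo_url filter is
    "gb/groceries/<slug>", where <slug> is the last path segment of the page URL.
    """
    slug = urlparse(page_url).path.rstrip("/").rsplit("/", 1)[-1]
    # quote(..., safe="") encodes "/" as %2F, matching the answer's URL.
    seo_url = quote(f"gb/groceries/{slug}", safe="")
    # Square brackets in the parameter names are kept literal, as in the answer.
    return (
        f"{API_ENDPOINT}?filter[product_seo_url]={seo_url}"
        "&include[ASSOCIATIONS]=true&include[PRODUCT_AD]=citrus"
    )


link = "https://www.sainsburys.co.uk/gol-ui/Product/persil-small---mighty-non-bio-laundry-liquid-21l-60-washes"
print(product_api_url(link))
```

Passing the question's link reproduces exactly the URL used in the answer's requests.get() call.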

import requests

# JSON endpoint that the product page itself calls to load the product data
r = requests.get('https://www.sainsburys.co.uk/groceries-api/gol-services/product/v1/product?filter[product_seo_url]=gb%2Fgroceries%2Fpersil-small---mighty-non-bio-laundry-liquid-21l-60-washes&include[ASSOCIATIONS]=true&include[PRODUCT_AD]=citrus')
r.raise_for_status()  # fail loudly on a non-2xx response instead of parsing an error page
products = r.json()['products']

for each_product in products:
    print(f"Item code: {each_product['product_uid']}")
    print(each_product['name'])

# Item code: 7637944
# Persil Non-Bio Laundry Liquid 1.43L
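To print the fields in the order shown in the question's expected output (title first, then item code), the JSON handling can be separated from the request. The hypothetical helper extract_fields below reads only the keys this answer relies on ('products', 'name', 'product_uid'); the sample payload is hand-written to mimic those keys and is not a real API response. Using .get() means a product missing either key yields an empty value instead of a KeyError.

```python
import json
from typing import List, Tuple


def extract_fields(payload: dict) -> List[Tuple[str, str]]:
    """Return (product_title, item_code_line) pairs from the product API payload.

    Only the keys used in the answer are read: 'products', 'name', 'product_uid'.
    """
    fields = []
    for product in payload.get("products", []):
        title = product.get("name", "")
        item_code = f"Item code: {product.get('product_uid', '')}"
        fields.append((title, item_code))
    return fields


# Hand-written sample mimicking the keys the answer reads (NOT a real API response).
sample_json = '{"products": [{"product_uid": "7637944", "name": "Persil Non-Bio Laundry Liquid 1.43L"}]}'

for title, item_code in extract_fields(json.loads(sample_json)):
    print(title)
    print(item_code)
```

With a live response, `extract_fields(r.json())` takes the place of `json.loads(sample_json)`.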

Solution 1 (accepted), 2021-07-23 15:35:09

Unable to parse two fields from a webpage using requests module

Question

1 answers

solution1 1 ACCPTED 2021-07-23 15:35:09

solution1
1 ACCPTED 2021-07-23 15:35:09