I'm trying to scrape two fields, product_title and item_code, from this webpage using the requests module. When I execute the script below, I always get an AttributeError instead of the result, because the data I'm after is not in the page source, so select_one() returns None.
However, I've come across several solutions here that fetch data from JavaScript-rendered sites even when the data is not in the page source, so I suppose there should be a way to grab the two fields from the webpage using requests.
import requests
from bs4 import BeautifulSoup

link = 'https://www.sainsburys.co.uk/gol-ui/Product/persil-small---mighty-non-bio-laundry-liquid-21l-60-washes'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    # select_one() returns None when the element is missing, which raises AttributeError on .get_text()
    product_title = soup.select_one("h1[data-test-id='pd-product-title']").get_text(strip=True)
    item_code = soup.select_one("span#productSKU").get_text(strip=True)
    print(product_title, item_code)
Expected output:
Persil Non-Bio Laundry Liquid 1.43L
Item code: 7637944
How can I fetch the two fields from that site using requests?
Actually, the website loads this data by calling an API, so you can request that API directly to get the data:
import requests

r = requests.get('https://www.sainsburys.co.uk/groceries-api/gol-services/product/v1/product?filter[product_seo_url]=gb%2Fgroceries%2Fpersil-small---mighty-non-bio-laundry-liquid-21l-60-washes&include[ASSOCIATIONS]=true&include[PRODUCT_AD]=citrus')
products = r.json()['products']
for each_product in products:
    print(f"Item code: {each_product['product_uid']}")
    print(each_product['name'])
# Item code: 7637944
# Persil Non-Bio Laundry Liquid 1.43L
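If you would rather not hand-encode the query string, here is a minimal sketch that derives the API URL from the product page URL and pulls out the two fields. It assumes the endpoint, the gb/groceries/ prefix and the 'products', 'name' and 'product_uid' keys seen in the request above still hold; the API is undocumented and may change. The names build_api_url, extract_fields and fetch_product_fields are hypothetical helpers, not part of any library.

```python
from urllib.parse import urlencode, urlsplit

# Endpoint taken from the request above; undocumented, so treat it as an assumption.
API_BASE = "https://www.sainsburys.co.uk/groceries-api/gol-services/product/v1/product"


def build_api_url(product_page_url):
    """Build the API URL for a product page (assumes the slug is the last path segment)."""
    slug = urlsplit(product_page_url).path.rstrip("/").rsplit("/", 1)[-1]
    params = {
        # Assumed prefix, copied from the working request above.
        "filter[product_seo_url]": f"gb/groceries/{slug}",
        "include[ASSOCIATIONS]": "true",
        "include[PRODUCT_AD]": "citrus",
    }
    # urlencode percent-encodes the slashes (and the brackets) for us.
    return f"{API_BASE}?{urlencode(params)}"


def extract_fields(payload):
    """Return (name, product_uid) pairs from the decoded JSON, tolerating missing keys."""
    return [(p.get("name"), p.get("product_uid")) for p in payload.get("products", [])]


def fetch_product_fields(product_page_url):
    """Request the API and return the extracted pairs (performs a network call)."""
    import requests  # imported here so the helpers above work without requests installed

    response = requests.get(
        build_api_url(product_page_url),
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return extract_fields(response.json())
```

Calling fetch_product_fields(link) with the product URL from the question should return a list of (name, item code) pairs, provided the API still responds the way it did above.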