
BeautifulSoup is returning empty data from website

I am running this code:

import requests
from bs4 import BeautifulSoup


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}

r = requests.get('https://www.bohus.no/spiseplassen/oppbevaring-1/gradino-vitrine-2')

soup = BeautifulSoup(r.content, 'lxml')


print(soup.find('div', class_='price').text)

I am trying to get the price of the product on this page: https://www.bohus.no/spiseplassen/oppbevaring-1/gradino-vitrine-2. All I get is empty output when I run my code. Am I doing something wrong, or does the website do something special to stop me from scraping the price?

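As stated in the comments, you can get the product data from the store's API. The price is most likely filled in by JavaScript after the page loads, which would explain why the div is empty in the HTML that requests receives.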

Here's how:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/78.0.3904.108 Safari/537.36',
    # Marks the request as an AJAX (XHR) call.
    "x-requested-with": "XMLHttpRequest",
}

# Fetch the product page and read the product ID from its
# "d-session-product" input field.
product_url = "https://www.bohus.no/spiseplassen/oppbevaring-1/gradino-vitrine-2"
page_content = requests.get(product_url).content
soup = BeautifulSoup(page_content, 'lxml')
product_id = soup.find("input", {"name": "d-session-product"})["value"]

# Form data for the price-and-stock endpoint.
payload = {
    "debug": "off",
    "ajax": "1",
    "product_list": product_id,
    "action": "init",
    "showStockStatusAndShoppingcart": "1",
    "enablePickupAtNearbyStores": "yes",
}

# POST the form data and read the price from the JSON response.
endpoint = "https://www.bohus.no/lite.cgi/module/priceAndStock"
product_data = requests.post(endpoint, data=payload, headers=headers).json()
print(product_data["price"][0]["salesPriceNormal"])

Output:

8799
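
If the endpoint ever returns a different shape (no "price" key, an empty list), the direct indexing above raises an exception. Here is a minimal sketch of a more defensive lookup. The helper name extract_sales_price is made up for illustration, and the sample dictionary only mirrors the two keys used above; the real response probably contains more fields, and the price may come back as a string rather than a number.

```python
def extract_sales_price(product_data):
    """Return product_data["price"][0]["salesPriceNormal"], or None if absent.

    Hypothetical helper: it only assumes the two keys used in the answer.
    """
    if not isinstance(product_data, dict):
        return None
    prices = product_data.get("price")
    # "price" is expected to be a non-empty list of dictionaries.
    if not isinstance(prices, list) or not prices:
        return None
    first = prices[0]
    if not isinstance(first, dict):
        return None
    return first.get("salesPriceNormal")


if __name__ == "__main__":
    # Hand-written sample, not a real API response.
    sample = {"price": [{"salesPriceNormal": 8799}]}
    print(extract_sales_price(sample))
    print(extract_sales_price({"price": []}))
```

You would call it as extract_sales_price(product_data) in place of the final print line and handle the None case however suits you.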

Alternatively, you can use the requests-HTML library, which can render the page's JavaScript before you query it:

from requests_html import HTMLSession

session = HTMLSession()

r = session.get('https://www.bohus.no/spiseplassen/oppbevaring-1/gradino-vitrine-2')
# Execute the page's JavaScript (downloads Chromium on first use).
r.html.render()
sel = 'div.price-data > div.price'
print(r.html.find(sel, first=True).text)
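
To tell whether a site is blocking you or simply filling an element in later with JavaScript, it can help to check whether the element is present but empty in the raw HTML. Below is a rough sketch using only the standard library's html.parser, so it runs without extra packages. The class DivTextCollector, the function classify, and the HTML snippets are all made up for illustration; they are not taken from the bohus.no page.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text inside every <div> that carries a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0    # > 0 while inside a matching div
        self.found = []   # one text string per matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A div nested inside a matching div.
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.found[-1] += data


def classify(html, css_class="price"):
    """Return 'missing', 'empty' or 'filled' for divs with css_class."""
    parser = DivTextCollector(css_class)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return "missing"
    if all(not text.strip() for text in parser.found):
        return "empty"
    return "filled"


if __name__ == "__main__":
    # Hypothetical static HTML in which the price div exists but has no text.
    static_html = '<div class="price-data"><div class="price"></div></div>'
    print(classify(static_html))
```

A result of 'empty' on the text of r (for example classify(r.text)) would suggest the value is loaded afterwards, so an API call or a JavaScript-rendering approach is needed; 'missing' would point to a wrong selector or a different page being served.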
