
Scraping text from a website and the text I want does not appear in the source

I'm trying to grab the name and price of each product on the website https://store.com/shop.

When I inspect the page in the browser I can see the HTML for each product, but when I fetch the page with Python and parse it with Beautiful Soup, the products are not there.

I think the problem is that the website renders the products in some sort of widget, so they are not present in the page source, but I am not sure.

import requests
from bs4 import BeautifulSoup

my_url = 'https://store.com/shop'
headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15"}

# Open the connection and fetch the page
source = requests.get(my_url, headers=headers)
html = source.content
soup = BeautifulSoup(html, 'lxml')

print(soup.prettify())

The products are loaded dynamically by sending a GET request to:

https://roeblingliquors.com/api/v1/products/search.json?additional_properties%5Btype%5D%5B%5D=Spirits&new_style=true&merchant_id=5b19b7150fb4f72d6831344b&limit=20&skip=0&api_key=e0d3a091dc0d81547d6e168be2b3492a&sdk_guid=32560d92-28f6-e067-7286-3f505a73e61a&client_origin=app%3A%2F%2Fstorefront.5b19b7150fb4f72d6831344b

The response is in JSON format.
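To see what that long URL is actually asking for, it can help to decode its query string. The following is a small sketch using only the standard library; it just splits the URL shown above into its parameters and makes no request. What each parameter means to the server is not documented here, so treat any interpretation (for example, that `limit` and `skip` control paging) as an assumption.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://roeblingliquors.com/api/v1/products/search.json"
    "?additional_properties%5Btype%5D%5B%5D=Spirits&new_style=true"
    "&merchant_id=5b19b7150fb4f72d6831344b&limit=20&skip=0"
    "&api_key=e0d3a091dc0d81547d6e168be2b3492a"
    "&sdk_guid=32560d92-28f6-e067-7286-3f505a73e61a"
    "&client_origin=app%3A%2F%2Fstorefront.5b19b7150fb4f72d6831344b"
)

# parse_qs percent-decodes keys and values and maps each key to a list of values
params = parse_qs(urlsplit(url).query)

for key, values in params.items():
    print(f"{key} = {values}")
```

The decoded keys include `additional_properties[type][]` (set to `Spirits`), `limit` (20) and `skip` (0).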

You can extract the products with just requests; there's no need to use BeautifulSoup.

The following requests one page of results (the URL sets limit=20 and skip=0) and prints the name, size, and price of every product option on it:

import requests

url = "https://roeblingliquors.com/api/v1/products/search.json?additional_properties%5Btype%5D%5B%5D=Spirits&new_style=true&merchant_id=5b19b7150fb4f72d6831344b&limit=20&skip=0&api_key=e0d3a091dc0d81547d6e168be2b3492a&sdk_guid=32560d92-28f6-e067-7286-3f505a73e61a&client_origin=app%3A%2F%2Fstorefront.5b19b7150fb4f72d6831344b"

response = requests.get(url).json()

# Left-align the three columns to fixed widths of 70, 15, and 10 characters
fmt_string = "{:<70} {:<15} {:<10}"
print(fmt_string.format("Name", "Measure", "Price"))
print("-" * 100)

for data in response["data"]["products"]:
    for product in data["merchants"][0]["product_options"]:
        measure = (
            product["option_params"]["size"]["measure"]
            + " "
            + product["option_params"]["size"]["quantity"]
        )
        price = product["price"]

        print(fmt_string.format(data["name"], measure, price))

Output (truncated):

Name                                                                   Measure         Price     
----------------------------------------------------------------------------------------------------
Plantation Rum Extra Old 20th Ann                                      ml 750          68.2      
Elijah Craig Barrel Proof Bourbon A121                                 ml 750          109.99    
Elijah Craig Small Batch Kentucky Straight Bourbon Whiskey 94 Proof    ml 750          37.38     
Tito's Vodka                                                           ml 50           2.53      
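
Since the URL carries `limit=20` and `skip=0`, a single request covers at most one page. Below is a minimal sketch of walking through further pages. It assumes that `skip` is an offset into the result list and that the API returns an empty `products` list once the offset passes the end; neither is confirmed here, so check against the real responses. The names `page_url`, `fetch_json`, and `iter_products`, and the `max_pages` safety cap, are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode


def page_url(url, skip, limit=20):
    """Return `url` with its skip/limit query parameters replaced."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    params["skip"] = [str(skip)]
    params["limit"] = [str(limit)]
    # doseq=True writes list values back as repeated key=value pairs
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


def fetch_json(url):
    """Fetch `url` and return the parsed JSON body."""
    # Imported here so the paging logic can be tried without requests installed
    import requests

    return requests.get(url, timeout=10).json()


def iter_products(url, limit=20, max_pages=50, fetch=fetch_json):
    """Yield products page by page until an empty page is returned."""
    for page in range(max_pages):  # hypothetical cap to avoid looping forever
        payload = fetch(page_url(url, skip=page * limit, limit=limit))
        products = payload["data"]["products"]
        if not products:  # assumption: an empty list means no more pages
            break
        yield from products
```

With the `url` from the answer, `for data in iter_products(url): print(data["name"])` would then print the name of every product the API returns, one page at a time.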
