
Scraping text from a website, but the text I want does not appear in the source

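I am trying to get the name and price of every product on the website https://store.com/shop.

When I inspect the site manually I can see the HTML for each product, but when I try to view it with BeautifulSoup in Python, the product markup is not there.

I think the problem is that the site displays the products in some kind of widget, so they are not visible in the page source, but I am not sure.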

import requests
from bs4 import BeautifulSoup

my_url = 'https://store.com/shop'
headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15"}

# Open a connection and fetch the page
source = requests.get(my_url, headers=headers)
html = source.content
soup = BeautifulSoup(html, 'lxml')

print(soup.prettify())

The products are loaded dynamically by sending a GET request to:

https://roeblingliquors.com/api/v1/products/search.json?additional_properties%5Btype%5D%5B%5D=Spirits&new_style=true&merchant_id=5b19b7150fb4f72d6831344b&limit=20&skip=0&api_key=e0d3a091dc0d81547d6e168be2b3492a&sdk_guid=32560d92-28f6-e067-7286-3f505a73e61a&client_origin=app%3A%2F%2Fstorefront.5b19b7150fb4f72d6831344b

The data is returned in JSON format.

You can extract the products using requests alone; BeautifulSoup is not needed.
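To see which parts of that long URL matter, it can help to split the query string into its individual parameters. The sketch below uses only the standard library and the URL quoted above; nothing here is specific to this site beyond that URL.

```python
from urllib.parse import urlsplit, parse_qs

# The request URL from above, split across lines for readability
url = (
    "https://roeblingliquors.com/api/v1/products/search.json"
    "?additional_properties%5Btype%5D%5B%5D=Spirits"
    "&new_style=true"
    "&merchant_id=5b19b7150fb4f72d6831344b"
    "&limit=20"
    "&skip=0"
    "&api_key=e0d3a091dc0d81547d6e168be2b3492a"
    "&sdk_guid=32560d92-28f6-e067-7286-3f505a73e61a"
    "&client_origin=app%3A%2F%2Fstorefront.5b19b7150fb4f72d6831344b"
)

parts = urlsplit(url)
# parse_qs percent-decodes keys and values; each value is a list of strings
params = parse_qs(parts.query)

print(parts.path)
for key, values in params.items():
    print(f"{key:<32} {values}")
```

The decoded output shows a limit of 20 and a skip of 0, which look like paging controls, plus a type filter set to Spirits.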

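The code below fetches the products from that endpoint. As written, the URL sets limit=20 and skip=0 and only one request is made, so it retrieves a single page of results rather than all pages.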

import requests

url = "https://roeblingliquors.com/api/v1/products/search.json?additional_properties%5Btype%5D%5B%5D=Spirits&new_style=true&merchant_id=5b19b7150fb4f72d6831344b&limit=20&skip=0&api_key=e0d3a091dc0d81547d6e168be2b3492a&sdk_guid=32560d92-28f6-e067-7286-3f505a73e61a&client_origin=app%3A%2F%2Fstorefront.5b19b7150fb4f72d6831344b"

response = requests.get(url).json()

# Left-align each column and pad it to a fixed width
fmt_string = "{:<70} {:<15} {:<10}"
print(fmt_string.format("Name", "Measure", "Price"))
print("-" * 100)

for data in response["data"]["products"]:
    for product in data["merchants"][0]["product_options"]:
        measure = (
            product["option_params"]["size"]["measure"]
            + " "
            + product["option_params"]["size"]["quantity"]
        )
        price = product["price"]

        print(fmt_string.format(data["name"], measure, price))

Output (truncated):

Name                                                                   Measure         Price     
----------------------------------------------------------------------------------------------------
Plantation Rum Extra Old 20th Ann                                      ml 750          68.2      
Elijah Craig Barrel Proof Bourbon A121                                 ml 750          109.99    
Elijah Craig Small Batch Kentucky Straight Bourbon Whiskey 94 Proof    ml 750          37.38     
Tito's Vodka                                                           ml 50           2.53      
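To go beyond the first page, one option is to repeat the request while increasing skip. The following is only a sketch: it assumes that skip is an offset into the result list and that a page with fewer than limit products marks the end, neither of which is confirmed by the API. The names iter_all_products and fetch_page are hypothetical helpers, and the query parameters are copied from the URL above.

```python
BASE_URL = "https://roeblingliquors.com/api/v1/products/search.json"


def iter_all_products(fetch_page, limit=20):
    """Yield products page by page until a short or empty page comes back.

    Assumption: `skip` is an offset, and a page shorter than `limit`
    means there are no further results.
    """
    skip = 0
    while True:
        products = fetch_page(skip, limit)
        yield from products
        if len(products) < limit:
            break
        skip += limit


def fetch_page(skip, limit):
    """Request one page from the endpoint (hypothetical helper)."""
    # Imported here so the paging logic above can be tested without network access
    import requests

    params = {
        "additional_properties[type][]": "Spirits",
        "new_style": "true",
        "merchant_id": "5b19b7150fb4f72d6831344b",
        "limit": limit,
        "skip": skip,
        "api_key": "e0d3a091dc0d81547d6e168be2b3492a",
        "sdk_guid": "32560d92-28f6-e067-7286-3f505a73e61a",
        "client_origin": "app://storefront.5b19b7150fb4f72d6831344b",
    }
    response = requests.get(BASE_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json()["data"]["products"]
```

Looping over iter_all_products(fetch_page) then yields the same product dictionaries that the loop above reads from response["data"]["products"], so the printing code can be reused unchanged. Passing the fetch function in as an argument keeps the paging logic separate from the network call.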

