
Is there a way to extract data from a dynamic website other than using Selenium?

I am trying to extract data for all products (shirts, T-shirts, and so on) from the website https://shop.nordstrom.com/. The page is loaded dynamically. I know I can use Selenium with a headless browser, but that is slow, and locating elements by their strange IDs and class names does not look promising either.

So I looked in the browser's Network tool to see whether I could find the path to the API that the data is loaded from (an XHR request), but I could not find anything helpful. Is there a way to get the data from the website?

If you don't want to use Selenium, the alternative is to use an HTML parser such as bs4, or simply the requests module.

You are on the right track in looking for the call to the API. XHR requests are visible under the Network tab, but the sheer number of resources listed there makes it hard to pick out the requests that matter. A simple way around this is the following:

Instead of the Network tab, go to the Console tab. Click the settings icon there, then tick only the option "Log XMLHttpRequests".

Now refresh the page and scroll down to trigger the dynamic calls. The console will log every XHR request, which is much easier to read.

For example:

(index):29 Fetch finished loading: GET "https://shop.nordstrom.com/api/recs?page_type=home&placement=HP_SALE%2CHP_TOP_RECS%2CHP_CUST_HIS%2CHP_AFF_BRAND%2CHP_FTR&channel=web&bound=24%2C24%2C24%2C24%2C6&apikey=9df15975b8cb98f775942f3b0d614157&session_id=0&shopper_id=df0fdb2bb2cf4965a344452cb42ce560&country_code=US&experiment_id=945b2363-c75d-4950-b255-194803a3ee2a&category_id=2375500&style_id=0%2C0%2C0%2C0&ts=1593768329863&url=https%3A%2F%2Fshop.nordstrom.com%2F&zip_code=null".

Making a GET request to that URL returns a set of JSON objects. You can now send requests directly to this URL, and to others you derive from it.
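As a rough sketch of what requesting the URL directly could look like, the code below builds a query string and fetches the JSON using only the standard library, so nothing needs to be installed. The function names, the User-Agent header, and the choice of which parameters to send are assumptions for illustration. The endpoint may also need extra headers or cookies, and the logged API key and IDs may no longer be valid.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the console log; it may have changed since then.
API_ENDPOINT = "https://shop.nordstrom.com/api/recs"


def build_url(endpoint, params):
    """Join an endpoint and a dict of query parameters into one URL."""
    return f"{endpoint}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """GET the URL and decode the response body as JSON."""
    # Assumption: a browser-like User-Agent is sent; the server may
    # require other headers or cookies that are not shown here.
    request = Request(
        url,
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# A subset of the logged parameters, for illustration only; the server
# may expect the full set (apikey, shopper_id, and so on).
example_params = {
    "page_type": "home",
    "channel": "web",
    "bound": "24,24,24,24,6",
    "country_code": "US",
    "category_id": "2375500",
}
example_url = build_url(API_ENDPOINT, example_params)
print(example_url)

# Uncomment to perform the actual network request:
# data = fetch_json(example_url)
# print(type(data))
```

With the requests module the fetch step reduces to requests.get(API_ENDPOINT, params=example_params).json(), which encodes the query string for you.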

See the answer here for how to use the URL with the requests module to fetch the data.

