
Scraping UPDATED source code from XHR response of a website

The URL www.example.com/abc does not change when paging through the site. Inspecting the XHR tab in Chrome DevTools shows that a POST request is sent to www.example.com/abc-data, and the source code of www.example.com/abc changes based on the response to that request.

About 90% of the data is returned in the XHR response and can be scraped from it, but the remaining 10% is only present in the dynamic source code, which is updated depending on the XHR response.

I've tried every solution I could find on the internet, but I haven't been able to solve this problem.

Env: macOS Ventura, Python 3.7.3

Note: Using BeautifulSoup

Short code snippet

    import requests
    from bs4 import BeautifulSoup

    url1 = "https://www.example.com/abc"
    url2 = "https://www.example.com/abc-data"
    with requests.Session() as s:
        r = s.get(url1)  # Extract token from this URL
        # SOME CODE HERE
        r = s.post(url2, data=payload)  # Use token from above for this URL and session
        soup = BeautifulSoup(r.text, 'html.parser')

After the POST request above, the HTML source code is updated in the browser, but I am not able to get that updated source using BeautifulSoup. All I receive is the JSON response.

Any help would be much appreciated!

As I understand it, you're trying to get the dynamic content of a web page using BeautifulSoup. That is not possible: BeautifulSoup only parses the static HTML it is given and does not execute JavaScript.
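Before switching tools, it may be worth checking whether the JSON from the POST request already carries pieces of HTML as string values, since some sites send rendered fragments that way. The following is only a sketch of that check: `find_html_fragments` is a hypothetical helper, and the sample payload and its keys (`rows`, `pager`) are made up to stand in for `r.json()`.

```python
import json
import re

# Rough test for "this string contains an HTML tag"; deliberately simple.
TAG_RE = re.compile(r"<[a-zA-Z][^>]*>")


def find_html_fragments(node, path=""):
    """Walk decoded JSON and collect (path, value) for strings containing a tag."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            found.extend(find_html_fragments(value, child_path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_html_fragments(value, f"{path}[{index}]"))
    elif isinstance(node, str) and TAG_RE.search(node):
        found.append((path, node))
    return found


# Made-up payload standing in for r.json(); the real keys will differ.
sample_text = '{"rows": [{"id": 1, "name": "a"}], "pager": "<ul><li>2</li></ul>"}'
fragments = find_html_fragments(json.loads(sample_text))
for fragment_path, fragment in fragments:
    print(fragment_path, fragment)
```

Any fragment found this way can be passed to `BeautifulSoup(fragment, 'html.parser')` as usual. If the search comes back empty, the missing 10% is most likely assembled by JavaScript in the browser, and requests plus BeautifulSoup alone will not see it.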

If you really need the dynamic content, I recommend using Selenium.
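A minimal sketch of what that could look like, assuming Selenium 4 with Chrome and a matching driver available on the machine. `fetch_rendered_html` is a hypothetical helper, and the CSS selector you wait for is something you would have to pick from the real page; none of this comes from the site in question.

```python
def fetch_rendered_html(url, wait_css, timeout=10):
    """Load url in headless Chrome and return the DOM once wait_css is present."""
    # Imported inside the function so the sketch loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element filled in by JavaScript shows up, or time out.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        # page_source reflects the DOM after scripts have run.
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can then go into `BeautifulSoup(html, 'html.parser')` just like `r.text` in the question, for example `html = fetch_rendered_html("https://www.example.com/abc", "#results")`, where `#results` is a placeholder selector. Moving between pages would additionally require clicking the pagination control through the driver before reading `page_source`, which this sketch does not cover.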
