
Scraping the UPDATED page source built from a website's XHR response

The URL www.example.com/abc does not change when moving between pages. Inspecting the XHR tab in Chrome DevTools shows that a POST request is sent to www.example.com/abc-data, and the page source of www.example.com/abc changes based on its response.

About 90% of the data is returned in the XHR response and can be scraped from it, but the remaining 10% is only present in the dynamically updated page source, which is rendered from that response.
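Since most of the data already arrives in the XHR response, it can be read directly as JSON instead of being scraped from HTML. This is a minimal sketch, assuming a hypothetical response shape with an `items` list and `id`/`name`/`price` keys; the real field names have to be read from the request's Response tab in DevTools.

```python
import json

# Hypothetical body of the POST to www.example.com/abc-data; the real keys
# must be taken from the XHR "Response" tab in Chrome DevTools.
SAMPLE_RESPONSE = """
{
  "page": 2,
  "items": [
    {"id": 101, "name": "First item", "price": 9.5},
    {"id": 102, "name": "Second item", "price": 12.0}
  ]
}
"""


def extract_items(payload):
    """Flatten the (hypothetical) "items" list into one dict per row."""
    return [
        {"id": item.get("id"), "name": item.get("name"), "price": item.get("price")}
        for item in payload.get("items", [])
    ]


# With requests, use `payload = r.json()` on the POST response instead.
data = json.loads(SAMPLE_RESPONSE)
rows = extract_items(data)
print(rows[0]["name"])  # First item
```

Using `.get()` keeps the sketch from raising `KeyError` if a page returns an item with a missing field.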

I've tried every solution I could find on the internet but haven't been able to solve this problem.

Env: macOS Ventura, Python 3.7.3

Note: Using BeautifulSoup

Short code snippet

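```python
import requests
from bs4 import BeautifulSoup

url1 = "https://www.example.com/abc"
url2 = "https://www.example.com/abc-data"
with requests.Session() as s:
    r = s.get(url1)  # Extract token from this URL
    # SOME CODE HERE
    r = s.post(url2, data=payload)  # Use token from above for this URL and session
    # The body of this response is JSON, not the updated HTML page
    soup = BeautifulSoup(r.text, 'html.parser')
```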

After the POST request above, the page's HTML source is updated, but I can't get that updated HTML with BeautifulSoup. All I receive is the JSON response.
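Because the body of the POST response is JSON, passing `r.text` to BeautifulSoup will not produce the updated page. If (and this is an assumption about the site) one of the JSON fields carries the HTML fragment that the page's JavaScript inserts into the DOM, that field can be decoded first and then parsed on its own. In this sketch the `html` key, the `li` markup and the `data-id` attribute are all hypothetical:

```python
import json

from bs4 import BeautifulSoup

# Hypothetical /abc-data response where one field holds the HTML fragment
# that the page's JavaScript would insert into the DOM.
SAMPLE_RESPONSE = json.dumps({
    "total": 2,
    "html": '<ul class="rows"><li data-id="1">Alpha</li><li data-id="2">Beta</li></ul>',
})


def parse_fragment(response_text, html_key="html"):
    """Decode the JSON body, then run BeautifulSoup on the HTML-bearing field only."""
    payload = json.loads(response_text)  # with requests: pass r.text, or use r.json()
    fragment = payload.get(html_key, "")
    soup = BeautifulSoup(fragment, "html.parser")
    return [(li["data-id"], li.get_text(strip=True)) for li in soup.find_all("li")]


print(parse_fragment(SAMPLE_RESPONSE))  # [('1', 'Alpha'), ('2', 'Beta')]
```

If the missing 10% is computed by the page's JavaScript rather than shipped as markup inside the response, this approach will not recover it.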

Any help would be much appreciated!

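As I understand it, you're trying to get the dynamic content of a web page using BeautifulSoup. That isn't possible: BeautifulSoup only parses the static HTML it is given and does not execute JavaScript, so content that the browser renders from the XHR response never appears in what requests downloads.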
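A small, self-contained illustration of this point: the page below (made up for the example) fills an empty `<div>` from a script, the way a browser would after receiving the XHR response. BeautifulSoup parses the markup but never runs the script, so the container stays empty:

```python
from bs4 import BeautifulSoup

# Made-up page: the <div> starts empty and a script fills it in, much like
# the site's JavaScript does after the XHR response arrives.
HTML = """
<html>
  <body>
    <div id="results"></div>
    <script>
      document.getElementById("results").innerHTML = "<p>row 1</p>";
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")
results = soup.find(id="results")

# BeautifulSoup only parses markup; the <script> is never executed,
# so the container is still empty.
print(repr(results.get_text()))  # ''
print(len(results.find_all("p")))  # 0
```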

If you really need the dynamic content, I recommend using Selenium, which drives a real browser and therefore runs the page's JavaScript.

