
Build an XHR link on a JavaScript website for Python requests

I'm scraping the website Scorebing using requests. To do so, I explore the website to locate the XHR calls and get a URL like this:

[Image: page location]

The code is as follows:

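import json

import requests

# Placeholder: fill in with the request headers copied from the XHR call (I got mine using Postman).
headers = {}
url = 'https://lv.scorebing.com/ajax/score/data?mt=0&nr=1&corner=1'

# Same call as in the browser: a GET request with an empty JSON body.
response = requests.get(url=url, headers=headers, data=json.dumps({}))
data = response.json()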

No problems there. My problem is that if I switch tabs, say from Corner to Fixture, no new XHR call is made. In fact, only "Live Matches" and "Corners" allow this direct XHR connection. I can see that some JS scripts are loaded, but I can't get from there to replicating my previous step.

[Image: new page location]

I know I can scrape this using Selenium, and probably also with a regular requests call to the page URL followed by BeautifulSoup, but what I don't understand is why some tabs make XHR calls to load data while other similar ones use JS. I would like to know how to reverse engineer those JS calls in order to get an API similar to the first part.

First, you should know that the XHR (XMLHttpRequest) filter in Chrome's Network panel lists the Ajax requests the page makes.


What is Ajax?

Ajax is a set of web development techniques using many web technologies on the client side to create asynchronous web applications.

Ajax can be implemented with plain JavaScript or with jQuery (jQuery is a JavaScript library, so it is essentially JavaScript, but it offers its own API for Ajax).

In your example page, there are many Ajax requests in the source code:

[Image]

[Image]


I would like to know how to reverse engineer those JS calls in order to get an API similar to the first part.

If you really want to do it from the source code alone, you should:

  1. Send a GET request to the page.
  2. Analyze the page's source code, then go through each JavaScript file it references (sending a GET request for each one as well).
  3. Find all the Ajax requests in that code, send GET requests to them too, and select the data you need from the responses.
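The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the site's actual structure: it assumes the endpoints appear in the source as quoted strings containing "/ajax/" (like the URL in the question), and the names ScriptCollector, extract_script_urls, find_ajax_urls, discover_endpoints and fetch_with_requests are hypothetical helpers made up for this example. URLs that the scripts assemble at runtime from separate pieces will not be found this way.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_script_urls(html, base_url):
    """Step 2: list the absolute URLs of the external scripts of a page."""
    collector = ScriptCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


# Assumption: an endpoint is a quoted string that contains "/ajax/".
AJAX_PATTERN = re.compile(r"""["']([^"']*/ajax/[^"']*)["']""")


def find_ajax_urls(source):
    """Step 3: pull candidate Ajax URLs out of HTML or JavaScript text."""
    return sorted(set(AJAX_PATTERN.findall(source)))


def discover_endpoints(page_url, fetch):
    """Run steps 1 to 3; `fetch` is any callable mapping a URL to its text."""
    html = fetch(page_url)                      # step 1: GET the page
    found = set(find_ajax_urls(html))           # inline scripts
    for script_url in extract_script_urls(html, page_url):
        found.update(find_ajax_urls(fetch(script_url)))
    return sorted(urljoin(page_url, url) for url in found)


def fetch_with_requests(url):
    """A possible `fetch` built on requests (imported lazily)."""
    import requests
    return requests.get(url, timeout=10).text
```

Calling discover_endpoints(page_url, fetch_with_requests) returns a list of candidate URLs; each one can then be requested the same way as in the question to see which returns the data you need.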

