
Scrapy - How does a request sent to an API using the requests library differ from a request sent using scrapy.Request?

I am a beginner with Scrapy and I was trying to scrape this website, https://directory.ntschools.net/#/schools, which uses JavaScript to load its contents. So I checked the Network tab and found an API address: https://directory.ntschools.net/api/System/GetAllSchools. If you open this address in the browser, the data is in XML format. But when you check the Response tab while inspecting the Network tab, the data is in JSON format.

I first tried Scrapy and sent the request to the API address WITHOUT any headers. The response came back as XML, which raised a JSONDecodeError when passed to json.loads(). So I added the header 'Accept': 'application/json' and the response came back as JSON. That worked well:

import json

import scrapy


class NtseSpider_new(scrapy.Spider):
    name = 'ntse_new'
    # Explicit Accept header so the API answers with JSON instead of XML
    header = {
        'Accept': 'application/json',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.56',
    }

    def start_requests(self):
        yield scrapy.Request(
            'https://directory.ntschools.net/api/System/GetAllSchools',
            callback=self.parse,
            headers=self.header,
        )

    def parse(self, response):
        data = json.loads(response.body)  # body is JSON because of the Accept header

But then I used the requests module WITHOUT any headers and the response I got was in JSON too!

import json

import requests

res = requests.get('https://directory.ntschools.net/api/System/GetAllSchools')
js = json.loads(res.content)  # returned JSON response

Can anyone please tell me whether there is any difference between the two types of requests? Is there a default response format for the requests module when making a request to an API? Surely I am missing something. Thanks.

It's because Scrapy sets the Accept header to 'text/html,application/xhtml+xml,application/xml...' by default. You can see that in Scrapy's DEFAULT_REQUEST_HEADERS setting.

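I experimented and found that the server sends a JSON response if the request has no Accept header. The requests library sends 'Accept: */*' by default, which names no specific format, so the server falls back to JSON there too.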
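The behaviour described above can be reproduced locally without touching the real API. The following is only a minimal sketch: NegotiatingHandler is a hypothetical stand-in that imitates what the server appears to do (XML when the Accept header lists application/xml without application/json, JSON otherwise), and content_type_for is an illustrative helper, not part of Scrapy or requests. The actual server's negotiation rules are not known and may differ.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Scrapy's default Accept header value (from DEFAULT_REQUEST_HEADERS).
SCRAPY_DEFAULT_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: imitates the observed behaviour, not the real API."""

    def do_GET(self):
        accept = self.headers.get("Accept", "")
        # Assumption: XML only when the client lists XML and does not list JSON.
        if "application/xml" in accept and "application/json" not in accept:
            body, ctype = b"<schools><school>Demo</school></schools>", "application/xml"
        else:
            body, ctype = json.dumps([{"name": "Demo"}]).encode(), "application/json"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def content_type_for(accept=None):
    """Start the stand-in server, send one GET, return the response content type."""
    server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        headers = {} if accept is None else {"Accept": accept}
        request = urllib.request.Request(
            f"http://127.0.0.1:{server.server_port}/", headers=headers
        )
        # Empty ProxyHandler so the local request never goes through a proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            return response.headers.get_content_type()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("no Accept header:      ", content_type_for(None))
    print("requests default (*/*):", content_type_for("*/*"))
    print("Scrapy default:        ", content_type_for(SCRAPY_DEFAULT_ACCEPT))
    print("explicit JSON:         ", content_type_for("application/json"))
```

Against the real API, the same comparison can be made by sending the request with each Accept value in turn and checking the Content-Type of the response.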
