XHR scraping in Python with Scrapy returns no data

OK, I have asked this question before, in "python scraping for javascript not working and specific data".

It seems I can get the data by extracting the XHR content, which would give me a way to do this scraping without Selenium.

import scrapy
import json

class PublicMutual(scrapy.Spider):
    name = 'publicmutual'

    headers = {'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Cookie': '.ASPXANONYMOUS=u8UpT1xTjt54Tf80JCsS2GqJWf4sPIksbzi5JOaw8TsM7i64n54q8yESMrdk81uj2hjiaMMLSMJAl0LcevRrYNP0XoGlGcGMpgNnmpG6YSMM1jAK0; Analytics_VisitorId=42ce4acb-6501-4828-aa81-74ef126af235; Analytics=SessionId=fc660efe-9e82-4379-afd5-8f77f203ff10&TabId=106&ContentItemId=-1; dnn_IsMobile=False; language=en-US; ASP.NET_SessionId=da1cbmzdgrzitjwntnlu3ioq; __RequestVerificationToken=Ry8wSKybT77XgBmmxuOfGmM4a6_Wy-B1MNKrN5g2zfVB1c6GXlL68ZYWUwZKBvVjyheTWQ2',
    'Host': 'www.publicmutual.com.my',
    'Referer': 'https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'}


    def start_requests(self):
        yield scrapy.Request(url='https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices',headers= self.headers,callback=self.parse)

    def parse(self, response):
        print(json.loads(response.body))

This is what I am using as a base, and I don't get any output at all after running this code. I am not sure what I did wrong here. Please help.
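A quick check that may help narrow this down. The request above goes to the page URL itself, so the body is probably HTML rather than JSON (an assumption, not something confirmed here), and json.loads raises an exception on HTML instead of printing anything. The helper name parse_json_or_none and the sample bodies below are hypothetical; this is only a sketch of how to test a body before parsing it.

```python
import json

def parse_json_or_none(body):
    """Return the decoded JSON value, or None if body is not valid JSON."""
    # Scrapy's response.body is bytes; accept str as well for convenience
    text = body.decode('utf-8', errors='replace') if isinstance(body, bytes) else body
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# Hypothetical bodies standing in for response.body
print(parse_json_or_none(b'<!DOCTYPE html><html></html>'))  # None
print(parse_json_or_none(b'{"fund": "example"}'))           # {'fund': 'example'}
```

Inside parse(), calling this on response.body and logging the first few hundred characters when it returns None would show what the server actually sent.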

OK, I think I nailed it. There's a form in the page that's sent automatically through XHR (...). So we just grab its hidden inputs to forge a payload, and that should do it:

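from bs4 import BeautifulSoup
import requests

url = 'https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices'

# Fetch the page and parse it (the 'lxml' parser requires the lxml package)
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')

# Collect every hidden form field that has both a name and a value
payload = {}
for input_ in soup.select('input[type="hidden"]'):
    if 'name' in input_.attrs and 'value' in input_.attrs:
        payload[input_['name']] = input_['value']

# Post the fields back to the same URL, as the page's form does
table = requests.post(url, data=payload).text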
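The hidden-input harvesting step can also be tried offline. The sketch below uses only the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, not what the answer uses), and the HTML snippet, field names, and the names HiddenInputCollector and collect_hidden_inputs are all made up for illustration; the real page's fields will differ.

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.payload = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        # A bare attribute (no value) is reported as None, so guard before lower()
        if (attrs.get('type') or '').lower() != 'hidden':
            return
        # Keep only fields that have both a name and a value
        if attrs.get('name') is not None and attrs.get('value') is not None:
            self.payload[attrs['name']] = attrs['value']

def collect_hidden_inputs(html):
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.payload

# Hypothetical form markup, not taken from the real page
SAMPLE_HTML = '''
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="token" value="xyz">
  <input type="hidden" name="no_value">
  <input type="text" name="visible" value="ignored">
</form>
'''

print(collect_hidden_inputs(SAMPLE_HTML))
# {'__VIEWSTATE': 'abc123', 'token': 'xyz'}
```

The resulting dict has the same shape as payload above, so it could be passed as data= to requests.post in the same way.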
