I have asked a related question here before ("python scraping for javascript not working and specific data"), and it seems I can get the data by extracting the XHR content. That would give me a way to scrape this page without using Selenium.
import json

import scrapy


class PublicMutual(scrapy.Spider):
    name = 'publicmutual'
    headers = {'Accept': '*/*',
               'Accept-Encoding': 'gzip, deflate, br',
               'Accept-Language': 'en-US,en;q=0.9',
               'Connection': 'keep-alive',
               'Cookie': '.ASPXANONYMOUS=u8UpT1xTjt54Tf80JCsS2GqJWf4sPIksbzi5JOaw8TsM7i64n54q8yESMrdk81uj2hjiaMMLSMJAl0LcevRrYNP0XoGlGcGMpgNnmpG6YSMM1jAK0; Analytics_VisitorId=42ce4acb-6501-4828-aa81-74ef126af235; Analytics=SessionId=fc660efe-9e82-4379-afd5-8f77f203ff10&TabId=106&ContentItemId=-1; dnn_IsMobile=False; language=en-US; ASP.NET_SessionId=da1cbmzdgrzitjwntnlu3ioq; __RequestVerificationToken=Ry8wSKybT77XgBmmxuOfGmM4a6_Wy-B1MNKrN5g2zfVB1c6GXlL68ZYWUwZKBvVjyheTWQ2',
               'Host': 'www.publicmutual.com.my',
               'Referer': 'https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices',
               'Sec-Fetch-Dest': 'empty',
               'Sec-Fetch-Mode': 'cors',
               'Sec-Fetch-Site': 'same-origin',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36',
               'X-Requested-With': 'XMLHttpRequest'}

    def start_requests(self):
        yield scrapy.Request(url='https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices',
                             headers=self.headers, callback=self.parse)

    def parse(self, response):
        print(json.loads(response.body))
This is what I am using as a base, and I don't get any output at all when I run it. I am not sure what I did wrong here. Please help.
OK, I think I nailed it. There's a form in the page that is sent automatically through XHR (...). So we just grab its hidden inputs to forge a payload, and that should do it:
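from bs4 import BeautifulSoup
import requests

url = 'https://www.publicmutual.com.my/Our-Products/UT-Fund-Prices'

# Fetch the page once to read the hidden form fields it ships with.
data = requests.get(url).text
soup = BeautifulSoup(data, 'lxml')
inputs = soup.select('input[type="hidden"]')

# Build the form payload from every hidden input that has both a name and a value.
payload = {}
for input_ in inputs:
    if 'name' in input_.attrs and 'value' in input_.attrs:
        payload[input_['name']] = input_['value']

# Post the payload back to the same URL; the response text should contain the table.
table = requests.post(url, data=payload).text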
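If you want to check the payload-building step without hitting the site, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library's html.parser so it runs with no extra installs. The names HiddenInputCollector, build_payload and sample_html are made up for illustration, and the sample form is invented, not copied from the real page.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.payload = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # A bare attribute (no value) is reported as None, so guard before lower().
        if (attributes.get("type") or "").lower() != "hidden":
            return
        # Same rule as the loop above: keep only inputs that have both attributes.
        name = attributes.get("name")
        value = attributes.get("value")
        if name is not None and value is not None:
            self.payload[name] = value


def build_payload(html):
    """Return a dict of hidden form fields found in an HTML string."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.payload


# Invented sample markup (an assumption, not the real page's form).
sample_html = """
<form>
  <input type="hidden" name="__RequestVerificationToken" value="abc123">
  <input type="hidden" name="no_value_field">
  <input type="text" name="visible" value="ignored">
  <input type="hidden" value="no name">
</form>
"""

print(build_payload(sample_html))
```

Only the first input ends up in the dict, since the others are either not hidden or lack a name or a value. On the real page you would pass the text from requests.get(url) to build_payload and post the result as before.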