
Scrapy Post Request with payload

I am trying to scrape a website whose URL is requested via POST with the payload parameters shown below. I am not sure how to turn the payload into a dictionary and send it as formdata. Everything below appears under "Request Payload" in the browser; how do I send it as formdata?

callCount=1
page=/job.do?uno=auo&internal=1
httpSessionId=724D68422B64D786F28E57EB3EE9D07D.newweb04
scriptSessionId=D0E5AF56ACF10360C6B960329CBB012B883
c0-scriptName=normalAjaxService
c0-methodName=getJobDetailByCustId
c0-id=0
c0-param0=number:1000000157
c0-param1=number:551740849947131
batchId=2

This is how I am trying to send the formdata, but I receive no response:

formdata_2 = {
    'callCount': '1',
    'page': '/job.do?uno=auo&internal=1',
    'httpSessionId': '724D68422B64D786F28E57EB3EE9D07D.newweb04',
    'scriptSessionId': session_id,
    'c0-scriptName': 'normalAjaxService',
    'c0-methodName': 'searchJobs',
    'c0-id': '0',
    'c0-param0=number': '1000000157',
    'c0-param1=string': '',
    'c0-param2=string': '0',
    'c0-param3=string': '0',
    'c0-param4=number': '0',
    'c0-param5=string': 'i88ky1c0',
    'batchId': '0',
}

I'm not entirely sure I understood your question correctly, but if that is text you found somewhere on the scraped site and you want to turn it into a valid request for crawling another page, you can convert it into a dict as follows:

# assume that you got that text inside a variable
scraped = """
callCount=1
page=/job.do?uno=auo&internal=1
httpSessionId=724D68422B64D786F28E57EB3EE9D07D.newweb04
scriptSessionId=D0E5AF56ACF10360C6B960329CBB012B883
c0-scriptName=normalAjaxService
c0-methodName=getJobDetailByCustId
c0-id=0
c0-param0=number:1000000157
c0-param1=number:551740849947131
batchId=2
"""
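# strip() drops the blank first/last lines, which would otherwise raise an IndexError on p[1]
param_list = [line.split('=', 1) for line in scraped.strip().splitlines()]
formdata = {p[0]: p[1] for p in param_list}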
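For reuse, the parsing step above can be wrapped in a small helper. This is only a minimal sketch: the name parse_payload is hypothetical (it is not part of Scrapy), and it assumes every meaningful line has the form key=value. The sample text is a shortened copy of the payload from the question.

```python
def parse_payload(text):
    """Turn newline-separated key=value text into a dict of strings."""
    params = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue  # skip blank or malformed lines
        key, value = line.split('=', 1)  # split on the first '=' only
        params[key] = value
    return params


payload_text = """
callCount=1
page=/job.do?uno=auo&internal=1
c0-param0=number:1000000157
batchId=2
"""
formdata = parse_payload(payload_text)
print(formdata['page'])       # /job.do?uno=auo&internal=1
print(formdata['c0-param0'])  # number:1000000157
```

Splitting on the first = only keeps values such as /job.do?uno=auo&internal=1 intact. Note also that the key is c0-param0 and the type prefix (number:) belongs to the value, unlike keys such as 'c0-param0=number' in the dict from the question.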

Now you have your form data in a dictionary. If the page you want to visit is the page parameter from the form data, you can use it to build the absolute URL with urljoin from urllib.parse (the urlparse module on Python 2), assuming this is called from inside a callback function, where response is available:

from urllib.parse import urljoin  # Python 2: from urlparse import urljoin
page = urljoin(response.url, formdata.pop('page'))
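To see what urljoin does with a root-relative page value, here is a small standalone check. The base URL http://example.com/search/list.do is made up for illustration; in a spider it would be response.url.

```python
from urllib.parse import urljoin

# hypothetical stand-in for response.url
base_url = 'http://example.com/search/list.do'

# a path starting with '/' replaces the whole path of the base URL
page = urljoin(base_url, '/job.do?uno=auo&internal=1')
print(page)  # http://example.com/job.do?uno=auo&internal=1
```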

Now you can follow the link with the appropriate form data:

# the formdata argument is accepted by FormRequest, not by the base Request class
return scrapy.FormRequest(page, formdata=formdata)
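One caveat: form data is sent URL-encoded, whereas a "Request Payload" in the browser's network tab usually indicates a raw body. If the server expects the newline-separated text exactly as shown in the question (an assumption; this has not been checked against the site), the body can be assembled by hand and passed through the body argument of a plain scrapy.Request. The helper name build_raw_body is hypothetical, and the sketch uses only the standard library so that the difference between the two encodings is visible.

```python
from urllib.parse import urlencode


def build_raw_body(params):
    """Join parameters as newline-separated key=value lines, unencoded."""
    return '\n'.join('{}={}'.format(k, v) for k, v in params.items()) + '\n'


# shortened copy of the parameters from the question
params = {
    'callCount': '1',
    'c0-scriptName': 'normalAjaxService',
    'c0-param0': 'number:1000000157',
    'batchId': '2',
}

raw_body = build_raw_body(params)
encoded_body = urlencode(params)  # roughly what a form-encoded request would send

print(raw_body)
print(encoded_body)

# In a spider callback the raw variant could be sent like this
# (url and self.parse_job are placeholders, not taken from the question):
# yield scrapy.Request(url, method='POST', body=raw_body,
#                      headers={'Content-Type': 'text/plain'},
#                      callback=self.parse_job)
```

In the encoded variant the colon in number:1000000157 becomes %3A and the pairs are joined with &, which a server expecting the plain-text layout may not parse. Comparing both against the request the browser actually sends should show which one applies.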

I hope this answers your question; if not, please explain further what you need.
