
How to check what a POST request requires in Python

I'm trying to automate a report in Python by scraping data from a webpage. The site uses authentication that I need to pass.

I use the code below to log in and try to download the report page, but it seems I'm doing something wrong. Authentication passes with HTTP status code 200, but right after the authentication the site says: "An error was encountered while serving the request. Please see the log for more detail." I guess it refers to the server log, but I'm not the owner of the server, so I can't check that.

I suspect my login POST request is missing something, which is why I'm getting this message.

Is there a tool that I can use to track GET/POST traffic and see what the server requires? The website that I'm trying to crawl is rather old, written in .NET, and not compatible with Chrome, so I can't use Chrome's Developer Tools.

Here's my code:

import requests                                                                                                                                                                                                                                                                                            

USERNAME = 'myuser'                                                                                                                                                                                                                                                             
PASSWORD = 'mypw'                                                                                                                                                                                                                                                          
DOMAIN = 'domain comes here'                                                                                                                                                                                                                                                                                             

LOGINURL = 'https://reportsite.com/login'                                                            
DATAURL = 'https://reportsite.com/data'                                                                                                                                                                                                        

session = requests.session()                                                                                                                                                                                                                                                                               

req_headers = {                                                                                                                                                                                                                                                                                            
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',                                                                                                                                                                                                               
    'Accept-Encoding' : 'gzip, deflate',                                                                                                                                                                                                                                                                   
    'Accept-Language' : 'en-US,en;q=0.8',                                                                                                                                                                                                                                                                  
    'Cache-Control' : 'max-age=0',                                                                                                                                                                                                                                                                         
    'Connection' : 'keep-alive',                                                                                                                                                                                                                                                                           
    'Content-Length' : '573',                                                                                                                                                                                                                                                                              
    'Content-Type' : 'application/x-www-form-urlencoded',                                                                                                                                                                                                                                                  
    'Cookie' : 'ASP.NET_SessionId=u03xo1ypcphzfo523c0lc5ok',                                                                                                                                                                                                                                               
    'Host' : 'myhost.net',                                                                                                                                                                                                                                                                
    'Origin' : 'https://myhost.net',                                                                                                                                                                                                                                                      
    'Referer' : 'https://myhost.net/WAS/Login.aspx?ReturnUrl=%2fWAS%2fAWEMain.aspx%3flog%3dsaved%26xcapp%3dsplash%26xcsid%3dVISTA&log=saved&xcapp=splash&xcsid=VISTA',                                                                                                                    
    'User-agent': 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)'                                                                                                                                                             
}                                                                                                                                                                                                                                                                                                          

formdata = {                                                                                                                                                                                                                                                                                               
    '__VIEWSTATE' : '/wEPDwUJNzM1NjMxNzAxD2QWAgIBD2QWAgIDDxBkDxYGZgIBAgICAwIEAgUWBhAFBEFQQUMFBEFQQUNnEAUDQVVTBQNBVVNnEAUDRVVSBQNFVVJnEAUDSlBOBQNKUE5nEAUDTEFDBQNMQUNnEAUDTkFNBQNOQU1nZGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFDEltYWdlQnV0dG9uMQ7nE6wwQ2IuIJZCRML2VTku00DrmD2fT7YsZ+JtwEKT',    
    '__VIEWSTATEGENERATOR' : '999CB518',                                                                                                                                                                                                                                                                   
    '__EVENTVALIDATION' :' /wEWCgLvhYTaCwLL/4HeAgLSwpnTCALSxeCRDwKmhfK5BQKoxMzXBAKJv+mgAQLYyZC+BwLdu76IAgK5oPGLAXlSoU7X+UsNQS7lILVvRCWX/xKRtPK1u2cI/XJCVBMI',                                                                                                                                              
    'Userid': USERNAME,                                                                                                                                                                                                                                                                                    
    'ImageButton1.x' :28,                                                                                                                                                                                                                                                                                  
    'ImageButton1.y' :7,                                                                                                                                                                                                                                                                                   
    'Password': PASSWORD,                                                                                                                                                                                                                                                                                  
    'Domain' : DOMAIN,                                                                                                                                                                                                                                                                                     
    'WANT_NEW_USER' : ''                                                                                                                                                                                                                                                                                   
}                                                                                                                                                                                                                                                                                                          

# Authenticate (print() calls so the script also runs on Python 3)
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print("___________LOGIN____________")
print(r.headers)
print(r.status_code)
print(r.text)

# Read data
r2 = session.get(DATAURL)
print("___________DATA____________")
print(r2.headers)
print(r2.status_code)
print(r2.text)

I figured out what the problem was. Unfortunately, my options are quite limited because I can't install anything on my company laptop. If I had admin rights on the laptop, I would definitely install a sniffer to see what's going on in the background.
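
Without a sniffer, one partial substitute is to dump what the script itself sends. The sketch below is only an illustration: `describe_request` is a hypothetical helper name, and it assumes an object with `method`, `url`, `headers` and `body` attributes, which is what the `request` attribute of a requests response provides. A stand-in object is used here so the snippet runs without network access.

```python
from types import SimpleNamespace


def describe_request(prepared):
    """Return a readable dump of a request: request line, headers, body.

    `prepared` is any object exposing .method, .url, .headers and .body
    (hypothetical helper; not part of the requests library).
    """
    lines = ["{} {}".format(prepared.method, prepared.url)]
    for name, value in prepared.headers.items():
        lines.append("{}: {}".format(name, value))
    lines.append("")  # blank line separates headers from the body
    body = prepared.body
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    lines.append(body or "")
    return "\n".join(lines)


# Stand-in for a real prepared request, with made-up values.
example = SimpleNamespace(
    method="POST",
    url="https://reportsite.com/login",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    body="Userid=myuser&Password=mypw",
)
print(describe_request(example))
```

With the script from the question, calling `print(describe_request(r.request))` after the login POST should show the headers and form body that were actually sent, which can then be compared field by field with the login page's form.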

I checked the source of my login page manually and went through the fields passed in the POST request one by one. It turned out that there was a mandatory hidden field that my script didn't send. After I added the hidden field to my POST request, everything went smoothly.
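
The manual check can probably be automated. The following is a minimal sketch, not the exact fix I applied: it assumes the login form is plain HTML, collects every hidden `<input>` from the page with the standard library parser, and merges the result with the credentials before posting. `HiddenFieldParser`, `extract_hidden_fields` and `login` are hypothetical names, and `session` is assumed to be a `requests.Session()` like the one in the question.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            # A missing value attribute is sent as an empty string.
            self.fields[name] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in `html`."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def login(session, login_url, credentials):
    """Fetch the login page, then post its hidden fields plus credentials.

    `session` is assumed to behave like requests.Session().
    """
    page = session.get(login_url)
    formdata = extract_hidden_fields(page.text)
    formdata.update(credentials)  # visible fields such as Userid, Password
    return session.post(login_url, data=formdata)


# Made-up page fragment to show what the parser picks up.
sample = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc123" />'
    '<input type="text" name="Userid">'
    '<input type="hidden" name="WANT_NEW_USER" value=""></form>'
)
print(extract_hidden_fields(sample))
```

Reading the hidden fields from a fresh copy of the page also avoids hard-coding values such as `__VIEWSTATE`, which may change between sessions.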

I would suggest that anyone with a similar problem put together the URL of the POST request manually (e.g. www.site.com/login.aspx?userid=myid&csid=233 etc.) and check the response in a browser. That helped me figure out where to start investigating the issue.
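
Assembling such a URL by hand gets error-prone once values need escaping, so here is a small sketch that builds it with the standard library. `build_probe_url` is a hypothetical helper, and the field names are just the ones from the example URL above. This only helps if the server accepts the fields as a query string, which is an assumption that will not hold for every site.

```python
from urllib.parse import urlencode


def build_probe_url(base_url, fields):
    """Append `fields` to `base_url` as a URL-encoded query string."""
    return "{}?{}".format(base_url, urlencode(fields))


url = build_probe_url(
    "https://www.site.com/login.aspx",
    {"userid": "myid", "csid": "233"},
)
print(url)  # prints https://www.site.com/login.aspx?userid=myid&csid=233
```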
