python requests cannot download a zip file while browser/selenium can

I tried to use the requests module to download a bunch of zip files with the code below:

import os

import requests
from bs4 import BeautifulSoup

s = requests.Session()
url = 'http://data.theice.com/MyAccount/Login.aspx'
z = s.get(url)
soup = BeautifulSoup(z.content, 'html.parser')
# collect the hidden ASP.NET form fields from the login page
hidden = soup.find_all('input', attrs={'type': 'hidden'})
values = {'ctl00$ContentPlaceHolder1$LoginControl$m_userName': 'Acorn437',
    'ctl00$ContentPlaceHolder1$LoginControl$m_password': '*******',
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$LoginControl$LoginButton',
    '__EVENTARGUMENT': '',
    '__LASTFOCUS': ''}
# merge the hidden fields into the login payload and post it
values = dict(values, **{i['id']: i['value'] for i in hidden})
z = s.post(url, data=values, allow_redirects=True)

After this, I verified that I had successfully logged in to the website by checking the response. Now I would like to download the zip file from a link on the website:

link = 'http://data.theice.com/MyAccount/Download.aspx?PUID=69590&PDS=0&PRODID=580&TS=2018'
resp = s.get(link, allow_redirects=True)
path = os.path.join(os.getcwd(), 'data', 'ice_zip')
fname = 'test.zip'
# write the raw response body to disk
with open(os.path.join(path, fname), 'wb') as zfile:
    zfile.write(resp.content)

However, it turned out that what I downloaded is actually an HTML file instead of the zip file I need. I have no idea why the requests module does not work for this website. I think that after I log in with requests.Session, I should be able to download it, because I can do so with a browser or the selenium module.
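One quick way to confirm what actually came back is to inspect the bytes before saving them. The sketch below is only an illustration, not part of the original code: the helper names `looks_like_zip` and `describe_payload` are made up here, and the HTML check is a rough heuristic based on the first few bytes.

```python
import io
import zipfile


def looks_like_zip(content: bytes) -> bool:
    """Return True if the bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(content))


def describe_payload(content: bytes) -> str:
    """Give a rough label for a downloaded payload (hypothetical helper)."""
    if looks_like_zip(content):
        return "zip"
    # Heuristic only: a login or error page usually starts with one of these.
    head = content[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

With the session code above, calling `describe_payload(resp.content)` before writing the file would show whether the server sent an archive or a page such as the login form.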

Clearly, I have no problem logging in to the

This works for me, given of course that you provide your own credentials and download path. I think your main problem might be that your login URL was wrong: when I ran your code I could NOT log in to the site. The initial URL and the login URL are different.

import requests
from bs4 import BeautifulSoup

# define variables
username = ""
password = ""
path_to_store_output = ""

session = requests.Session()
r = session.get('http://data.theice.com/MyAccount/Login.aspx')
soup=BeautifulSoup(r.text,'html.parser')

vs_generator = soup.find('input', attrs={'id': '__VIEWSTATEGENERATOR'}).get('value')
vs = soup.find('input', attrs={'id': '__VIEWSTATE'}).get('value')
event_validation = soup.find('input', attrs={'id': '__EVENTVALIDATION'}).get('value')


payload = {
    "__EVENTTARGET": "ctl00$ContentPlaceHolder1$LoginControl$LoginButton",
    "__EVENTARGUMENT":"", 
    "__LASTFOCUS": "", 
    "__VIEWSTATE": vs,
    "__VIEWSTATEGENERATOR": vs_generator,
    "__EVENTVALIDATION": event_validation,
    "ctl00$ContentPlaceHolder1$LoginControl$m_userName": username,
    "ctl00$ContentPlaceHolder1$LoginControl$m_password": password  
}
# doing a POST to login
r = session.post("http://www.ice.if5.com/MyAccount/Login.aspx", data=payload, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'})

# check if we're logged in
if username not in r.text:
    print("[!] Bummer, dude! We're not logged in...")

else:
    print("[*] Score, we're in. Let's download stuff...")
    r = session.get("http://www.ice.if5.com/MyAccount/Download.aspx?PUID=70116&PDS=2&PRODID=4133", headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'})
    with open(path_to_store_output, 'wb') as f:
        f.write(r.content)

There's actually not much to this: log in and grab the stuff. Replace the URL I tested with whatever you're interested in. The one you provided gave me a 404. Cheers.
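On the point about the initial URL and the login URL being different: one way to check where a login form really posts is to read the `action` attribute of the form and resolve it against the page URL. The sketch below uses only the standard library and is an assumption-laden illustration; `LoginFormParser` and `login_form_target` are hypothetical names, and I have not verified that this particular site exposes the target this way. It also collects the hidden fields by `name`, since that is the attribute a browser submits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the first form's action and its hidden input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            # Remember where the first form on the page submits to.
            self.action = attrs.get("action") or ""
        elif tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.hidden[name] = attrs.get("value") or ""


def login_form_target(html, page_url):
    """Return (absolute POST URL, hidden fields) for the first form in html."""
    parser = LoginFormParser()
    parser.feed(html)
    # An empty or relative action resolves against the page that served it.
    post_url = urljoin(page_url, parser.action or "")
    return post_url, parser.hidden
```

In the code above this could be called as `login_form_target(r.text, r.url)` right after the initial GET; if the returned URL differs from the one requested, that is the one to POST the payload to.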
