How to download a file using a web URL in Python? Download through browser works but not through Python's requests

The file downloads when the URL is entered in a browser (Firefox, Chrome, etc.). But when I try to download the same file from the same URL with Python's requests or urllib library, I don't get any response.

URL: https://www.nseindia.com/products/content/sec_bhavdata_full.csv (Reference page: https://www.nseindia.com/products/content/equities/equities/eq_security.htm)

What I tried:

import requests
eqfile = requests.get('https://www.nseindia.com/products/content/sec_bhavdata_full.csv')

I got no response. Then I tried the following:

temp = requests.get('https://www.nseindia.com/products/content/equities/equities/eq_security.htm')

Again, no response.

What would be the optimal way to download a file from such a URL (web server)?

If I send a User-Agent header similar to the one a real web browser uses, then I can download the file.

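import requests

# Send a browser-like User-Agent instead of the requests default
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.nseindia.com/products/content/sec_bhavdata_full.csv'

r = requests.get(url, headers=headers)
r.raise_for_status()  # raise on an HTTP error instead of saving an error page

# r.content is bytes, so the file is opened in binary mode
with open('sec_bhavdata_full.csv', 'wb') as fh:
    fh.write(r.content)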

Portals often check this header to block requests or to format HTML specifically for your browser or device. But requests and urllib.request identify themselves as Python in this header (for example "python-requests/<version>" or "Python-urllib/<version>").
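To see what the Python side sends by default, a small sketch like the one below may help. It uses only the standard library's urllib.request (one of the two libraries mentioned in the question) and sends nothing over the network; it only inspects the default header and a Request object built with an overridden one.

```python
import urllib.request

# The default opener holds the headers urllib adds to every request.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers)  # the 'User-agent' value starts with 'Python-urllib/'

# Override the header on a Request object (nothing is sent here).
url = 'https://www.nseindia.com/products/content/sec_bhavdata_full.csv'
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# urllib stores header names capitalized as 'User-agent'.
print(req.get_header('User-agent'))  # Mozilla/5.0
```

With requests, the equivalent default value is returned by requests.utils.default_user_agent().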

Many portals need only 'User-Agent': 'Mozilla/5.0' to send content, but others may need a full User-Agent string or even other headers such as Referer, Accept, Accept-Encoding, and Accept-Language. You can see the headers sent by a real browser by opening https://httpbin.org/get in it.
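For portals that want more than a short User-Agent, a fuller header set could look roughly like the sketch below. The helper names browser_like_headers and download are hypothetical, and every header value is an illustrative assumption rather than something a particular site is known to require. The sketch uses urllib.request from the standard library. Accept-Encoding is left out on purpose, because urllib does not decompress gzip responses on its own, whereas requests does.

```python
import shutil
import urllib.request


def browser_like_headers(referer):
    """Return headers resembling a desktop browser's (illustrative values only)."""
    return {
        'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64; rv:109.0) '
                       'Gecko/20100101 Firefox/115.0'),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': referer,
    }


def download(url, dest, headers, timeout=30):
    """Fetch url with the given headers and stream the body to the file dest."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp, \
            open(dest, 'wb') as fh:
        shutil.copyfileobj(resp, fh)  # copy in chunks, not all at once


# Possible usage with the URLs from the question (not executed here):
# download(
#     'https://www.nseindia.com/products/content/sec_bhavdata_full.csv',
#     'sec_bhavdata_full.csv',
#     browser_like_headers(
#         'https://www.nseindia.com/products/content/equities/equities/eq_security.htm'),
# )
```

The timeout makes a silent server raise an error instead of hanging indefinitely. Whether a given portal accepts these headers can only be found by trying; comparing them with what https://httpbin.org/get reports for your own browser is a reasonable starting point.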
