
How can I retrieve files with User-Agent headers in Python 3?

I'm trying to write a (simple) piece of code to download files off the internet. The problem is that some of these files are on websites that block the default Python User-Agent header. For example:

import urllib.request as html
html.urlretrieve('http://stackoverflow.com', 'index.html')

returns

urllib.error.HTTPError: HTTP Error 403: Forbidden

Normally, I would set the headers in the request, such as:

import urllib.request as html
request = html.Request('http://stackoverflow.com', headers={"User-Agent":"Firefox"})
response = html.urlopen(request)

However, urlretrieve doesn't accept a Request object for some reason, so this isn't an option.

Are there any simple-ish solutions to this (that don't involve importing a library such as requests)? I've noticed that urlretrieve is part of the legacy interface ported over from Python 2. Is there anything I should be using instead?

I tried creating a custom FancyURLopener class to handle retrieving files, but that caused more problems than it solved, such as creating empty files for links that return a 404.

You can subclass URLopener, set its version class variable to a different user agent, and then download through that opener's retrieve method (the opener-level counterpart of urlretrieve).
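A related route keeps the module-level urlretrieve itself. This is not the URLopener subclass described in the answer but a swapped-in technique: install a global opener whose default headers carry the desired User-Agent. It relies on urlretrieve going through urlopen, which uses whatever opener is currently installed. The helper name install_user_agent is hypothetical. A minimal sketch:

```python
import urllib.request


def install_user_agent(user_agent):
    """Install a global opener that sends the given User-Agent header.

    install_user_agent is a hypothetical helper name used for illustration.
    """
    opener = urllib.request.build_opener()
    # Replace the opener's default headers (normally just Python-urllib/3.x).
    opener.addheaders = [("User-Agent", user_agent)]
    # urlopen, and therefore urlretrieve, uses the installed opener.
    urllib.request.install_opener(opener)
    return opener
```

After calling install_user_agent("Firefox") once, a later html.urlretrieve('http://stackoverflow.com', 'index.html') should send the new header. The setting is global to the process, so it affects every subsequent urlopen call as well.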

Or you can simply use your second method and save the response to a file only after checking that code == 200.
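The second suggestion can be sketched roughly as follows, using only the standard library. The function name download and its default user agent are assumptions made for illustration. The output file is opened only after the status check, so a 403 or 404 should not leave an empty file behind. Note that urlopen already raises HTTPError for 4xx and 5xx responses, so the explicit status check is a second guard.

```python
import shutil
import urllib.error
import urllib.request


def download(url, filename, user_agent="Firefox"):
    """Save url to filename; return True on success, False on an HTTP error.

    download is a hypothetical helper, not part of urllib.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request) as response:
            # Anything other than 200 is treated as a failure.
            if response.status != 200:
                return False
            # Create the file only once the status is known to be good.
            with open(filename, "wb") as out_file:
                shutil.copyfileobj(response, out_file)
    except urllib.error.HTTPError:
        # For example 403 or 404: nothing has been written to disk.
        return False
    return True
```

A call such as download('http://stackoverflow.com', 'index.html') would then stand in for the urlretrieve line in the question. Connection-level failures (urllib.error.URLError) are deliberately left to propagate in this sketch.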
