Python urllib2 Response header

I'm trying to extract the response header of a URL request. When I use Firebug to analyze the response to a URL request, it shows:

Content-Type text/html

However, when I use the Python code:

urllib2.urlopen(URL).info()

the output is:

Content-Type: video/x-flv

I am new to Python, and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed, please let me know.

Thanks in advance for reading this post.

Try making the request the way Firefox does. You can see the request headers in Firebug, so add them to your request object:

import urllib2

request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
# ... (add any other request headers shown in Firebug)
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')

There's also HTTPCookieProcessor, which can get you closer to what the browser does, but I don't think you'll need it in most cases. Have a look at Python's documentation:

http://docs.python.org/library/urllib2.html
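As a rough illustration of the answer above, here is a minimal sketch of building such a request and checking which headers it will carry, without touching the network. It assumes Python 3, where `urllib2` became `urllib.request`; the URL and header values are placeholders, not anything from the original question.

```python
import urllib.request

# Hypothetical URL and header values, for illustration only.
request = urllib.request.Request("http://example.com/video")
request.add_header("User-Agent", "Mozilla/5.0 (illustrative agent string)")
request.add_header("Referer", "http://example.com/")

# Request stores header names via str.capitalize(), so "User-Agent"
# is kept under the key "User-agent".
sent_headers = dict(request.header_items())
print(sent_headers)

# get_header() is a method of the *request* object and uses the stored key.
print(request.get_header("User-agent"))
print(request.get_header("Referer"))
```

Passing this `request` to `urllib.request.urlopen()` would then send those headers, which is the point of the suggestion: make the Python request resemble the browser's before comparing the responses.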

Content-Type text/html

Really, like that, without the colon?

If so, that might explain it: it's an invalid header, so it gets ignored, so urllib guesses the content-type instead, by looking at the filename. If the URL happens to have '.flv' at the end, it'll guess the type should be video/x-flv.

This peculiar discrepancy might be explained by different headers (maybe ones of the Accept kind) being sent by the two requests -- can you check that...? Or, if Javascript is running in Firefox (which I assume you're using when you're running Firebug?) -- since it's definitely NOT running in the Python case -- "all bets are off", as they say ;-).

Keep in mind that a web server can return different results for the same URL based on differences in the request. For example, content-type negotiation: the requestor can specify a list of content-types it will accept, and the server can return different results to try to accommodate different needs.

Also, you may be getting an error page for one of your requests, for example because it is malformed, or because you don't have the cookies set that authenticate you properly. Look at the response itself to see what you are getting.
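To make the point about request-dependent responses concrete, here is a small self-contained sketch (Python 3, standard library only). The server, its handler class, and the rule it applies (HTML for browser-like agents, FLV otherwise) are all invented for the demonstration; a real site may key on other headers such as Accept, Referer, or cookies.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AgentSensitiveHandler(BaseHTTPRequestHandler):
    """Hypothetical server: HTML for browser-like agents, FLV for everyone else."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        content_type = "text/html" if "Mozilla" in agent else "video/x-flv"
        body = b"placeholder body"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_content_type(url, headers):
    """Return the Content-Type sent back for a request carrying `headers`."""
    # An empty ProxyHandler makes sure the local request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers)
    with opener.open(request, timeout=5) as response:
        return response.headers.get("Content-Type")


server = HTTPServer(("127.0.0.1", 0), AgentSensitiveHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/clip" % server.server_port

default_type = fetch_content_type(url, {})  # urllib's default User-Agent
browser_type = fetch_content_type(url, {"User-Agent": "Mozilla/5.0 (illustrative)"})

server.shutdown()
server.server_close()

print(default_type)
print(browser_type)
```

The same URL yields two different Content-Type values purely because of one request header, which is one plausible way Firebug and a bare `urlopen()` call could disagree.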

According to http://docs.python.org/library/urllib2.html there is only a get_header() method, and nothing about getheader.

I'm asking because your code works fine for

response.info().getheader('Set-Cookie')

but once I execute

response.info().get_header('Set-Cookie')

I get:

Traceback (most recent call last):
  File "baza.py", line 11, in <module>
    cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'

Edit: moreover, response.headers.get('Set-Cookie') works fine as well, although it is not mentioned in the urllib2 doc.
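A likely explanation for the mismatch: in the urllib2 documentation, get_header() is listed as a method of Request objects, whereas response.info() returns an httplib.HTTPMessage, which is where getheader() lives. The sketch below shows the same split in Python 3, where the response headers are an `http.client.HTTPMessage`. The header block is made up for illustration, and `http.client.parse_headers` is used only to build a headers object without a network call.

```python
import io
import http.client
import urllib.request

# Illustrative raw header block, as a server might send it.
raw = (
    b"Content-Type: text/html\r\n"
    b"Set-Cookie: session=abc123\r\n"
    b"\r\n"
)
headers = http.client.parse_headers(io.BytesIO(raw))  # same class as response.headers

# Dictionary-style access works, and header names are case-insensitive.
print(headers.get("Set-Cookie"))
print(headers["content-type"])

# The headers object has no get_header(); that method belongs to Request.
print(hasattr(headers, "get_header"))

request = urllib.request.Request("http://example.com/")  # placeholder URL
print(hasattr(request, "get_header"))
```

In short, `get_header()` reads what you are about to send, while `.get()` on the response headers reads what came back.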

To get the raw header data in Python 2, here is a little bit of a hack, but it works:

"".join(urllib2.urlopen("http://google.com/").info().__dict__["headers"])

Basically, "".join(list) concatenates the list of raw header lines, each of which already ends with a newline.

__dict__ is the built-in attribute that holds an object's instance attributes as a dictionary.

And of course ["headers"] selects the list of header lines from that dictionary, so the expression is equivalent to reading .info().headers directly.

Hope this helped you learn a few easy Python tricks :)
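The `headers` list used above exists only on the Python 2 message class. As a hedged sketch of a Python 3 counterpart, the header text can be rebuilt from the name/value pairs instead. The header block here is invented, and the result is a reconstruction rather than the exact bytes the server sent.

```python
import io
import http.client

# Illustrative header block; in real code this would be response.headers.
raw = b"Content-Type: video/x-flv\r\nContent-Length: 16\r\n\r\n"
headers = http.client.parse_headers(io.BytesIO(raw))

# items() yields (name, value) pairs in the order they were received.
header_lines = ["%s: %s" % (name, value) for name, value in headers.items()]
raw_text = "\r\n".join(header_lines)
print(raw_text)
```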
