Handling RSS redirects with Python/urllib2
Calling urllib2.urlopen on a link to an article fetched from an RSS feed leads to the following error:
urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Permanently
According to the documentation, urllib2 supports redirects.
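For illustration, a minimal sketch of that default behaviour, assuming Python 3 (where urllib2 became urllib.request) and a hypothetical local test server whose /old path answers with a 301 pointing at /new. A plain urlopen call follows the redirect with no extra setup, and geturl() reports the final address.

```python
# Python 3 sketch: urllib.request (urllib2's successor) follows an ordinary
# 301 redirect on its own. The server and its paths are hypothetical.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectOnce(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, /new returns a body."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_redirect():
    # Port 0 lets the OS pick a free port.
    server = HTTPServer(("127.0.0.1", 0), RedirectOnce)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_address[1]
        with urllib.request.urlopen(base + "/old") as resp:
            # geturl() gives the URL after all redirects were followed.
            return resp.geturl(), resp.read()
    finally:
        server.shutdown()
        server.server_close()


final_url, body = fetch_with_redirect()
print(final_url, body)
```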
In Java the problem was solved by just calling
HttpURLConnection.setFollowRedirects(true);
How can I solve it with Python?
UPDATE
The link I'm having problems with:
http://feeds.nytimes.com/click.phdo?i=8cd5af579b320b0bfd695ddcc344d96c
It turns out you need to enable cookies. The page first sets a cookie and then redirects to itself. Because urllib2 does not handle cookies by default, the cookie is never sent back, so the server keeps redirecting to the same URL until urllib2 gives up and reports an infinite loop. You have to add cookie handling yourself:
import urllib2
from cookielib import CookieJar

# The CookieJar stores the cookie the server sets before redirecting,
# and HTTPCookieProcessor sends it back on the follow-up request.
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
p = opener.open("http://feeds.nytimes.com/click.phdo?i=8cd5af579b320b0bfd695ddcc344d96c")
print p.read()
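The mechanism can be reproduced offline. The following is a minimal Python 3 sketch (urllib.request and http.cookiejar replace urllib2 and cookielib) against a hypothetical local server, not the NYTimes one: the /click path, the visited=1 cookie and the server behaviour are assumptions chosen to mimic the description above. A plain urlopen call ends in an HTTPError with code 301 after repeated self-redirects, while an opener with HTTPCookieProcessor gets the page.

```python
# Python 3 sketch of a cookie-guarded self-redirect. The server, the /click
# path and the "visited" cookie are hypothetical stand-ins for the real site.
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGate(BaseHTTPRequestHandler):
    def do_GET(self):
        if "visited=1" in (self.headers.get("Cookie") or ""):
            # The cookie came back: serve the actual content.
            body = b"article"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # No cookie yet: set it and redirect to the very same URL.
            self.send_response(301)
            self.send_header("Set-Cookie", "visited=1; Path=/")
            self.send_header("Location", self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), CookieGate)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/click" % server.server_address[1]
    try:
        # Without a cookie handler the cookie is never replayed, so the
        # redirect handler sees the same URL repeatedly and raises HTTPError.
        try:
            urllib.request.urlopen(url)
            plain = "no error"
        except urllib.error.HTTPError as err:
            plain = err.code
        # With HTTPCookieProcessor the cookie is stored and sent back.
        opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar()))
        with opener.open(url) as resp:
            body = resp.read()
        return plain, body
    finally:
        server.shutdown()
        server.server_close()


plain_result, cookie_body = demo()
print(plain_result, cookie_body)
```

The cookie handler is what breaks the loop: the redirect target is still the same URL, but the repeated request now carries the cookie.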
Nothing wrong with @sleeplessnerd's solution, but this is very, very slightly more elegant:
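import urllib2

url = "http://stackoverflow.com/questions/9926023/handling-rss-redirects-with-python-urllib2"
# build_opener() instantiates the handler class itself, and
# HTTPCookieProcessor() with no argument creates its own CookieJar.
p = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(url)
print p.read()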
In fact, if you look at the inline documentation for the CookieJar class, it more or less tells you to do things this way:
You may not need to know about this class: try urllib2.build_opener(HTTPCookieProcessor).open(url)
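The shorthand works because build_opener accepts a handler class and instantiates it with no arguments, and HTTPCookieProcessor then creates its own CookieJar. A minimal Python 3 check of that claim (urllib.request being urllib2's Python 3 successor), which inspects the opener instead of fetching anything:

```python
# Python 3 sketch: no network access; we only inspect the opener built
# from the handler class, the equivalent of the urllib2 shorthand.
import http.cookiejar
import urllib.request

# Passing the class, not an instance: build_opener() calls it with no
# arguments, and HTTPCookieProcessor() then creates a fresh CookieJar.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor)

cookie_handlers = [h for h in opener.handlers
                   if isinstance(h, urllib.request.HTTPCookieProcessor)]
jar = cookie_handlers[0].cookiejar

print(len(cookie_handlers))                       # 1
print(isinstance(jar, http.cookiejar.CookieJar))  # True
print(len(jar))                                   # 0: nothing stored yet
```

Because the jar lives inside the opener, reuse that opener object for later requests if the cookies should persist; passing your own CookieJar, as in the first answer, is the way to go if you need to inspect them.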