urllib2 and cookielib thread safety

As far as I've been able to tell, cookielib isn't thread safe; but then again, the post stating so is five years old, so it might be wrong.

Nevertheless, I've been wondering: if I instantiate a class like this

import cookielib
import urllib2

class Acc:
    # Cookie jar, cookie handler and opener used by fetch()
    jar = cookielib.CookieJar()
    cookie = urllib2.HTTPCookieProcessor(jar)
    opener = urllib2.build_opener(cookie)

    headers = {}

    def __init__(self, login, password):
        self.user = login
        self.password = password

    def login(self):
        return False  # Some magic, irrelevant

    def fetch(self, url):
        # Request the URL through this class's cookie-aware opener
        req = urllib2.Request(url, None, self.headers)
        res = self.opener.open(req)
        return res.read()

for each worker thread, would it work? (Or is there a better approach?) Each thread would use its own account, so the fact that workers wouldn't share their cookies is not a problem.
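One detail worth checking in the class as posted: jar, cookie, opener and headers are defined at class level, so every Acc instance (and therefore every thread) would see the same CookieJar. If the goal is one set of cookies per account, a minimal sketch is to create them in __init__ instead. This is written for Python 3, where cookielib and urllib2 became http.cookiejar and urllib.request; the SharedAcc class is a hypothetical name used only to show the contrast.

```python
import urllib.request
from http.cookiejar import CookieJar


class Acc:
    """One cookie jar and opener per instance, so accounts stay separate."""

    def __init__(self, login, password):
        self.user = login
        self.password = password
        self.headers = {}
        # Created per instance: each account keeps its own cookies.
        self.jar = CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.jar)
        )

    def fetch(self, url):
        req = urllib.request.Request(url, None, self.headers)
        with self.opener.open(req) as res:
            return res.read()


class SharedAcc:
    """Hypothetical contrast: a class-level jar is shared by all instances."""

    jar = CookieJar()


first = Acc("alice", "secret")
second = Acc("bob", "secret")
print(first.jar is second.jar)          # separate jars per account
print(SharedAcc().jar is SharedAcc().jar)  # one jar for every instance
```

With per-instance attributes, each worker thread can own one Acc object and never touch another thread's cookies.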

You can read the library's implementation in [python_install_path]/lib/cookielib.py to confirm that cookielib.CookieJar is thread safe.

This means that if you share one CookieJar instance between several connections in different threads, you will not even see inconsistent reads of the cookie set, because CookieJar guards it internally with the lock self._cookies_lock.
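As a rough illustration of the point above, the sketch below has several threads add cookies to one shared CookieJar and then counts the result. It uses the Python 3 module name http.cookiejar (the renamed cookielib); make_cookie and fill_jar are hypothetical helpers, and the domain and cookie names are made up. It exercises the jar under concurrency but is not a proof of thread safety.

```python
import threading
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain="example.com"):
    """Build a simple session cookie; all field values here are illustrative."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def fill_jar(jar, n_threads=8, per_thread=50):
    """Have n_threads threads each store per_thread uniquely named cookies."""

    def worker(thread_index):
        for i in range(per_thread):
            # set_cookie() takes the jar's internal lock while it writes.
            jar.set_cookie(make_cookie("t%d_c%d" % (thread_index, i), "v"))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(jar)


shared_jar = CookieJar()
total = fill_jar(shared_jar)
print(total)  # number of cookies stored by all threads together
```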

You want to use pycurl (the Python interface to libcurl). It's thread-safe and supports cookies, HTTPS, etc. The interface is a bit strange, but it just takes a bit of getting used to.

I've only used pycurl with HTTPBasicAuth + SSL, but I did find an example using pycurl and cookies here. I believe you'll need to update pycurl.COOKIEFILE (line 74) and pycurl.COOKIEJAR (line 82) to have some unique name (maybe keying off of id(self.crl)).

As I remember, you'll need to create a new pycurl.Curl() for each request to maintain thread safety.
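The suggestion above (a fresh pycurl.Curl() per request, with a cookie file name keyed off the handle's id) might look roughly like the sketch below. It is not the linked example; cookie_path_for and fetch_with_cookies are hypothetical names, and pycurl is a third-party package imported lazily, so only the path helper runs without it. Note that id() is unique only among objects alive at the same time, so a name derived from it can be reused once a handle is freed; keying on the account name may be more robust if cookies must persist.

```python
import os
import tempfile
from io import BytesIO


def cookie_path_for(handle, directory=None):
    """Return a cookie file path that is unique per live handle object."""
    directory = directory or tempfile.gettempdir()
    return os.path.join(directory, "cookies-%d.txt" % id(handle))


def fetch_with_cookies(url):
    """Fetch url with a new Curl handle and its own cookie file."""
    import pycurl  # third-party; assumed to be installed

    curl = pycurl.Curl()  # new handle per request, never shared across threads
    path = cookie_path_for(curl)
    body = BytesIO()
    try:
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.COOKIEFILE, path)  # file cookies are read from
        curl.setopt(pycurl.COOKIEJAR, path)   # file cookies are written to on close
        curl.setopt(pycurl.WRITEFUNCTION, body.write)
        curl.perform()
    finally:
        curl.close()
    return body.getvalue()


handle_a, handle_b = object(), object()
print(cookie_path_for(handle_a) != cookie_path_for(handle_b))
```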

I have the same question as you. If you do not use pycurl, I think you must call urllib2.install_opener(self.opener) before each urllib2.urlopen.
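A caveat on install_opener: it replaces the module-wide default opener used by urlopen, so that default is shared by all threads. Calling open() on the opener object itself, as the fetch() method in the question does, needs no install_opener call and leaves the global untouched. Below is a minimal sketch in Python 3 (urllib.request and http.cookiejar are the renamed urllib2 and cookielib); make_opener and fetch_all are hypothetical helpers, and data: URLs stand in for real HTTP requests so the example runs offline.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Build an opener with its own private cookie jar."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_all(n_workers=4):
    """Each worker thread builds and uses its own opener directly."""
    results = [None] * n_workers

    def worker(index):
        opener = make_opener()
        # No install_opener(): the opener is called directly, so the
        # module-wide default used by urlopen() is never modified.
        with opener.open("data:text/plain,worker%d" % index) as res:
            results[index] = res.read()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


bodies = fetch_all()
print(bodies)
```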

Maybe I should use pycurl too; urllib2 is not so smart.
