[英]ImportError with gevent and requests async module
我正在寫一個簡單的腳本:
當我用4或5個URL測試腳本時效果很好,當腳本結束時我只有以下消息:
Exception KeyError: KeyError(45989520,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
但是當我嘗試檢查大約24000個URL時,它會在列表末尾(當剩下大約400個URL要檢查時)失敗,並出現以下錯誤:
Traceback (most recent call last):
File "check.py", line 95, in <module>
File "/home/alex/code/.virtualenvs/linka/local/lib/python2.7/site-packages/requests/async.py", line 83, in map
File "/home/alex/code/.virtualenvs/linka/local/lib/python2.7/site-packages/gevent-1.0b2-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 405, in joinall
ImportError: No module named queue
Exception KeyError: KeyError(45989520,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
我嘗試使用pypi上提供的gevent版本,並從gevent存儲庫下載並安裝最新版本(1.0b2)。
我無法理解為什么會發生這種情況,以及為什么只有在檢查一堆網址時才會發生這種情況。 有什么建議?
這是整個腳本:
from requests import async, defaults
from lxml import html
from urlparse import urlsplit
from gevent import monkey
from BeautifulSoup import UnicodeDammit
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction
import persistent
import random
storage = FileStorage('Data.fs')
db = DB(storage)
connection = db.open()
root = connection.root()
monkey.patch_all()
defaults.defaults['base_headers']['User-Agent'] = "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
defaults.defaults['max_retries'] = 10
def save_data(source, target, anchor):
root[source] = persistent.mapping.PersistentMapping(dict(target=target, anchor=anchor))
transaction.commit()
def decode_html(html_string):
converted = UnicodeDammit(html_string, isHTML=True)
if not converted.unicode:
raise UnicodeDecodeError(
"Failed to detect encoding, tried [%s]",
', '.join(converted.triedEncodings))
# print converted.originalEncoding
return converted.unicode
def find_link(html_doc, url):
decoded = decode_html(html_doc)
doc = html.document_fromstring(decoded.encode('utf-8'))
for element, attribute, link, pos in doc.iterlinks():
if attribute == "href" and link.startswith('http'):
netloc = urlsplit(link).netloc
if "example.org" in netloc:
return (url, link, element.text_content().strip())
else:
return False
def check(response):
if response.status_code == 200:
html_doc = response.content
result = find_link(html_doc, response.url)
if result:
source, target, anchor = result
# print "Source: %s" % source
# print "Target: %s" % target
# print "Anchor: %s" % anchor
# print
save_data(source, target, anchor)
global todo
todo = todo -1
print todo
def load_urls(fname):
with open(fname) as fh:
urls = set([url.strip() for url in fh.readlines()])
urls = list(urls)
random.shuffle(urls)
return urls
if __name__ == "__main__":
urls = load_urls('urls.txt')
rs = []
todo = len(urls)
print "Ready to analyze %s pages" % len(urls)
for url in urls:
rs.append(async.get(url, hooks=dict(response=check), timeout=10.0))
responses = async.map(rs, size=100)
print "DONE."
我不確定你問題的根源是什么,但為什么你的monkey.patch_all()不在文件的頂部?
你能試試嗎?
from gevent import monkey; monkey.patch_all()
在主程序的頂部,看看它是否修復了什么?
我是一個很大的n00b,但無論如何,我可以嘗試......! 我猜您可以嘗試通過以下方式更改導入列表:
from requests import async, defaults
import requests
from lxml import html
from urlparse import urlsplit
from gevent import monkey
import gevent
from BeautifulSoup import UnicodeDammit
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction
import persistent
import random
試試這個並告訴我它是否有效..我猜這可以解決你的問題:)
美好的一天。 我認為它是開放的python bug,編號為Issue1596321 http://bugs.python.org/issue1596321
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.