
ImportError with gevent and requests async module

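I am writing a simple script that:

  1. Loads a big list of URLs
  2. Gets the content of each URL, making concurrent HTTP requests with requests' async module
  3. Parses the page content with lxml to check whether a given link is in the page
  4. If the link is present on the page, saves some information about the page in a ZODB database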
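
Step 3 can be sketched on its own, without the network or the database. The snippet below is only an illustration under stated assumptions: it is written for Python 3 and uses the standard library's html.parser instead of lxml, and the names LinkFinder and find_links are hypothetical, not part of the script in this question.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkFinder(HTMLParser):
    """Collects absolute href targets whose host contains a given domain."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "href" and value and value.startswith("http"):
                if self.domain in urlsplit(value).netloc:
                    self.matches.append(value)


def find_links(html_text, domain):
    """Return every absolute link in html_text that points at domain."""
    finder = LinkFinder(domain)
    finder.feed(html_text)
    return finder.matches


page = (
    '<p><a href="http://example.org/a">A</a> '
    '<a href="/local">B</a> '
    '<a href="http://other.net/">C</a></p>'
)
print(find_links(page, "example.org"))
```

As in the script, relative links are skipped and only the host part of each URL is compared against the domain.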

When I test the script with 4 or 5 URLs it works well; the only thing I get is the following message when the script ends:

 Exception KeyError: KeyError(45989520,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored

But when I try to check about 24000 URLs, it fails toward the end of the list (when there are about 400 URLs left to check) with the following error:

Traceback (most recent call last):
  File "check.py", line 95, in <module>
  File "/home/alex/code/.virtualenvs/linka/local/lib/python2.7/site-packages/requests/async.py", line 83, in map
  File "/home/alex/code/.virtualenvs/linka/local/lib/python2.7/site-packages/gevent-1.0b2-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 405, in joinall
ImportError: No module named queue
Exception KeyError: KeyError(45989520,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored

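I tried both the gevent version available on PyPI and the latest version (1.0b2), downloaded and installed from the gevent repository.

I cannot understand why this happens, or why it happens only when I check a large batch of URLs. Any suggestions?

Here is the entire script: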

from requests import async, defaults
from lxml import html
from urlparse import urlsplit
from gevent import monkey
from BeautifulSoup import UnicodeDammit
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction
import persistent
import random

storage = FileStorage('Data.fs')
db = DB(storage)
connection = db.open()
root = connection.root()
monkey.patch_all()
defaults.defaults['base_headers']['User-Agent'] = "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
defaults.defaults['max_retries'] = 10


def save_data(source, target, anchor):
    root[source] = persistent.mapping.PersistentMapping(dict(target=target, anchor=anchor))
    transaction.commit()


def decode_html(html_string):
    converted = UnicodeDammit(html_string, isHTML=True)
    if not converted.unicode:
        raise UnicodeDecodeError(
            "Failed to detect encoding, tried [%s]",
            ', '.join(converted.triedEncodings))
    # print converted.originalEncoding
    return converted.unicode


def find_link(html_doc, url):
    decoded = decode_html(html_doc)
    doc = html.document_fromstring(decoded.encode('utf-8'))
    for element, attribute, link, pos in doc.iterlinks():
        if attribute == "href" and link.startswith('http'):
            netloc = urlsplit(link).netloc
            if "example.org" in netloc:
                return (url, link, element.text_content().strip())
    else:
        return False


def check(response):
    if response.status_code == 200:
        html_doc = response.content
        result = find_link(html_doc, response.url)
        if result:
            source, target, anchor = result
            # print "Source: %s" % source
            # print "Target: %s" % target
            # print "Anchor: %s" % anchor
            # print
            save_data(source, target, anchor)
    global todo
    todo = todo -1
    print todo

def load_urls(fname):
    with open(fname) as fh:
        urls = set([url.strip() for url in fh.readlines()])
        urls = list(urls)
        random.shuffle(urls)
        return urls

if __name__ == "__main__":

    urls = load_urls('urls.txt')
    rs = []
    todo = len(urls)
    print "Ready to analyze %s pages" % len(urls)
    for url in urls:
        rs.append(async.get(url, hooks=dict(response=check), timeout=10.0))
    responses = async.map(rs, size=100)
    print "DONE."

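I'm not sure what the source of your problem is, but why is your monkey.patch_all() not at the top of the file?

Could you try putting

from gevent import monkey; monkey.patch_all()

at the top of your main program and see whether it fixes anything?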
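
One plausible reason the position of the patch matters is that a name bound before the patch keeps pointing at the original object. The snippet below is a simulation only, written for Python 3: fakelib is a hypothetical stand-in module, and swapping its attribute merely imitates what a monkey patch does. It does not call gevent and is not a description of gevent's internals.

```python
import types

# A stand-in for a standard-library module such as socket (hypothetical).
fakelib = types.ModuleType("fakelib")
fakelib.connect = lambda: "blocking"

# An early "from fakelib import connect" binds the original function.
early_connect = fakelib.connect

# Simulated patch: swap the module attribute, as a monkey patch would.
fakelib.connect = lambda: "cooperative"

# A lookup made after the patch sees the replacement.
late_connect = fakelib.connect

print(early_connect())  # reference taken before the patch
print(late_connect())   # reference taken after the patch
```

Code that grabbed its reference before the patch still uses the original, which is why patching before the other imports is the usual advice.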

I'm a big n00b, but I can try anyway! I guess you could try changing your import list to this:

from requests import async, defaults
import requests
from lxml import html
from urlparse import urlsplit
from gevent import monkey
import gevent
from BeautifulSoup import UnicodeDammit
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction
import persistent
import random

Try this and tell me whether it works. I guess it could solve your problem :)

Good day. I think this is an open Python bug, number Issue1596321: http://bugs.python.org/issue1596321
