
Python + Mechanize Async Tasks

So I have this bit of Python code that runs through a Delicious page and scrapes some links off of it. The extract method contains some magic that pulls out the required content. However, fetching the pages one after another is pretty slow - is there a way to do this asynchronously in Python, so I can launch several GET requests and process the pages in parallel?

import mechanize
from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()

url = "http://www.delicious.com/search?p=varun"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
extract(soup)

count = 1
# Follows the "Next" link onto consecutive pages
while soup.find('a', attrs={'class': 'pn next'}):
    print "yay"
    print count
    endOfPage = False
    try:
        page3 = br.follow_link(text_regex="Next")
        html3 = page3.read()
        soup = BeautifulSoup(html3)  # re-parse so the loop condition sees the new page
        extract(soup)
    except mechanize.LinkNotFoundError:
        print "End of Pages"
        endOfPage = True
    if endOfPage:
        break
    count = count + 1

Beautiful Soup is quite slow; if you want better performance, use lxml instead, or if you have a lot of CPU to spare, perhaps try using multiprocessing with a queue.
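The worker-pool idea above can be sketched as follows. This is a minimal sketch, not the answerer's actual code: `fetch_and_extract` is a hypothetical stand-in for the real work (each worker would open its own `mechanize.Browser()`, since a `Browser` instance should not be shared across workers, then parse with BeautifulSoup or lxml and call `extract()`), and the page URLs are invented for illustration; the real script discovers pages by following "Next" links.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool: threads suit I/O-bound fetches

# Hypothetical stand-in for the real worker. In the actual script this
# would create a mechanize.Browser(), open the URL, parse the HTML, and
# run extract() on the parsed page.
def fetch_and_extract(url):
    return "extracted:" + url

# Invented URLs for illustration only.
urls = ["http://www.delicious.com/search?p=varun&page=%d" % n
        for n in range(1, 5)]

pool = Pool(4)                               # four concurrent workers
results = pool.map(fetch_and_extract, urls)  # fetches run in parallel
pool.close()
pool.join()
```

`multiprocessing.dummy.Pool` has the same API as `multiprocessing.Pool` but uses threads, which avoids having to pickle browser objects and is enough to overlap the network waits that dominate this workload.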
