
Python + Mechanize Async Tasks

So I have this bit of Python code that runs through a Delicious page and scrapes some links off of it. The extract method contains some magic that pulls out the required content. However, fetching the pages one after another is pretty slow - is there a way to do this asynchronously in Python, so I can launch several GET requests and process the pages in parallel?

import mechanize
from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()

url = "http://www.delicious.com/search?p=varun"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
extract(soup)

count = 1
# Follows the "Next" link onto consecutive pages
while soup.find('a', attrs={'class': 'pn next'}):
    print "yay"
    print count
    endOfPage = False
    try:
        page3 = br.follow_link(text_regex="Next")
        html3 = page3.read()
        soup = BeautifulSoup(html3)  # re-parse so the loop condition sees the new page
        extract(soup)
    except mechanize.LinkNotFoundError:
        print "End of Pages"
        endOfPage = True
    if endOfPage:
        break
    count = count + 1

Beautiful Soup is quite slow; if you want better performance, use lxml instead, or if you have a lot of CPU to spare, perhaps try using multiprocessing with a queue.
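The worker-pool idea above can be sketched as follows. This is a minimal sketch, not the answerer's actual code: `fetch_and_extract` is a hypothetical stand-in for the real work (each worker would open its own `mechanize.Browser()`, since a `Browser` instance should not be shared across workers, then parse with BeautifulSoup or lxml and call `extract()`), and the page URLs are invented for illustration; the real script discovers pages by following "Next" links.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool: threads suit I/O-bound fetches

# Hypothetical stand-in for the real worker. In the actual script this
# would create a mechanize.Browser(), open the URL, parse the HTML, and
# run extract() on the parsed page.
def fetch_and_extract(url):
    return "extracted:" + url

# Invented URLs for illustration only.
urls = ["http://www.delicious.com/search?p=varun&page=%d" % n
        for n in range(1, 5)]

pool = Pool(4)                               # four concurrent workers
results = pool.map(fetch_and_extract, urls)  # fetches run in parallel
pool.close()
pool.join()
```

`multiprocessing.dummy.Pool` has the same API as `multiprocessing.Pool` but uses threads, which avoids having to pickle browser objects and is enough to overlap the network waits that dominate this workload.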
