
Python + Mechanize Async Tasks

So I have this bit of Python code that runs through a Delicious page and scrapes some links off of it. The extract method contains some magic that pulls out the required content. However, fetching the pages one after another is pretty slow - is there a way to do this asynchronously in Python, so I can launch several GET requests and process the pages in parallel?

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()

url = "http://www.delicious.com/search?p=varun"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html, "html.parser")
extract(soup)  # extract() is defined elsewhere and pulls out the content

count = 1
# Follows the "Next" link onto consecutive pages
while soup.find('a', attrs={'class': 'pn next'}):
    print("yay")
    print(count)
    end_of_page = False
    try:
        page = br.follow_link(text_regex="Next")
        html = page.read()
        soup = BeautifulSoup(html, "html.parser")
        extract(soup)
    except mechanize.LinkNotFoundError:
        print("End of Pages")
        end_of_page = True
    if end_of_page:
        break
    count += 1

Beautiful Soup is slow; if you want better performance, use lxml instead, or if you have many CPUs, perhaps try multiprocessing with a queue.
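To overlap the network waits the question complains about, a thread pool from the standard library is one option (threads rather than multiprocessing, since fetching is I/O-bound). This is only a sketch, and it assumes the page URLs can be collected up front, which the "Next"-link pagination in the question does not directly give you; `fetch`, `fetch_all`, and the injectable `fetch_one` parameter are names invented for this example, not part of mechanize or Beautiful Soup:

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # Fetch one URL and return (url, body) so each result stays
    # associated with the page it came from.
    return url, urllib.request.urlopen(url).read()

def fetch_all(urls, fetch_one=fetch, workers=5):
    # Launch several GET requests at once; map() returns results in
    # input order, so downstream parsing can stay sequential and simple.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, urls))
```

Each `(url, body)` pair can then be parsed (with Beautiful Soup or lxml) and fed to `extract()` exactly as in the sequential version; only the fetching happens in parallel.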
