Python threading module in loops with parameters?
I am trying to create a crawler that crawls the first 100 pages on a website.
My code is something like this:
import urllib2
from bs4 import BeautifulSoup

def extractproducts(pagenumber):
    contenturl = "http://websiteurl/page/" + str(pagenumber)
    content = BeautifulSoup(urllib2.urlopen(contenturl).read())
    print content

pagenumberlist = range(1, 101)
for pagenumber in pagenumberlist:
    extractproducts(pagenumber)
How do I go about using the threading module in this situation, so that urllib will crawl X URLs at a time using multiple threads?
/newb out
Most likely, you want to use multiprocessing. There's a Pool you can use to execute multiple things in parallel:
from multiprocessing import Pool

# Note: a pool of 100 worker processes may make your system
# unresponsive for a while
p = Pool(100)

# First argument is the function to call,
# second argument is a list of arguments
# (the function is called on each item in the list)
p.map(extractproducts, pagenumberlist)
If your function returns anything, Pool.map will return a list of the return values:
def f(x):
    return x + 1

results = Pool().map(f, [1, 4, 5])
print(results)  # [2, 5, 6]
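Since the question asks specifically about threads (which suit I/O-bound work like fetching pages), the same Pool.map pattern also works with a thread pool via multiprocessing.dummy, which exposes the Pool API on top of the threading module. A minimal sketch, assuming the pool size of 10 and the fetch function below are illustrative stand-ins for extractproducts (it returns the URL instead of downloading it, so it runs without network access):

```python
from multiprocessing.dummy import Pool as ThreadPool

def fetch(pagenumber):
    # In the real crawler, urllib2.urlopen(url).read() and the
    # BeautifulSoup parsing would go here.
    url = "http://websiteurl/page/" + str(pagenumber)
    return url

pool = ThreadPool(10)                    # crawl at most 10 pages at a time
results = pool.map(fetch, range(1, 101)) # one result per page, in order
pool.close()
pool.join()
```

Pool.map blocks until every item has been processed and preserves the input order, so results[0] corresponds to page 1 regardless of which thread finished first.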