Python threading module in loops with parameters?
I am trying to create a crawler that crawls the first 100 pages on a website.
My code is something like this:
import urllib2
from bs4 import BeautifulSoup

def extractproducts(pagenumber):
    contenturl = "http://websiteurl/page/" + str(pagenumber)
    content = BeautifulSoup(urllib2.urlopen(contenturl).read())
    print content

pagenumberlist = range(1, 101)
for pagenumber in pagenumberlist:
    extractproducts(pagenumber)
How do I go about using the threading module in this situation, so that urllib will crawl X URLs at a time using multiple threads?
/newb out
Most likely, you want to use multiprocessing. There's a Pool you can use to execute multiple things in parallel:
from multiprocessing import Pool

# Note: a pool of 100 worker processes may make your system
# unresponsive for a while
p = Pool(100)

# First argument is the function to call,
# second argument is a list of arguments
# (the function is called on each item in the list)
p.map(extractproducts, pagenumberlist)
If your function returns anything, Pool.map will return a list of the return values:
def f(x):
    return x + 1

results = Pool().map(f, [1, 4, 5])
print(results)  # [2, 5, 6]
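Since the question asks specifically about threads (which suit I/O-bound work like fetching pages), the same Pool.map pattern also works with a thread pool via multiprocessing.dummy, which exposes the Pool API on top of the threading module. A minimal sketch, assuming the pool size of 10 and the fetch function below are illustrative stand-ins for extractproducts (it returns the URL instead of downloading it, so it runs without network access):

```python
from multiprocessing.dummy import Pool as ThreadPool

def fetch(pagenumber):
    # In the real crawler, urllib2.urlopen(url).read() and the
    # BeautifulSoup parsing would go here.
    url = "http://websiteurl/page/" + str(pagenumber)
    return url

pool = ThreadPool(10)                    # crawl at most 10 pages at a time
results = pool.map(fetch, range(1, 101)) # one result per page, in order
pool.close()
pool.join()
```

Pool.map blocks until every item has been processed and preserves the input order, so results[0] corresponds to page 1 regardless of which thread finished first.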