Python threading module in loops with parameters?

I am trying to create a crawler that crawls the first 100 pages of a website.

My code is something like this:

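import urllib2

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (the post omits its imports)


def extractproducts(pagenumber):
    contenturl = "http://websiteurl/page/" + str(pagenumber)

    # Download the page and parse the HTML
    content = BeautifulSoup(urllib2.urlopen(contenturl).read())
    print content


pagenumberlist = range(1, 101)

for pagenumber in pagenumberlist:
    extractproducts(pagenumber)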

How do I go about using the threading module in this situation, so that urllib crawls X URLs at a time using multiple threads?

/newb out

Most likely, you want to use multiprocessing. It provides a Pool that you can use to execute multiple calls in parallel:

from multiprocessing import Pool

# Note: Pool starts worker processes (not threads); this many may make your system unresponsive for a while
p = Pool(100)

# First argument is the function to call,
# second argument is a list of arguments
# (the function is called on each item in the list)
p.map(extractproducts, pagenumberlist)

If your function returns anything, Pool.map will return a list of the return values, in the same order as the inputs:

def f(x):
    return x + 1

results = Pool().map(f, [1, 4, 5])
print(results) # [2, 5, 6]
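
Since the question asks about threads specifically, and fetching pages is mostly waiting on the network, a thread-backed pool may be a closer fit than a process pool. The following is only a minimal sketch, not the answerer's code: it uses multiprocessing.pool.ThreadPool, which offers the same map interface as Pool but runs the calls in threads. The pool size of 10 is an arbitrary choice for the "X" in the question, and the body of extractproducts is a stand-in that just builds and returns the URL so the example runs without network access; the real function would download and parse the page there.

```python
from multiprocessing.pool import ThreadPool


def extractproducts(pagenumber):
    # Stand-in body (assumption): the real function would fetch and parse
    # the page at this URL instead of just returning the URL.
    contenturl = "http://websiteurl/page/" + str(pagenumber)
    return contenturl


pagenumberlist = range(1, 101)

# 10 worker threads, so at most 10 calls are in flight at any time.
pool = ThreadPool(10)
try:
    # map blocks until every page number has been processed and returns
    # the results in the same order as pagenumberlist.
    urls = pool.map(extractproducts, pagenumberlist)
finally:
    pool.close()
    pool.join()

print(len(urls))  # 100
print(urls[0])    # http://websiteurl/page/1
```

Because ThreadPool and Pool share the same map method, switching between threads and processes is a one-line change, which makes it easy to try both and measure.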
