Strange behaviour of threading in Python
I have some code that I want to parallelize in Python using threads. The function is:
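import http.client
import urllib.error
import urllib.request

count = 0      # shared global counter, incremented from every thread
all_urls = []  # every URL that was checked
errors = []    # URLs whose request failed

def sanity(url):
    global count
    count += 1
    if count % 1000 == 0:
        print(count)
    try:
        if 'media' in url[:10]:
            url = "http://dummy.s3.amazonaws.com" + url
        req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
        urllib.request.urlopen(req).close()  # only success or failure matters here
        all_urls.append(url)
        return 1
    except (urllib.error.HTTPError, urllib.error.URLError,
            http.client.HTTPException, ValueError) as e:
        print(e, url)
        all_urls.append(url)
        errors.append(url)
        return 0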
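The count += 1 in sanity() is a read-modify-write on a shared global, and CPython does not guarantee it is atomic when several threads execute it concurrently, so the progress numbers can differ between runs. A minimal sketch of a lock-protected counter, assuming you want to keep the progress output; the names counter_lock, increment and worker are hypothetical:

```python
import threading

count = 0
counter_lock = threading.Lock()  # hypothetical name; guards every access to `count`


def increment():
    """Increment the shared counter under the lock and return the new value."""
    global count
    with counter_lock:
        count += 1
        return count


def worker(n):
    # each worker bumps the counter n times
    for _ in range(n):
        increment()


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)  # 80000: no increments are lost because the update holds the lock
```

Inside sanity() you would write value = increment() and test value % 1000 == 0, so the check uses the value this thread produced instead of re-reading the global.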
I have a list of URLs, and I need to run the function above on each one, so I used threads. The code is as follows:
import threading

start = 0
arr = list(range(0, 15001, 1000))  # [0, 1000, 2000, ..., 15000]
for i in arr:
    # one thread per URL in the current batch urls[start:i]
    threads = [threading.Thread(target=sanity, args=(url,)) for url in urls[start:i]]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    start = i  # the next batch begins exactly where this one stopped
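Batch boundaries are easy to get wrong in this loop. If start is set to i + 1 after a batch, the next slice urls[start:i] begins one element late, so urls[1000], urls[2000], and so on are never checked. With arr ending at 15000, nothing from index 15000 onward is reached either. On its own, that would make the threaded results differ from a serial pass over the whole list. Here is a small sketch, using a hypothetical helper batch_bounds and stand-in data, that computes contiguous slice bounds covering every index:

```python
def batch_bounds(n, size):
    """Return (start, stop) pairs that tile range(n) with no gaps or overlaps."""
    return [(start, min(start + size, n)) for start in range(0, n, size)]


urls = [f"url{i}" for i in range(2500)]  # stand-in data, not real URLs

covered = []
for start, stop in batch_bounds(len(urls), 1000):
    covered.extend(urls[start:stop])

print(batch_bounds(len(urls), 1000))  # [(0, 1000), (1000, 2000), (2000, 2500)]
print(covered == urls)                # True
```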
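The code above uses threads to run the function over all the URLs in parallel. However, the results change on every run and do not match those of the serial version. What could the problem be?
Any help is appreciated!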
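Even when every URL is checked, threads finish in whatever order the network and the scheduler allow. As a result, the order of all_urls and errors is not stable between runs and will generally not match a serial loop. Comparing the collections without regard to order, for example with sorted() or set(), separates ordering differences from genuinely missing or extra entries. Here is a self-contained sketch in which a hypothetical fake_check sleeps for a random time in place of a network request:

```python
import random
import threading
import time

inputs = [f"url{i}" for i in range(50)]
results = []  # list.append is thread-safe in CPython, but the append order is not fixed


def fake_check(url):
    # hypothetical stand-in for a network request with variable latency
    time.sleep(random.uniform(0, 0.01))
    results.append(url)


threads = [threading.Thread(target=fake_check, args=(u,)) for u in inputs]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results == inputs)                  # often False: completion order varies
print(sorted(results) == sorted(inputs))  # True: the same elements are present
```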
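I would limit the parallelization to the I/O-bound calls to urllib.request.urlopen. One benefit is that you do not have to deal with global variables or thread-local objects.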
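Here is one way to follow that advice. It is sketched with a hypothetical check function that takes the URL opener as a parameter, so it runs without network access. The worker only performs the I/O call and returns a (url, error) pair, and the main thread builds the lists. Because Executor.map yields results in input order, the output lines up with a serial loop.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor


def check(url, opener):
    """Return (url, None) on success or (url, exc) on failure; touches no globals."""
    try:
        opener(url)
        return url, None
    # urllib.error.URLError and HTTPError subclass OSError
    except (OSError, http.client.HTTPException, ValueError) as exc:
        return url, exc


def fake_opener(url):
    # hypothetical stand-in for urllib.request.urlopen
    if "bad" in url:
        raise ValueError(f"unknown url type: {url!r}")


urls = ["http://a.example", "bad-1", "http://b.example", "bad-2"]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(lambda u: check(u, fake_opener), urls))

all_urls = [url for url, _ in results]
errors = [url for url, exc in results if exc is not None]
print(all_urls == urls)  # True: map preserves input order
print(errors)            # ['bad-1', 'bad-2']
```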
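The following example uses concurrent.futures. It is written as a standalone module that could easily accept input, such as the list of URLs, via argparse. You could also wrap the ThreadPoolExecutor in a function.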
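Wrapping the executor in a function and taking the URLs from the command line could look like the following sketch. The function names check_urls, parse_args and fake_checker and the --workers flag are hypothetical. The checker is passed in so the example runs offline; in real use you would pass a function that calls urllib.request.urlopen. Note that this version records any exception as an error, not only the network-related ones.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_urls(urls, checker, max_workers=2):
    """Run checker(url) in a thread pool; return (all_urls, errors)."""
    all_urls, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(checker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            all_urls.append(url)
            if future.exception() is not None:  # any exception counts as an error
                errors.append(url)
    return all_urls, errors


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Check URLs concurrently.")
    parser.add_argument("urls", nargs="+", help="URLs to check")
    parser.add_argument("--workers", type=int, default=2, help="thread pool size")
    return parser.parse_args(argv)


def fake_checker(url):
    # hypothetical stand-in for a function wrapping urllib.request.urlopen
    if not url.startswith("http"):
        raise ValueError(f"unknown url type: {url!r}")


if __name__ == "__main__":
    # demo argv; call parse_args() with no argument to read sys.argv instead
    args = parse_args(["http://a.example", "nope", "--workers", "3"])
    all_urls, errors = check_urls(args.urls, fake_checker, args.workers)
    print(sorted(all_urls), errors)
```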
from concurrent.futures import ThreadPoolExecutor, as_completed
import http.client
import urllib.error
import urllib.request

def sanity(url):
    """Attempt an HTTP request and return the response status code."""
    if 'media' in url[:10]:
        url = "http://dummy.s3.amazonaws.com" + url
    req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
    with urllib.request.urlopen(req) as ret:
        return ret.status

URLS = ('collection', 'of', 'strings')
allurls = []
errors = []

# set `max_workers` to your preferred upper limit of threads
with ThreadPoolExecutor(max_workers=2) as executor:
    # map each future back to the URL it was submitted for
    pool = {executor.submit(sanity, url): url for url in URLS}
    # perform error handling as each future completes
    for future in as_completed(pool):
        url = pool[future]
        # append the URL to lists as appropriate
        allurls.append(url)
        try:
            future.result()  # re-raises any exception raised inside sanity()
        except (urllib.error.HTTPError, urllib.error.URLError,
                http.client.HTTPException, ValueError) as e:
            print(e, url)
            errors.append(url)
# do something with `allurls` and `errors`
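Note that Future.result() does not return an exception raised inside the worker. Instead, it re-raises that exception in the calling thread, so an isinstance check on the return value never sees the error. Wrap result() in try/except, or inspect Future.exception(). Here is a minimal sketch with a hypothetical worker that always fails:

```python
from concurrent.futures import ThreadPoolExecutor


def always_fails(url):
    # hypothetical worker that fails the way Request() does on a malformed URL
    raise ValueError(f"unknown url type: {url!r}")


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(always_fails, "collection")
    try:
        future.result()
        outcome = "returned normally"
    except ValueError as exc:
        outcome = f"re-raised: {exc}"
    print(type(future.exception()).__name__)  # ValueError
print(outcome)  # re-raised: unknown url type: 'collection'
```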