Parallelising for-loop using multiprocessing pool function
I was trying to follow the example at this location:
How to use threading in Python?
I have a sample dataframe (df) like this:
segment x_coord y_coord
a 1 1
a 2 4
a 1 7
b 2 3
b 4 3
b 8 3
c 4 4
c 2 5
c 7 8
and I am creating a kd-tree for each segment with a for loop, as below:
dist_name=df['segment'].unique()
for i in range(len(dist_name)):
    a=df[df['segment']==dist_name[i]]
    tree[i] = spatial.cKDTree(a[['x_coord','y_coord']])
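For reference, the loop above can be made runnable on the sample data roughly as follows. This is only a minimal sketch: the DataFrame is rebuilt by hand from the table, and `trees` is a dictionary keyed by segment name, which is an assumption because the snippet does not show how `tree` is initialised.

```python
import pandas as pd
from scipy import spatial

# Sample data copied from the table above.
df = pd.DataFrame({
    "segment": list("aaabbbccc"),
    "x_coord": [1, 2, 1, 2, 4, 8, 4, 2, 7],
    "y_coord": [1, 4, 7, 3, 3, 3, 4, 5, 8],
})

# Assumption: keep one tree per segment in a dict keyed by segment name.
trees = {}
for name in df["segment"].unique():
    points = df.loc[df["segment"] == name, ["x_coord", "y_coord"]].to_numpy()
    trees[name] = spatial.cKDTree(points)

# Example query: distance to, and index of, the point in segment "a"
# that lies nearest to (2, 5).
dist, idx = trees["a"].query([2, 5])
```

Keying by name rather than by loop index makes it easy to look up the tree for a given segment later.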
How can I parallelize the tree creation using the example cited in the link, shown below?
results = []
for url in urls:
    result = urllib2.urlopen(url)
    results.append(result)
Parallelized to:
pool = ThreadPool(4)
results = pool.map(urllib2.urlopen, urls)
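The same loop-to-`pool.map` pattern can be tried without network access. The sketch below is for Python 3 and uses a hypothetical stand-in function, `square`, in place of `urllib2.urlopen`; the names `items`, `serial_results` and `parallel_results` are likewise made up for illustration.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool

def square(x):
    # Hypothetical stand-in for the per-item work (urlopen in the example).
    return x * x

items = [1, 2, 3, 4, 5]

# Serial version: call the function once per item and collect the results.
serial_results = []
for item in items:
    serial_results.append(square(item))

# Threaded version: pool.map applies the function to every item and
# returns the results in the same order as the inputs.
with ThreadPool(4) as pool:
    parallel_results = pool.map(square, items)
```

The function passed to `pool.map` takes a single argument, so the body of the original loop has to be wrapped in a one-argument function first.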
My attempt:
import pandas as pd
import time
from scipy import spatial
import random
from multiprocessing.dummy import Pool as ThreadPool
dist_name=['a','b','c','d','e','f','g','h']
df=pd.DataFrame()
for i in range(len(dist_name)):
    if i==0:
        df['x_coord']=random.sample(range(1, 10000), 1000)
        df['y_coord']=random.sample(range(1, 10000), 1000)
        df['segment']=dist_name[i]
    else:
        tmp=pd.DataFrame()
        tmp['x_coord']=random.sample(range(1, 10000), 1000)
        tmp['y_coord']=random.sample(range(1, 10000), 1000)
        tmp['segment']=dist_name[i]
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df=pd.concat([df, tmp])
start_time = time.time()
for i in range(len(dist_name)):
    a=df[df['segment']==dist_name[i]]
    tree = spatial.cKDTree(a[['x_coord','y_coord']])
print("--- %s seconds ---" % (time.time() - start_time))
--- 0.0312347412109375 seconds ---
def func(name):
    a = df[df['segment'] == name]
    return spatial.cKDTree(a[['x_coord','y_coord']])
pool = ThreadPool(4)
start_time = time.time()
tree = pool.map(func, dist_name)
print("--- %s seconds ---" % (time.time() - start_time))
--- 0.031250953674316406 seconds ---
Your code:
dist_name=df['segment'].unique()
for i in range(len(dist_name)):
    a=df[df['segment']==dist_name[i]]
    tree[i] = spatial.cKDTree(a[['x_coord','y_coord']])
Needs to be transformed into:
dist_name=df['segment'].unique()
def func(name):
    a = df[df['segment'] == name]
    return spatial.cKDTree(a[['x_coord','y_coord']])
And your call to pool.map:
pool = ThreadPool(4)
tree = pool.map(func, dist_name)
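Putting the pieces together, a self-contained version might look like the sketch below. The random test data, the `build_tree` helper, and the use of `groupby` to split the frame once (rather than filtering inside every task) are assumptions added for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd
from multiprocessing.dummy import Pool as ThreadPool
from scipy import spatial

# Assumption: random test data, 1000 points for each of 8 segments.
rng = np.random.default_rng(0)
dist_name = list("abcdefgh")
df = pd.DataFrame({
    "segment": np.repeat(dist_name, 1000),
    "x_coord": rng.integers(1, 10000, size=8000),
    "y_coord": rng.integers(1, 10000, size=8000),
})

# Split the frame once so each task only sees its own coordinates.
groups = {
    name: group[["x_coord", "y_coord"]].to_numpy()
    for name, group in df.groupby("segment")
}

def build_tree(name):
    # Hypothetical helper: build the kd-tree for a single segment and
    # return it together with its name.
    return name, spatial.cKDTree(groups[name])

# pool.map keeps input order; dict() turns the (name, tree) pairs
# into a lookup table keyed by segment name.
with ThreadPool(4) as pool:
    trees = dict(pool.map(build_tree, dist_name))
```

With only 1000 points per segment each build is very cheap, which is consistent with the near-identical timings reported in the question; whether a pool helps on larger inputs is something to measure rather than assume.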