How do I save the result of a parallel process from joblib in Python?
I started using joblib to parallelize some long for loops I have, and it always just prints out the resulting array, while I want to save it in a variable and use it in another part of the code. Here's my function and the parallelization; how can I store the resulting array? The documentation is not too clear to me.
import numpy as np
from joblib import Parallel, delayed

def CurrentStepbyStep(gap1, gap2, R, runTime, steps, Vdc, Vac, omegaAC, k):
    time = runTime
    volt = Vdc + Vac*np.cos(k*omegaAC*time/steps)
    phasediff = (2*e*Vdc*k*time/(hbar*steps)) + (2*e*Vac/(hbar*omegaAC))*np.sin(k*omegaAC*time/steps)
    ij1 =   np.real(j2(gap1, gap1, (e*volt/hbar), R))
    ij2 = - np.imag(j2(gap1, gap1, (e*volt/hbar), R))
    iqp =   np.imag(j1(gap1, gap1, (e*volt/hbar), R))
    sineCurr   = ij1*np.sin(phasediff)
    cosineCurr = ij2*np.cos(phasediff)
    totCurr    = sineCurr + cosineCurr + iqp
    return [volt, phasediff, ij1, ij2, iqp, sineCurr, cosineCurr, totCurr]
n=5
xsize = 50
steps = 100
Vdc_arr = np.linspace(-n*V_dc,n*V_dc, xsize)
backend = 'loky'
run = Parallel(n_jobs=4,backend = backend)(delayed(CurrentStepbyStep)(gapvar,gapvar2,Rn,(10**(-6)),steps,x,V_ac,omega_ac,i) for i in range(steps) for x in Vdc_arr)
Q : " ... how can I store the resulting array?"
Fact #0:
Your function bears a lot of inefficiencies and may easily (depending on the actual computing costs of the hidden j1(), j2() functions) represent almost an Amdahl's Law antipattern: you happen to pay way more on each and every overhead (setup / communication / termination costs) than you could ever receive back from a 4-process joblib.Parallel()(delayed()) code-execution fragment. Actual add-on costs are our main enemy in HPC / low-latency computing problems. A wished-for speedup can actually pretty soon become a slowdown.
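To see the overhead effect in isolation, here is a minimal, hedged sketch. It uses the standard-library concurrent.futures instead of joblib (so it runs anywhere) and a deliberately cheap stand-in function; both cheap_work and the timings are illustrative assumptions, not from the original code:

```python
from concurrent.futures import ProcessPoolExecutor
from time import perf_counter

def cheap_work(x):
    # trivially cheap per-call workload: overhead costs will dominate
    return x * x

if __name__ == "__main__":
    data = range(50_000)

    t0 = perf_counter()
    serial = [cheap_work(x) for x in data]
    t_serial = perf_counter() - t0

    t0 = perf_counter()
    with ProcessPoolExecutor(max_workers=4) as pool:
        parallel = list(pool.map(cheap_work, data, chunksize=1_000))
    t_parallel = perf_counter() - t0

    assert serial == parallel                 # identical results ...
    print(f"serial   : {t_serial:.4f} s")
    print(f"parallel : {t_parallel:.4f} s")   # ... yet often slower: spawn + IPC costs
```

On a trivially cheap body the pooled version typically loses; parallelism only pays off once each call's useful work outweighs the process-spawn and serialization costs, which is exactly the Amdahl's-Law-plus-overheads reasoning above.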
Fact #1:
Your function does not return any array; it returns a list-instance:
Proof:
def thisReturnsLIST( a, b, c, d ):
    return [ a, b, c, d ]

from joblib import Parallel, delayed

showMeResults = Parallel( n_jobs = 2 )( delayed( thisReturnsLIST
                                                 ) ( "THE a",
                                                     "THE b",
                                                     c,
                                                     d
                                                     ) for d in range( 5 )
                                                       for c in range( -5, 0, 1 )
                                        )
>>> type( showMeResults )
<class 'list'>
Next, inspect this list-instance, well, actually a list_of_lists, received:
>>> for res in showMeResults:
...     print( res )
...
['THE a', 'THE b', -5, 0]
['THE a', 'THE b', -4, 0]
['THE a', 'THE b', -3, 0]
['THE a', 'THE b', -2, 0]
['THE a', 'THE b', -1, 0]
['THE a', 'THE b', -5, 1]
...
['THE a', 'THE b', -2, 3]
['THE a', 'THE b', -1, 3]
['THE a', 'THE b', -5, 4]
['THE a', 'THE b', -4, 4]
['THE a', 'THE b', -3, 4]
['THE a', 'THE b', -2, 4]
['THE a', 'THE b', -1, 4]
>>>
Q : "can I store the results in array?"
Sure,
either refactor the code, so as to put all individual list-items straight into one array ( using the standard numpy.array( fromList_or_Array_like_INSTANCE, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0 ) as a last resort for doing this ),
or
keep returning a list-instance and post-process the joblib.Parallel()(delayed())-collected list_of_lists as was demonstrated above.
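A minimal sketch of the second route, post-processing the collected list_of_lists with numpy.array(). showMeResults is rebuilt here with a plain comprehension (same ordering as the joblib proof above) so the snippet is self-contained:

```python
import numpy as np

# same list_of_lists as the joblib proof above, rebuilt without joblib
showMeResults = [ [ "THE a", "THE b", c, d ] for d in range( 5 )
                                             for c in range( -5, 0, 1 ) ]

arr = np.array( showMeResults )   # mixed str/int rows coerce to a string dtype
print( arr.shape )                # (25, 4)

# purely numeric returns keep a numeric dtype after the conversion:
numeric = np.array( [ [ c, d ] for d in range( 5 )
                               for c in range( -5, 0, 1 ) ] )
print( numeric.shape, numeric.dtype )
```

Note the dtype coercion: because the proof mixes strings and integers, every element ends up as a string; for the numeric physics quantities in the question, the conversion stays numeric.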
The ( as-was ) state, before any performance-motivated re-factoring:
def CurrentStepbyStep( gap1, gap2, R, runTime, steps, Vdc, Vac, omegaAC, k ):
    time       = runTime
    volt       = Vdc + Vac * np.cos( k * omegaAC * time / steps )
    phasediff  = ( ( 2 * e * Vdc * k * time / ( hbar * steps ) )
                   +
                   ( 2 * e * Vac / ( hbar * omegaAC ) )
                   * np.sin( k * omegaAC * time / steps )
                   )
    ij1        = (   np.real( j2( gap1, gap1, ( e * volt / hbar ), R ) ) )
    ij2        = ( - np.imag( j2( gap1, gap1, ( e * volt / hbar ), R ) ) )
    iqp        = (   np.imag( j1( gap1, gap1, ( e * volt / hbar ), R ) ) )
    sineCurr   = ij1 * np.sin( phasediff )
    cosineCurr = ij2 * np.cos( phasediff )
    totCurr    = ( sineCurr
                 + cosineCurr
                 + iqp
                   )
    return [ volt, phasediff, ij1, ij2, iqp, sineCurr, cosineCurr, totCurr ]

n     = 5
xsize = 50
steps = 100
Vdc_arr = np.linspace( -n * V_dc,
                        n * V_dc, xsize )
run = Parallel( n_jobs  = 4,
                backend = backend
                )( delayed( CurrentStepbyStep
                            )( gapvar,
                               gapvar2,
                               Rn,
                               1E-6,      # ( 10**( -6 ) ),
                               steps,
                               x,
                               V_ac,
                               omega_ac,
                               i
                               ) for i in range( steps )
                                 for x in Vdc_arr
                   )
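Once run is collected, that list_of_lists can be post-processed into one array and sliced per returned quantity. A hedged sketch: fake_run below merely mimics the shape of run (steps * xsize rows of 8 values, generated with i as the outer and x as the inner loop, exactly like the comprehension above), since j1() / j2() are not shown:

```python
import numpy as np

steps = 100
xsize = 50

# stand-in rows: [ volt, phasediff, ij1, ij2, iqp, sineCurr, cosineCurr, totCurr ]
fake_run = [ [ float( i ), float( x ), 0., 0., 0., 0., 0., float( i + x ) ]
             for i in range( steps )
             for x in range( xsize ) ]

results = np.array( fake_run ).reshape( steps, xsize, 8 )  # axes: ( i, x, quantity )

totCurr = results[:, :, 7]    # the last returned quantity, per ( i, x ) pair
print( results.shape, totCurr.shape )
```

With the real run in place of fake_run, `results = np.array( run ).reshape( steps, xsize, 8 )` gives each physical quantity as a 2-D slice, which answers the "store it in a variable and use it elsewhere" part of the question directly.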
This produces the following to-be-interpreted code for the Python processes (Python 2 bytecode shown, as in the original answer):
>>> dis.dis( CurrentStepbyStep )
2 0 LOAD_FAST 3 (runTime)
3 STORE_FAST 9 (time)
3 6 LOAD_FAST 5 (Vdc)
9 LOAD_FAST 6 (Vac)
12 LOAD_GLOBAL 0 (np)
15 LOAD_ATTR 1 (cos)
18 LOAD_FAST 8 (k)
21 LOAD_FAST 7 (omegaAC)
24 BINARY_MULTIPLY
25 LOAD_FAST 9 (time)
28 BINARY_MULTIPLY
29 LOAD_FAST 4 (steps)
32 BINARY_DIVIDE
33 CALL_FUNCTION 1
36 BINARY_MULTIPLY
37 BINARY_ADD
38 STORE_FAST 10 (volt)
4 41 LOAD_CONST 1 (2)
44 LOAD_GLOBAL 2 (e)
47 BINARY_MULTIPLY
48 LOAD_FAST 5 (Vdc)
51 BINARY_MULTIPLY
52 LOAD_FAST 8 (k)
55 BINARY_MULTIPLY
56 LOAD_FAST 9 (time)
59 BINARY_MULTIPLY
60 LOAD_GLOBAL 3 (hbar)
63 LOAD_FAST 4 (steps)
66 BINARY_MULTIPLY
67 BINARY_DIVIDE
6 68 LOAD_CONST 1 (2)
71 LOAD_GLOBAL 2 (e)
74 BINARY_MULTIPLY
75 LOAD_FAST 6 (Vac)
78 BINARY_MULTIPLY
79 LOAD_GLOBAL 3 (hbar)
82 LOAD_FAST 7 (omegaAC)
85 BINARY_MULTIPLY
86 BINARY_DIVIDE
7 87 LOAD_GLOBAL 0 (np)
90 LOAD_ATTR 4 (sin)
93 LOAD_FAST 8 (k)
96 LOAD_FAST 7 (omegaAC)
99 BINARY_MULTIPLY
100 LOAD_FAST 9 (time)
103 BINARY_MULTIPLY
104 LOAD_FAST 4 (steps)
107 BINARY_DIVIDE
108 CALL_FUNCTION 1
111 BINARY_MULTIPLY
112 BINARY_ADD
113 STORE_FAST 11 (phasediff)
9 116 LOAD_GLOBAL 0 (np)
119 LOAD_ATTR 5 (real)
122 LOAD_GLOBAL 6 (j2)
125 LOAD_FAST 0 (gap1)
128 LOAD_FAST 0 (gap1)
131 LOAD_GLOBAL 2 (e)
134 LOAD_FAST 10 (volt)
137 BINARY_MULTIPLY
138 LOAD_GLOBAL 3 (hbar)
141 BINARY_DIVIDE
142 LOAD_FAST 2 (R)
145 CALL_FUNCTION 4
148 CALL_FUNCTION 1
151 STORE_FAST 12 (ij1)
10 154 LOAD_GLOBAL 0 (np)
157 LOAD_ATTR 7 (imag)
160 LOAD_GLOBAL 6 (j2)
163 LOAD_FAST 0 (gap1)
166 LOAD_FAST 0 (gap1)
169 LOAD_GLOBAL 2 (e)
172 LOAD_FAST 10 (volt)
175 BINARY_MULTIPLY
176 LOAD_GLOBAL 3 (hbar)
179 BINARY_DIVIDE
180 LOAD_FAST 2 (R)
183 CALL_FUNCTION 4
186 CALL_FUNCTION 1
189 UNARY_NEGATIVE
190 STORE_FAST 13 (ij2)
11 193 LOAD_GLOBAL 0 (np)
196 LOAD_ATTR 7 (imag)
199 LOAD_GLOBAL 8 (j1)
202 LOAD_FAST 0 (gap1)
205 LOAD_FAST 0 (gap1)
208 LOAD_GLOBAL 2 (e)
211 LOAD_FAST 10 (volt)
214 BINARY_MULTIPLY
215 LOAD_GLOBAL 3 (hbar)
218 BINARY_DIVIDE
219 LOAD_FAST 2 (R)
222 CALL_FUNCTION 4
225 CALL_FUNCTION 1
228 STORE_FAST 14 (iqp)
12 231 LOAD_FAST 12 (ij1)
234 LOAD_GLOBAL 0 (np)
237 LOAD_ATTR 4 (sin)
240 LOAD_FAST 11 (phasediff)
243 CALL_FUNCTION 1
246 BINARY_MULTIPLY
247 STORE_FAST 15 (sineCurr)
13 250 LOAD_FAST 13 (ij2)
253 LOAD_GLOBAL 0 (np)
256 LOAD_ATTR 1 (cos)
259 LOAD_FAST 11 (phasediff)
262 CALL_FUNCTION 1
265 BINARY_MULTIPLY
266 STORE_FAST 16 (cosineCurr)
16 269 LOAD_FAST 15 (sineCurr)
272 LOAD_FAST 16 (cosineCurr)
275 BINARY_ADD
276 LOAD_FAST 14 (iqp)
279 BINARY_ADD
280 STORE_FAST 17 (totCurr)
18 283 LOAD_FAST 10 (volt)
286 LOAD_FAST 11 (phasediff)
289 LOAD_FAST 12 (ij1)
292 LOAD_FAST 13 (ij2)
295 LOAD_FAST 14 (iqp)
298 LOAD_FAST 15 (sineCurr)
301 LOAD_FAST 16 (cosineCurr)
304 LOAD_FAST 17 (totCurr)
307 BUILD_LIST 8
310 RETURN_VALUE
>>>