
How do I save the result of a parallel process from joblib in Python?

I started using joblib to parallelize some long for loops I have, and it always just prints out the resulting array, while I want to save it in a variable and use it in another part of the code. Here's my function and the parallelization; how can I store the resulting array? The documentation is not too clear to me.

def CurrentStepbyStep(gap1, gap2, R, runTime, steps, Vdc, Vac, omegaAC, k,):
    time = runTime
    volt = Vdc + Vac*np.cos(k*omegaAC*time/steps)
    phasediff = (2*e*Vdc*k*time/(hbar*steps)) + (2*e*Vac/(hbar*omegaAC))*np.sin(k*omegaAC*time/steps)
    ij1 = (np.real(j2(gap1,gap1, (e*volt/hbar) ,R)))
    ij2 = (- np.imag(j2(gap1,gap1, (e*volt/hbar) ,R)))
    iqp = ( np.imag(j1(gap1,gap1, (e*volt/hbar) ,R)))
    sineCurr = ij1*np.sin(phasediff)
    cosineCurr = ij2*np.cos(phasediff)
    totCurr = sineCurr + cosineCurr + iqp
    return [volt,phasediff,ij1,ij2,iqp,sineCurr,cosineCurr,totCurr]


n=5
xsize = 50
steps = 100
Vdc_arr = np.linspace(-n*V_dc,n*V_dc, xsize)
backend = 'loky'
run = Parallel(n_jobs=4,backend = backend)(delayed(CurrentStepbyStep)(gapvar,gapvar2,Rn,(10**(-6)),steps,x,V_ac,omega_ac,i) for i in range(steps) for x in Vdc_arr)

Q : " ... how can I store the resulting array?"


Fact #0:
your function carries a lot of inefficiencies and, depending on the actual computing costs of the hidden j1() and j2() functions, may come close to an Amdahl's Law anti-pattern: you may pay far more in overhead ( setup, communication and termination costs ) than you can ever get back from a 4-process joblib.Parallel()(delayed()) code fragment. These add-on costs are the main enemy in HPC and low-latency computing.

The hoped-for speedup can quite soon turn into a slowdown.
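To make the overhead argument concrete, here is a minimal sketch of an overhead-aware Amdahl estimate. It is a simplified model, not a measurement: the function name `speedup`, the parallel fraction `p` and the `overhead` term ( add-on costs expressed as a fraction of the original serial runtime ) are all assumptions chosen for illustration.

```python
def speedup(p, n, overhead=0.0):
    """Estimate speedup under Amdahl's Law with an extra overhead term.

    p        -- fraction of the serial runtime that can run in parallel (0..1)
    n        -- number of worker processes
    overhead -- add-on costs (process setup, argument/result transfer,
                termination) as a fraction of the serial runtime (assumed)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Normalised runtime: serial part + overhead + parallel part split over n
    return 1.0 / ((1.0 - p) + overhead + p / n)


# Illustrative numbers only: 95 % parallel work on 4 workers
ideal = speedup(0.95, 4)                   # no add-on costs
costly = speedup(0.95, 4, overhead=1.0)    # overhead as large as the job itself

print("ideal:  %.2f" % ideal)
print("costly: %.2f" % costly)
```

In this model, once the add-on costs approach the runtime of the work itself, the result drops below 1.0, that is, the "parallel" run is slower than the plain serial loop. Very cheap per-call work, as in a function that only evaluates a handful of scalar expressions, is where this is most likely to happen.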


Fact #1:
your function does not return an array; it returns a list instance:

Proof:

def thisReturnsLIST( a, b, c, d ):
        return     [ a, b, c, d ]

from joblib import Parallel, delayed

showMeResults = Parallel( n_jobs = 2 )( delayed( thisReturnsLIST 
                                                 ) ( "THE a",
                                                     "THE b",
                                                          c,
                                                          d
                                                          ) for d in range(  5 )
                                                            for c in range( -5, 0, 1 )
                                        )

>>> type( showMeResults )
<type 'list'>

QED


Next, inspect this list instance, which is actually a list of lists:

>>> for res in showMeResults:
...     print( res )
... 
['THE a', 'THE b', -5, 0]
['THE a', 'THE b', -4, 0]
['THE a', 'THE b', -3, 0]
['THE a', 'THE b', -2, 0]
['THE a', 'THE b', -1, 0]
['THE a', 'THE b', -5, 1]
...
['THE a', 'THE b', -2, 3]
['THE a', 'THE b', -1, 3]
['THE a', 'THE b', -5, 4]
['THE a', 'THE b', -4, 4]
['THE a', 'THE b', -3, 4]
['THE a', 'THE b', -2, 4]
['THE a', 'THE b', -1, 4]
>>> 

POSSIBLE SOLUTIONS:

Q : "can I store the results in array?"

Sure,
either refactor the code so that all individual list items go straight into one array ( using the standard numpy.array( fromList_or_Array_like_INSTANCE, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0 ) as a last resort ),
or
keep returning a list instance and post-process the list of lists collected by joblib.Parallel()(delayed()), as demonstrated above.

QED
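A minimal sketch of the second option follows. The function `toy_step` is a hypothetical stand-in for CurrentStepbyStep ( it returns a short list of floats per call ), and the sizes are made up; only `Parallel`, `delayed`, `numpy.array` and `reshape` are real library calls. It relies on joblib returning the results as a list in the same order in which the tasks were generated.

```python
import numpy as np
from joblib import Parallel, delayed


def toy_step(k, vdc):
    # Hypothetical stand-in for CurrentStepbyStep: one list of floats per call
    volt = vdc + 0.1 * k
    return [volt, 2.0 * volt, volt ** 2]


steps = 4
vdc_arr = np.linspace(-1.0, 1.0, 5)

# Parallel(...)(...) returns an ordinary list; binding it to a name
# is all that "saving" the result requires.
run = Parallel(n_jobs=2)(
    delayed(toy_step)(k, x) for k in range(steps) for x in vdc_arr
)

# list of lists -> 2-D array of shape (steps * len(vdc_arr), n_outputs)
flat = np.array(run)

# k is the outer loop and x the inner loop of the generator, so a
# C-order reshape recovers the (k, x) grid of results.
grid = flat.reshape(steps, len(vdc_arr), -1)

# First returned quantity for every (k, x) pair
volt = grid[:, :, 0]
print(type(run), grid.shape)
```

With the function from the question, the last axis would hold the eight returned quantities ( volt, phasediff, ij1, ... ), so for example `grid[:, :, 7]` would be totCurr over the whole ( k, Vdc ) grid.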


( as-was ) state, before any performance-motivated refactoring:

def CurrentStepbyStep( gap1, gap2, R, runTime, steps, Vdc, Vac, omegaAC, k ):
    time       = runTime
    volt       = Vdc + Vac * np.cos( k * omegaAC * time / steps )
    phasediff  = ( ( 2 * e * Vdc * k * time / ( hbar * steps ) )
                   +
                   ( 2 * e * Vac / ( hbar * omegaAC ) )
                   * np.sin( k * omegaAC * time / steps )
                   )
    ij1        = (  np.real( j2( gap1, gap1, ( e * volt / hbar ), R ) ) )
    ij2        = (- np.imag( j2( gap1, gap1, ( e * volt / hbar ), R ) ) )
    iqp        = (  np.imag( j1( gap1, gap1, ( e * volt / hbar ), R ) ) )
    sineCurr   = ij1 * np.sin( phasediff )
    cosineCurr = ij2 * np.cos( phasediff )
    totCurr    = ( sineCurr
                 + cosineCurr
                 + iqp
                   )
    return [ volt, phasediff, ij1, ij2, iqp, sineCurr, cosineCurr, totCurr ]


n       =   5
xsize   =  50
steps   = 100
Vdc_arr = np.linspace( -n * V_dc,
                        n * V_dc, xsize )
backend = 'loky'
run = Parallel( n_jobs  = 4,
                backend = backend
                )( delayed( CurrentStepbyStep
                            )( gapvar,
                               gapvar2,
                               Rn,
                               1E-6,  # ( 10**( -6 ) ),
                               steps,
                               x,
                               V_ac,
                               omega_ac,
                               i
                               ) for i in range( steps )
                                 for x in Vdc_arr
                  )

produces this bytecode, to be interpreted by each of the Python processes:

>>> dis.dis( CurrentStepbyStep )
  2           0 LOAD_FAST                3 (runTime)
              3 STORE_FAST               9 (time)

  3           6 LOAD_FAST                5 (Vdc)
              9 LOAD_FAST                6 (Vac)
             12 LOAD_GLOBAL              0 (np)
             15 LOAD_ATTR                1 (cos)
             18 LOAD_FAST                8 (k)
             21 LOAD_FAST                7 (omegaAC)
             24 BINARY_MULTIPLY     
             25 LOAD_FAST                9 (time)
             28 BINARY_MULTIPLY     
             29 LOAD_FAST                4 (steps)
             32 BINARY_DIVIDE       
             33 CALL_FUNCTION            1
             36 BINARY_MULTIPLY     
             37 BINARY_ADD          
             38 STORE_FAST              10 (volt)

  4          41 LOAD_CONST               1 (2)
             44 LOAD_GLOBAL              2 (e)
             47 BINARY_MULTIPLY     
             48 LOAD_FAST                5 (Vdc)
             51 BINARY_MULTIPLY     
             52 LOAD_FAST                8 (k)
             55 BINARY_MULTIPLY     
             56 LOAD_FAST                9 (time)
             59 BINARY_MULTIPLY     
             60 LOAD_GLOBAL              3 (hbar)
             63 LOAD_FAST                4 (steps)
             66 BINARY_MULTIPLY     
             67 BINARY_DIVIDE       

  6          68 LOAD_CONST               1 (2)
             71 LOAD_GLOBAL              2 (e)
             74 BINARY_MULTIPLY     
             75 LOAD_FAST                6 (Vac)
             78 BINARY_MULTIPLY     
             79 LOAD_GLOBAL              3 (hbar)
             82 LOAD_FAST                7 (omegaAC)
             85 BINARY_MULTIPLY     
             86 BINARY_DIVIDE       

  7          87 LOAD_GLOBAL              0 (np)
             90 LOAD_ATTR                4 (sin)
             93 LOAD_FAST                8 (k)
             96 LOAD_FAST                7 (omegaAC)
             99 BINARY_MULTIPLY     
            100 LOAD_FAST                9 (time)
            103 BINARY_MULTIPLY     
            104 LOAD_FAST                4 (steps)
            107 BINARY_DIVIDE       
            108 CALL_FUNCTION            1
            111 BINARY_MULTIPLY     
            112 BINARY_ADD          
            113 STORE_FAST              11 (phasediff)

  9         116 LOAD_GLOBAL              0 (np)
            119 LOAD_ATTR                5 (real)
            122 LOAD_GLOBAL              6 (j2)
            125 LOAD_FAST                0 (gap1)
            128 LOAD_FAST                0 (gap1)
            131 LOAD_GLOBAL              2 (e)
            134 LOAD_FAST               10 (volt)
            137 BINARY_MULTIPLY     
            138 LOAD_GLOBAL              3 (hbar)
            141 BINARY_DIVIDE       
            142 LOAD_FAST                2 (R)
            145 CALL_FUNCTION            4
            148 CALL_FUNCTION            1
            151 STORE_FAST              12 (ij1)

 10         154 LOAD_GLOBAL              0 (np)
            157 LOAD_ATTR                7 (imag)
            160 LOAD_GLOBAL              6 (j2)
            163 LOAD_FAST                0 (gap1)
            166 LOAD_FAST                0 (gap1)
            169 LOAD_GLOBAL              2 (e)
            172 LOAD_FAST               10 (volt)
            175 BINARY_MULTIPLY     
            176 LOAD_GLOBAL              3 (hbar)
            179 BINARY_DIVIDE       
            180 LOAD_FAST                2 (R)
            183 CALL_FUNCTION            4
            186 CALL_FUNCTION            1
            189 UNARY_NEGATIVE      
            190 STORE_FAST              13 (ij2)

 11         193 LOAD_GLOBAL              0 (np)
            196 LOAD_ATTR                7 (imag)
            199 LOAD_GLOBAL              8 (j1)
            202 LOAD_FAST                0 (gap1)
            205 LOAD_FAST                0 (gap1)
            208 LOAD_GLOBAL              2 (e)
            211 LOAD_FAST               10 (volt)
            214 BINARY_MULTIPLY     
            215 LOAD_GLOBAL              3 (hbar)
            218 BINARY_DIVIDE       
            219 LOAD_FAST                2 (R)
            222 CALL_FUNCTION            4
            225 CALL_FUNCTION            1
            228 STORE_FAST              14 (iqp)

 12         231 LOAD_FAST               12 (ij1)
            234 LOAD_GLOBAL              0 (np)
            237 LOAD_ATTR                4 (sin)
            240 LOAD_FAST               11 (phasediff)
            243 CALL_FUNCTION            1
            246 BINARY_MULTIPLY     
            247 STORE_FAST              15 (sineCurr)

 13         250 LOAD_FAST               13 (ij2)
            253 LOAD_GLOBAL              0 (np)
            256 LOAD_ATTR                1 (cos)
            259 LOAD_FAST               11 (phasediff)
            262 CALL_FUNCTION            1
            265 BINARY_MULTIPLY     
            266 STORE_FAST              16 (cosineCurr)

 16         269 LOAD_FAST               15 (sineCurr)
            272 LOAD_FAST               16 (cosineCurr)
            275 BINARY_ADD          
            276 LOAD_FAST               14 (iqp)
            279 BINARY_ADD          
            280 STORE_FAST              17 (totCurr)

 18         283 LOAD_FAST               10 (volt)
            286 LOAD_FAST               11 (phasediff)
            289 LOAD_FAST               12 (ij1)
            292 LOAD_FAST               13 (ij2)
            295 LOAD_FAST               14 (iqp)
            298 LOAD_FAST               15 (sineCurr)
            301 LOAD_FAST               16 (cosineCurr)
            304 LOAD_FAST               17 (totCurr)
            307 BUILD_LIST               8
            310 RETURN_VALUE        
>>> 
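One thing the listing makes visible is that j2() is called twice with identical arguments ( once for ij1, once for ij2 ). A minimal sketch of calling it once and reusing the complex result is shown here. The j2 below is a hypothetical stand-in with a call counter, since the real j1() and j2() are not shown in the question; the helper names are made up for illustration.

```python
import cmath

calls = {"j2": 0}


def j2(gap_a, gap_b, omega, R):
    # Hypothetical stand-in for the hidden j2(): returns some complex value
    calls["j2"] += 1
    return cmath.exp(1j * omega) / R


def currents_as_was(gap1, omega, R):
    # Mirrors the original: two identical j2() calls
    ij1 = j2(gap1, gap1, omega, R).real
    ij2 = -j2(gap1, gap1, omega, R).imag
    return ij1, ij2


def currents_single_call(gap1, omega, R):
    # Evaluate j2() once, then split the real and imaginary parts
    z = j2(gap1, gap1, omega, R)
    return z.real, -z.imag


a = currents_as_was(1.0, 0.5, 2.0)
calls_as_was = calls["j2"]

calls["j2"] = 0
b = currents_single_call(1.0, 0.5, 2.0)
calls_single = calls["j2"]

print(a == b, calls_as_was, calls_single)
```

If j2() is expensive, this alone removes one of its two evaluations per task; whether that matters in practice depends on the real cost of j1() and j2(), which only profiling the actual code can tell.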
