Nested parallel processes using multiprocessing

Question

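Is there a way to run a function in parallel inside a function that is already running in parallel? I know this is not possible with multiprocessing.Pool(), because daemonic processes cannot create child processes. I am fairly new to parallel computing and am struggling to find a workaround.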

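I currently have several thousand calculations that need to be run in parallel using a commercially available quantum mechanics code that I interface with. For each calculation, three subsequent calculations need to be run in parallel if the parent calculation terminates normally; if the parent does not terminate normally, the calculation for that point ends there. I could always combine these three subsequent calculations into one large calculation and run it normally, although I would much prefer to run them separately in parallel.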

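Main currently looks like this: run() is the parent calculation, which is first run in parallel over a series of points, and par_nacmes() is the function I want to run in parallel for the three child calculations once the parent has terminated normally.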

  def par_nacmes(nacme_input_data):                                                                                                                                                                                                                                                        
      nacme_dir, nacme_input, index = nacme_input_data  # Unpack info in tuple for the calculation                                                                                                                                                                                                                                    
      axes_index = get_axis_index(nacme_input)                                                                                                                                                                                                                                             
      [norm_term, nacme_outf] = util.run_calculation(molpro_keys, pwd, nacme_dir, nacme_input, index)  # Submit child calculation                                                                                                                                                                                      
      if norm_term:                                                                                                                                                                                                                                                                        
          data.extract_nacme(nacme_outf, molpro_keys['nacme_regex'], index, axes_index)                                                                                                                                                                                                    
      else:                                                                                                                                                                                                                                                                                
          with open('output.log', 'a') as f:  # Append so parallel workers do not truncate each other's messages
              f.write('NACME Crashed for GP%s - axis %s\n' % (index, axes_index))
                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                           
  def run(grid_point):                                                                                                                                                                                                                                                                     
      index, geom = grid_point                                                                                                                                                                                                                                                             
      if inputs['code'] == 'molpro':                                                                                                                                                                                                                                                       
          [spe_dir, spe_input] = molpro.setup_spe(inputs, geom, pwd, index)                                                                                                                                                                                                                
          [norm_term, spe_outf] = util.run_calculation(molpro_keys, pwd, spe_dir, spe_input, index)  # Run each parent calculation                                                                                                                                                                                        
          if norm_term:  # If parent calculation terminates normally - Extract data and continue with subsequent calculations for each point                                                                                                                                                                                                                                                                   
              data.extract_energies(spe_dir+spe_outf, inputs['spe'], molpro_keys['energy_regex'],                                                                                                                                                                                          
                                    molpro_keys['cas_prog'], index)                                                                                                                                                                                                                        
              if inputs['nacme'] == 'yes':                                                                                                                                                                                                                                                 
                  [nacme_dir, nacmes_inputs] = molpro.setup_nacme(inputs, geom, spe_dir, index)                                                                                                                                                                                                                                                                                                                                                                                                      
                  nacmes_data = [(nacme_dir, nacme_inp, index) for nacme_inp in nacmes_inputs] # List of three tuples - each with three elements. Each tuple describes a child calculation to be run in parallel                                                                                                                                                                                             
                  nacme_pool = multiprocessing.Pool()                                                                                                                                                                                                                                      
                  nacme_pool.map(par_nacmes, [nacme_input for nacme_input in nacmes_data]) # Run each calculation in list of tuples in parallel                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                           
              if inputs['grad'] == 'yes':                                                                                                                                                                                                                                                  
                  pass                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                           
          else:                                                                                                                                                                                                                                                                            
              with open('output.log', 'a') as f:  # Append so parallel workers do not truncate each other's messages
                  f.write('SPE crashed for GP%s\n' % index)
                                                                                                                                                                                                                                                                                                
      elif inputs['code'] == 'molcas':   # TO DO                                                                                                                                                                                                                                           
          pass                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                           
  if __name__ == "__main__":                                                                                                                                                                                                                                                               
      try:                                                                                                                                                                                                                                                                                 
          pwd = os.getcwd()  # parent dir                                                                                                                                                                                                                                                  
          f = open(inp_geom, 'r')                                                                                                                                                                                                                                                          
          ref_geom = np.genfromtxt(f, skip_header=2, usecols=(1, 2, 3), encoding=None)                                                                                                                                                                                                     
          f.close()                                                                                                                                                                                                                                                                        
          geom_list = coordinate_generator(ref_geom)  # Generate nuclear coordinates                                                                                                                                                                                                                                      
          if inputs['code'] == 'molpro':                                                                                                                                                                                                                                                   
              couplings = molpro.coupled_states(inputs['states'][-1])                                                                                                                                                                                                                      
          elif inputs['code'] == 'molcas':                                                                                                                                                                                                                                                 
              pass                                                                                                                                                                                                                                                                         
          data = setup.global_data(ref_geom, inputs['states'][-1], couplings, len(geom_list))                                                                                                                                                                                              
          run_pool = multiprocessing.Pool()                                                                                                                                                                                                                                                
          run_pool.map(run, [(k, v) for k, v in enumerate(geom_list)])  # Run each parent calculation for each set of coordinates                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                           
      except StopIteration:                                                                                                                                                                                                                                                                
          print('Please ensure geometry file is correct.')

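Any insight into how to run these child calculations in parallel for each point would be a great help. I have seen suggestions to use multithreading instead, or to set daemon to false, although I am not sure whether that is the best way to do it.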

Answer 1

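First of all, I don't know why you have to run par_nacmes in parallel, but if you must, you could:
(a) use threads instead of processes to run them, or
(b) use multiprocessing.Process to run run(). However, that would involve a lot of overhead, so I personally wouldn't do it.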

For (a), all you have to do is replace

nacme_pool = multiprocessing.Pool()
nacme_pool.map(par_nacmes, [nacme_input for nacme_input in nacmes_data])

in run() with

from threading import Thread

threads = []
for nacme_input in nacmes_data:
    t = Thread(target=par_nacmes, args=(nacme_input,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()  # wait for all child calculations to finish
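
The start/join pattern above can also be expressed with concurrent.futures.ThreadPoolExecutor from the standard library, which waits for all threads when its with-block exits and re-raises any exception from a worker. This is only a sketch: the par_nacmes below is a hypothetical stand-in that returns a label, run_children is a helper name invented for illustration, and the task tuples are placeholder data shaped like nacmes_data in the question.

```python
from concurrent.futures import ThreadPoolExecutor


def par_nacmes(nacme_input_data):
    """Hypothetical stand-in for the real par_nacmes; returns a label so results are visible."""
    nacme_dir, nacme_input, index = nacme_input_data
    return f"GP{index}: {nacme_dir}/{nacme_input}"


def run_children(nacmes_data):
    """Run every child calculation in its own thread and wait for all of them."""
    # One worker per child task; max(1, ...) guards against an empty task list.
    with ThreadPoolExecutor(max_workers=max(1, len(nacmes_data))) as executor:
        # map() preserves input order and re-raises any exception raised in a worker.
        return list(executor.map(par_nacmes, nacmes_data))


# Placeholder data shaped like nacmes_data in run(): three (dir, input, index) tuples.
nacmes_data = [("nacme_dir", f"axis_{axis}.inp", 7) for axis in ("x", "y", "z")]
results = run_children(nacmes_data)
```

Compared with a hand-managed list of Thread objects, this keeps failures from passing silently, at the cost of one extra import.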

or, if you don't care whether the threads have finished:

for nacme_input in nacmes_data:
    t = Thread(target=par_nacmes, args=(nacme_input,))
    t.start()
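
Threads are likely to be sufficient for option (a) if each calculation is an external program, as the question suggests. The thread then only blocks while the external process runs, so the interpreter lock is not the bottleneck, and subprocess is not subject to the daemonic-process restriction that multiprocessing enforces. The sketch below assumes that util.run_calculation (not shown in the question) launches such a program; sys.executable is used purely as a placeholder for the real executable, and run_external is a hypothetical helper.

```python
import subprocess
import sys
from threading import Thread


def run_external(label, results):
    """Launch a placeholder external program and record whether it exited normally."""
    # sys.executable stands in for the real quantum chemistry binary (an assumption).
    completed = subprocess.run(
        [sys.executable, "-c", "print('done')"],
        capture_output=True,
        text=True,
    )
    # Each thread stores its outcome under its own key.
    results[label] = (completed.returncode == 0, completed.stdout.strip())


results = {}
threads = [Thread(target=run_external, args=(axis, results)) for axis in ("x", "y", "z")]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait so that results is complete before it is read
```

The boolean in each result plays the role of norm_term in the question's code: the caller can decide what to extract or log once every thread has been joined.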

Solution 1
0 2021-01-05 23:22:08