Parallel execution of a function in Python

Below is my parallel code. When I run it I get the error: UnboundLocalError: local variable 'df_project' referenced before assignment. I don't know what I'm doing wrong. When I run the same code as a regular function it works fine.

Any input would be a great help.

from multiprocessing import Pool

def square(x):
    # calculate the square of the value of x
    v=x['content'][0]['template']['module']
    if isinstance(v, list):
        for i, v2 in enumerate(v): 
            df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list = normalizeJSON(x,i,v2['id'])
    else:
        print('module is not a list')

    return df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list

if __name__ == '__main__':

    # Define the dataset
    dataset = result_list

    # Run this with a pool of 5 agents having a chunksize of 3 until finished
    agents = 5
    chunksize = 3
    project=pd.DataFrame()
    module=pd.DataFrame()
    content_module_columns=["module_id","module_text", "project_id", "project_revision","index"]  
    dim_content_module=pd.DataFrame(columns = content_module_columns)

    with Pool(processes=agents) as pool:
         project,module, module_header,module_paragraph,module_image,module_chart,module_chart_row,module_list=pool.map(square,dataset,chunksize)


Below is the normal (serial) version of the code that I'm trying to parallelize:

    for index in range(len(result_list)):
        print('processing file number:', index)
        d=result_list[index]
        v=d['content'][0]['template']['module']
        if isinstance(v, list):
           for i, v2 in enumerate(v): 
               df_project,df_module = normalizeJSON(d,i,v2['id'])
               dim_content_module=dim_content_module.append(df_module, 
                                  ignore_index=True,sort=False)
        else:
            print('module is not a list')

I don't get any error in the serial version with the same input.

*result_list* is a list of dictionaries

Hard to guess without seeing the full stack trace, but most probably your problem is with this function:


def square(x):
    # calculate the square of the value of x
    v=x['content'][0]['template']['module']
    if isinstance(v, list):
        for i, v2 in enumerate(v): 
            df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list = normalizeJSON(x,i,v2['id'])
    else:
        print('module is not a list')

    return df_project,df_module,df_module_header,df_module_paragraph,df_module_image,df_module_chart,df_module_chart_row,df_module_list

Most likely the isinstance check returns False in one of the runs. The function then skips the loop, and when the return statement tries to use all these objects it throws UnboundLocalError, because you are referencing variables that were never assigned.
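To make this concrete, here is a minimal sketch of the failure and one possible way around it. It is not the asker's real code: normalizeJSON is not shown in the question, so normalize_json_stub and make_doc below are hypothetical stand-ins, and the records are plain dicts rather than DataFrames. The serial version probably avoids the error because it only touches the results inside the loop, whereas square() returns them after the loop even when nothing was assigned (which also happens when the module list is empty). Note too that pool.map returns a list with one entry per input item, so the results are flattened here rather than unpacked into eight names.

```python
from multiprocessing import Pool


def normalize_json_stub(doc, index, module_id):
    # Hypothetical stand-in for normalizeJSON: returns one small record.
    return {"project_id": doc.get("id"), "index": index, "module_id": module_id}


def make_doc(doc_id, module):
    # Hypothetical helper that builds an input shaped like the question's dicts.
    return {"id": doc_id, "content": [{"template": {"module": module}}]}


def broken(doc):
    # Same shape as square(): `row` is bound only inside the loop body.
    v = doc["content"][0]["template"]["module"]
    if isinstance(v, list):
        for i, v2 in enumerate(v):
            row = normalize_json_stub(doc, i, v2["id"])
    else:
        print("module is not a list")
    return row  # raises UnboundLocalError if the loop body never ran


def process(doc):
    # Collect one record per module; fall back to an empty list otherwise.
    rows = []
    v = doc["content"][0]["template"]["module"]
    if isinstance(v, list):
        for i, v2 in enumerate(v):
            rows.append(normalize_json_stub(doc, i, v2["id"]))
    else:
        print("module is not a list")
    return rows


if __name__ == "__main__":
    dataset = [
        make_doc(1, [{"id": "a"}, {"id": "b"}]),
        make_doc(2, {"id": "c"}),  # module is a dict, not a list
        make_doc(3, []),           # empty list: the loop body never runs
    ]
    with Pool(processes=2) as pool:
        per_doc = pool.map(process, dataset, chunksize=1)
    # One list of records per input document; flatten into a single list.
    all_rows = [row for rows in per_doc for row in rows]
    print(len(all_rows))
```

Returning a (possibly empty) list from every call keeps the worker's return type uniform, so the parent process can combine the results without special-casing inputs where the module is not a list.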
