Python: Multiprocessing output issues

I am running the following code using multiprocessing. It all works fine, except that the output seems to be smaller than it should be. I have presented a self-contained example below.

import pandas as pd
import multiprocessing
from multiprocessing import Pool, cpu_count
from functools import partial
import timeit
import numpy as np

prng = 1234
cpu_cores = cpu_count()-1

temp_df1 = pd.DataFrame({'trip_id':[22186702,22186703,22186704,26777219,26777220,26777221,26777222,26777223],

Second, the time-sampling dataframe and the function to run in multiprocessing:

time_dist = pd.DataFrame({'Time':[8,9,10,11,12,13,14],

results_frow = []
result_list_final = []

def func(df, time_dist_df):

    for i in range(0, df.shape[0]):
        if i == 0:
            start_time = df['start_time'].iloc[i]
            arrival_time = df['arrival_time'].iloc[i]
            tour_id = df['tour_id'].iloc[i]

            tour_id = df['tour_id'].iloc[i]
            arrival_time_prev = results_frow[-2]
            time_dist1 = time_dist.loc[time_dist['Time'] >= arrival_time_prev]
            weight_column = df['weight_column'].iloc[i]

            # sample a time and calculate a new arrival time as a result
            if len(time_dist1) > 0:
                start_time = time_dist1.sample(n=1, weights=time_dist1[weight_column], replace=True, random_state=prng)
                start_time = start_time[['Time']].values  ###
                start_time = start_time[0][0]    
                start_time = results_frow[-2]

            newarrival_time = start_time + df['ttime_mins'].iloc[i] / 60

    return results_frow

Now run multiprocessing and collect the results:

def collect_results(result_list):
    return pd.DataFrame({'start_time': result_list[0::3],
                  'arrival_time': result_list[1::3],
                  'tour_id': result_list[2::3]})

# create list of grouped dataframes
grplist = []
for name, group in temp_df1.groupby('tour_id'):
    grplist.append(group)

# use partial to fix the second argument in the function so that multiprocessing does not have an issue
func_partial = partial(func, time_dist_df = time_dist)

if __name__ == '__main__':
    start = timeit.default_timer()
    pool = multiprocessing.Pool(processes=cpu_cores)
    result_list = pool.map(func_partial, grplist)
    result_list_final = result_list[1]

    results_df = collect_results(result_list_final) #### Here lies the problem. Instead of getting back 8 rows, I am only getting back 5 i.e. the last group in the grplist
    stop = timeit.default_timer()
    total_time = stop - start
    print("It took a total of %s sec" % total_time)
    results_df.to_csv(r"c:/stimes_parallelized.csv", index=False)


The issue lies at results_df in the multiprocessing code block. It only returns the results for the last group (5 rows) instead of both groups (8 rows). If I go into debug mode in PyCharm, I see all 8 rows in results_df, but not when I save the file out as a CSV.

There are several issues here:

1) Why import multiprocessing when you have already imported Pool from it?

from multiprocessing import Pool, cpu_count

That means multiprocessing is imported twice, and Pool along with it:

import multiprocessing

So replace:

pool = multiprocessing.Pool(processes=cpu_cores)

with:

pool = Pool(processes=cpu_cores)
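As a small illustration of point 1, the sketch below checks that the two spellings refer to the same object, so the replacement is a matter of tidiness and should not change what the pool returns. It assumes only the standard library.

```python
import multiprocessing
from multiprocessing import Pool

# Both names are looked up in the same module namespace,
# so they point at the very same object.
same_object = multiprocessing.Pool is Pool
print(same_object)  # True
```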

2) You didn't specify your Python version in the description. It is hard to answer without knowing which version you are using.

3) I think that to solve this problem you need to import freeze_support from multiprocessing:

from multiprocessing import freeze_support

Then use it like this:

if __name__ == '__main__':
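A minimal sketch of how `freeze_support` is usually placed inside that guard, assuming a trivial worker (`square` and `main` are hypothetical names, not from the question). Note that `freeze_support()` has no effect when the script is run normally by the Python interpreter rather than as a frozen executable, so it may not be what is cutting the output short here.

```python
from multiprocessing import Pool, freeze_support


def square(x):
    """Trivial hypothetical worker, used only to exercise the pool."""
    return x * x


def main():
    # The context manager shuts the pool down once map() has returned.
    with Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == '__main__':
    freeze_support()  # call it first thing inside the guard
    print(main())
```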

I say "I think" because I have run into this problem before, and you didn't specify which Python version you are using.
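Separately from the points above, it may help to look at what `pool.map` hands back: one result per item in the input list, in input order. In the question, `result_list[1]` therefore selects only the second group's result, which would match the 5 rows reported. The sketch below uses made-up data and hypothetical names (`process_group`, `combine`, `GROUPS`) and plain tuples instead of dataframes; it is an illustration under those assumptions, not the asker's actual function. The worker builds its list locally rather than appending to a module-level list, so the result does not depend on which worker process handled which group.

```python
from itertools import chain
from multiprocessing import Pool


def process_group(rows):
    """Hypothetical stand-in for func: flatten one group's rows into
    [start_time, arrival_time, tour_id, ...] using a fresh local list."""
    flat = []
    for start_time, arrival_time, tour_id in rows:
        flat.extend([start_time, arrival_time, tour_id])
    return flat


def combine(per_group_results):
    """Concatenate the per-group lists returned by pool.map into one flat list."""
    return list(chain.from_iterable(per_group_results))


# Made-up data: two tours with 3 and 5 trips (8 rows in total).
GROUPS = [
    [(8.0, 8.5, 1), (9.0, 9.5, 1), (10.0, 10.5, 1)],
    [(8.0, 8.5, 2), (9.0, 9.5, 2), (10.0, 10.5, 2), (11.0, 11.5, 2), (12.0, 12.5, 2)],
]

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        per_group = pool.map(process_group, GROUPS)
    print(len(per_group[1]) // 3)        # 5: rows from the second group only
    print(len(combine(per_group)) // 3)  # 8: rows from every group
```

The combined flat list can then be passed to a function like `collect_results`, since it keeps the same three-values-per-row layout that the slicing with `[0::3]`, `[1::3]` and `[2::3]` expects.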
