[英]How do I get multiprocessing results?
I would like to store the result of the work in a specific variable after multiprocessing as shown below.我想在多处理后将工作结果存储在特定变量中,如下所示。
Alternatively, I want to save the results of the job as a csv file.或者,我想将作业结果保存为 csv 文件。 May I know how to do it?我可以知道怎么做吗?
This is my code:这是我的代码:
(I want to get 'df4' and 'df7' data and to save csv file) (我想获取 'df4' 和 'df7' 数据并保存 csv 文件)
import pandas as pd
from pandas import DataFrame
import time
import multiprocessing
df2 = pd.DataFrame()
df3 = pd.DataFrame()
df4 = pd.DataFrame()
df5 = pd.DataFrame()
df6 = pd.DataFrame()
df7 = pd.DataFrame()
df8 = pd.DataFrame()
date = '2011-03', '2011-02' ........ '2021-03' #There are 120 list.
list1 = df1['resion'].drop_duplicates() # There are 20 list. 'df1' is original data
#I'd like to divide the list and work on it.
list11 = list1.iloc[0:10]
list12 = list1.iloc[10:20]
#It's a function using 'list11'.
def cal1():
global df2
global df3
global df4
start = time.time()
for i, t in enumerate(list11):
df2 = pd.DataFrame(df1[df1['resion'] == t]) #'df1' is original data
if i%2 == 0:
print ("cal1 function processing: ", i)
end = time.time()
print (end-start)
else:
pass
for n, d in enumerate(date):
df3 = pd.DataFrame(df2[df2['date'] == d])
df3['number'] = df3['price'].rank(pct=True, ascending = False )
df4 = df4.append(pd.DataFrame(df3))
return df4
#It's a function using 'list12'.
def cal2():
global df5
global df6
global df7
start = time.time()
for i, t in enumerate(list12):
df5 = pd.DataFrame(df1[df1['resion'] == t]) #'df1' is original data
if i%2 == 0:
print ("cal1 function processing: ", i)
end = time.time()
print (end-start)
else:
pass
for n, d in enumerate(date):
df6 = pd.DataFrame(df5[df5['date'] == d])
df6['number'] = df6['price'].rank(pct=True, ascending = False )
df7 = df7.append(pd.DataFrame(df6))
return df7
## Multiprocessing code
if __name__ == "__main__":
# creating processes
p1 = multiprocessing.Process(target=cal1, args=())
p2 = multiprocessing.Process(target=cal2, args=())
# starting process 1
p1.start()
# starting process 2
p2.start()
# wait until process 1 is finished
p1.join()
# wait until process 2 is finished
p2.join()
# both processes finished
print("Done!")
It looks like your functions cal1
and cal2
are identical except that they are trying to assign results to some different global variables.看起来您的函数cal1
和cal2
是相同的,只是它们试图将结果分配给一些不同的全局变量。 This is not going to work, because when you run them in a subprocess, they will assign that global variable in the subprocess, but that will have no impact whatsoever on the main process from which you started them.这是行不通的,因为当您在子进程中运行它们时,它们将在子进程中分配该全局变量,但这对您启动它们的主进程没有任何影响。
If you want to map a function to multiple input ranges across multiple processes you can use a process Pool and Pool.map .如果您想 map 和 function 跨多个进程的多个输入范围,您可以使用进程Pool和Pool.map 。
For example:例如:
def cal(input_list):
start = time.time()
for i, t in enumerate(input_list):
df2 = pd.DataFrame(df1[df1['resion'] == t]) #'df1' is original data
if i%2 == 0:
print ("cal1 function processing: ", i)
end = time.time()
print (end-start)
else:
pass
for n, d in enumerate(date):
df3 = pd.DataFrame(df2[df2['date'] == d])
df3['number'] = df3['price'].rank(pct=True, ascending = False )
df4 = df4.append(pd.DataFrame(df3))
# I kept your original code unmodified but I'm not really sure this
# is what to do, because you are returning after one pass through the
# outer loop. I haven't scrutinized what you are actually trying to
# do but I suspect this is wrong too.
return df4
Then create a process pool and you can divide up the input how you want (or, with a bit of tweaking, you can let Pool.map
chunk the input for you, and then reduce the outputs from map into a single output):然后创建一个进程池,您可以根据需要划分输入(或者,通过一些调整,您可以让Pool.map
为您分块输入,然后将 map 的输出减少为单个输出):
pool = multiprocessing.Pool(2)
dfs = pool.map(cal, [list1.iloc[0:10], list1.iloc[10:20]])
This is just to get you started.这只是为了让你开始。 I would probably do a number of other things differently as well.我可能也会做一些不同的其他事情。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.