How to store intermediate output before multiprocessing finishes?
I want to run a function with multiprocessing in Python. Here is an example:
def myFunction(name,age):
    output = paste(name,age)
    return output

names = ["A","B","C"]
ages = ["1","2","3"]
with mp.Pool(processes=no_cpus) as pool:
    results = pool.starmap(myFunction,zip(names,ages))
results_table = pd.concat(results)
results_table.to_csv(file,sep="\t",index=False)
In the real case, myFunction takes a very long time. Sometimes I have to interrupt the run and start again. However, results is only written to the output file once pool.starmap has completed. How can I store the intermediate results before it finishes? I don't want to change myFunction from return to .to_csv().
Thanks!
Instead of using starmap (or map), use the imap method, which returns an iterator that yields each result one at a time as it becomes available (i.e. as it is returned by myFunction). Unlike starmap, imap passes each item to the worker as a single argument, so myFunction has to unpack the tuple itself. The results are still yielded in the order of the inputs. If you do not care about the order, then use imap_unordered.
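To make the ordering difference concrete, here is a minimal sketch. It assumes a hypothetical worker, slow_square, that carries an index with each task so results can be matched back to their inputs whatever order they arrive in. It uses multiprocessing.pool.ThreadPool only so that the snippet runs without a `__main__` guard; mp.Pool provides the same imap and imap_unordered methods.

```python
import time
from multiprocessing.pool import ThreadPool  # same imap API as multiprocessing.Pool


def slow_square(item):
    """Hypothetical worker: sleep for `delay`, then return (index, value squared)."""
    index, value, delay = item
    time.sleep(delay)
    return index, value * value


def collect(method_name, items):
    """Run slow_square via the named pool method; return results in arrival order."""
    with ThreadPool(processes=len(items)) as pool:
        method = getattr(pool, method_name)
        return list(method(slow_square, items))


# The first task is deliberately the slowest one.
items = [(0, 2, 0.3), (1, 3, 0.0), (2, 4, 0.0)]

ordered = collect("imap", items)              # always in input order
unordered = collect("imap_unordered", items)  # in completion order

print(ordered)
print(unordered)
```

With imap, the two fast results are held back until the slow first task has finished. With imap_unordered they can be written out immediately, and the index in each result still tells you which input it belongs to.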
In the code below, each DataFrame is written to the CSV file as soon as it is returned: the first result creates the file and writes the header, and every later result is appended without a header.
import pandas as pd
import multiprocessing as mp

def paste(name, age):
    return pd.DataFrame([[name, age]], columns=['Name', 'Age'])

def myFunction(t):
    name, age = t # unpack passed tuple
    output = paste(name, age)
    return output

# Required for Windows:
if __name__ == '__main__':
    names = ["A","B","C"]
    ages = ["1","2","3"]
    no_cpus = min(len(names), mp.cpu_count())
    csv_file = 'test.txt'
    with mp.Pool(processes=no_cpus) as pool:
        # Results from imap must be iterated
        for index, result in enumerate(pool.imap(myFunction, zip(names,ages))):
            if index == 0:
                # First return value
                header = True
                open_flags = "w"
            else:
                header = False
                open_flags = "a"
            with open(csv_file, open_flags, newline='') as f:
                result.to_csv(f, sep="\t", index=False, header=header)
Output of test.txt:
Name Age
A 1
B 2
C 3
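Since the question mentions interrupting and restarting, one possible extension is to skip work that is already in the file. This is only a sketch under assumptions that are not part of the answer above: the output is tab-separated with a Name column, each name identifies one task, and pending_tasks is a hypothetical helper name.

```python
import csv
import os


def pending_tasks(csv_file, names, ages):
    """Return the (name, age) pairs whose name is not yet present in csv_file.

    Assumes csv_file, if it exists, is tab-separated with a 'Name' header column.
    """
    done = set()
    if os.path.exists(csv_file):
        with open(csv_file, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                done.add(row["Name"])
    return [(n, a) for n, a in zip(names, ages) if n not in done]
```

On a restart, the list returned by pending_tasks could be passed to pool.imap in place of zip(names, ages). The file would then need to be opened in append mode throughout, with the header written only when the file does not exist yet. A run interrupted in the middle of a write may leave a partial last row, which this sketch does not handle.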