
How to store all the output before multiprocessing finishes?

I want to run a function in multiple processes in Python. Here is an example:

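import multiprocessing as mp
import pandas as pd

def myFunction(name, age):
    # paste() stands in for the real, slow computation; it returns a DataFrame
    output = paste(name, age)
    return output

names = ["A", "B", "C"]
ages = ["1", "2", "3"]

# no_cpus and file are defined elsewhere
with mp.Pool(processes=no_cpus) as pool:
    results = pool.starmap(myFunction, zip(names, ages))

# Nothing is written until every task has finished
results_table = pd.concat(results)
results_table.to_csv(file, sep="\t", index=False)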

In the real case, myFunction takes a very long time. Sometimes I have to interrupt the run and start again. However, the results are only written to the output file once pool.starmap has finished completely. How can I store the intermediate results before it finishes? I don't want to change myFunction from using return to calling .to_csv().

Thanks!

Instead of using map (or starmap), use the imap method, which returns an iterator that yields each result as soon as it becomes available (i.e. as soon as it is returned by myFunction). The results are still yielded in the order of the inputs, so a slow early task holds back the results behind it. If you do not care about the order, use imap_unordered instead. Unlike starmap, imap passes each item as a single argument, so myFunction has to unpack the tuple itself.
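To make the ordering difference concrete, here is a small sketch. slow_square and compare_orders are hypothetical names invented for this illustration, and the sleep is only there so that later inputs finish first.

```python
import multiprocessing as mp
import time

def slow_square(x):
    # Hypothetical stand-in for a slow task: smaller inputs take longer.
    time.sleep(0.1 * (3 - x))
    return x * x

def compare_orders(pool, inputs):
    """Return the results of imap and imap_unordered as two lists."""
    ordered = list(pool.imap(slow_square, inputs))
    unordered = list(pool.imap_unordered(slow_square, inputs))
    return ordered, unordered

if __name__ == '__main__':
    with mp.Pool(processes=3) as pool:
        ordered, unordered = compare_orders(pool, [0, 1, 2])
    print(ordered)    # [0, 1, 4]: always in input order
    print(unordered)  # same values, in whatever order the tasks finished
```

With imap, the result for input 0 is yielded first even though it finishes last; with imap_unordered, whichever task finishes first is yielded first.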

In the code below, each DataFrame is written to the CSV file as soon as the iterator yields it. For the first result the file is opened in write mode and the header is included; for every later result the file is opened in append mode and the header is omitted.

import pandas as pd
import multiprocessing as mp

def paste(name, age):
    return pd.DataFrame([[name, age]], columns=['Name', 'Age'])

def myFunction(t):
    name, age = t # unpack passed tuple
    output = paste(name, age)
    return output

# Required for Windows:
if __name__ == '__main__':
    names = ["A","B","C"]
    ages = ["1","2","3"]

    no_cpus = min(len(names), mp.cpu_count())

    csv_file = 'test.txt'

    with mp.Pool(processes=no_cpus) as pool:
        # Results from imap must be iterated
        for index, result in enumerate(pool.imap(myFunction, zip(names,ages))):
            if index == 0:
                # First return value
                header = True
                open_flags = "w"
            else:
                header = False
                open_flags = "a"
            with open(csv_file, open_flags, newline='') as f:
                result.to_csv(f, sep="\t", index=False, header=header)

Contents of test.txt:

Name    Age
A       1
B       2
C       3
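Since the question mentions interrupting the run and starting again, one possible extension is to skip the inputs that are already in the output file when restarting. The following is only a sketch under some assumptions: the Name column uniquely identifies a task, the standard csv module is used instead of pandas to keep the example dependency-free, and load_done, work, append_rows, and test_resume.txt are hypothetical names chosen for this illustration.

```python
import csv
import multiprocessing as mp
import os

def load_done(path):
    """Return the set of names already present in the output file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline='') as f:
        return {row['Name'] for row in csv.DictReader(f, delimiter='\t')}

def work(t):
    # Hypothetical stand-in for the long-running myFunction; returns one row.
    name, age = t
    return {'Name': name, 'Age': age}

def append_rows(rows, path, fieldnames=('Name', 'Age')):
    """Append each row as soon as it arrives, writing the header only once."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter='\t')
        if need_header:
            writer.writeheader()
        for row in rows:
            writer.writerow(row)
            f.flush()  # hand the row to the OS right away

if __name__ == '__main__':
    names = ["A", "B", "C"]
    ages = ["1", "2", "3"]
    csv_file = 'test_resume.txt'  # assumed output path

    done = load_done(csv_file)
    todo = [(n, a) for n, a in zip(names, ages) if n not in done]
    if todo:
        with mp.Pool(processes=min(len(todo), mp.cpu_count())) as pool:
            # Order does not matter here, so take results as they finish
            append_rows(pool.imap_unordered(work, todo), csv_file)
```

Running the script a second time finds all three names in the file and does no further work. If the process is killed in the middle of writing a row, the last line of the file may be incomplete and would need to be checked by hand.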
