
Writing to csv from dataframe using multiprocessing and not messing up the output

import numpy as np
import pandas as pd
from concurrent import futures

#Load the data
df = pd.read_csv('crsp_short.csv', low_memory=False)

def funk(date):
    ...
    # for each date in df.date.unique() do stuff which gives sample dataframe
    # as an output
    #then write it to file

    sample.to_csv('crsp_full.csv', mode='a')

def evaluation(f_list):
    with futures.ProcessPoolExecutor() as pool:
        return pool.map(funk, f_list)

# list_s is a list of dates I want to calculate function funk for   

evaluation(list_s)

I get a csv file as output with some of the lines messed up, because Python is writing pieces from different processes at the same time. I guess I need to use queues, but I was not able to modify the code so that it worked. Any ideas how to do it? Otherwise it takes ages to get the results.

This solved the problem (Pool does the queueing for you):

Python: Writing to a single file with queue while using multiprocessing Pool

My version of the code that didn't mess up the output csv file:

import numpy as np
import pandas as pd
from multiprocessing import Pool

#Load the data
df = pd.read_csv('crsp_short.csv', low_memory=False)

def funk(date):
    ...
    # for each date in df.date.unique() do stuff which gives sample dataframe
    # as an output

    return sample

# list_s is a list of dates I want to calculate function funk for   

def mp_handler():
    # 28 is the number of processes I want to run
    with Pool(28) as p:
        # results come back to the parent process, which is the only writer
        for result in p.imap(funk, list_s):
            result.to_csv('crsp_full.csv', mode='a')


if __name__=='__main__':
    mp_handler()
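
One thing to watch with the version above: `to_csv(..., mode='a')` with default arguments writes the column header (and the index) again for every chunk. Below is a minimal, self-contained sketch of the same pattern that writes the header only once. The body of `funk`, the helper `write_frames`, the sample dates, and the temporary output path are all made up for illustration, since the real computation is elided in the post.

```python
import os
import tempfile
from multiprocessing import Pool

import pandas as pd


def funk(date):
    # Hypothetical stand-in for the real per-date computation,
    # which is elided in the post; it only has to return a DataFrame.
    return pd.DataFrame({"date": [date, date], "value": [1, 2]})


def write_frames(frames, path):
    # Hypothetical helper: runs in the parent process only, so writes
    # from different workers can never interleave. The first chunk
    # creates the file with a header; later chunks are appended without one.
    for i, frame in enumerate(frames):
        frame.to_csv(path, mode="w" if i == 0 else "a",
                     header=(i == 0), index=False)


if __name__ == "__main__":
    list_s = ["2020-01-01", "2020-01-02", "2020-01-03"]  # made-up dates
    out_path = os.path.join(tempfile.mkdtemp(), "crsp_full.csv")
    with Pool(2) as p:
        # imap yields results in input order as workers finish them
        write_frames(p.imap(funk, list_s), out_path)
    print(pd.read_csv(out_path))
```

If the order of rows in the file does not matter, `p.imap_unordered` could be swapped in so that a slow date does not hold back results that are already finished.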
