
Python: how to read from and write to different files using multiprocessing

I have several files, and I would like to read them, filter some keywords, and write the results into different files. I use Process(), and it turns out that the readwrite function takes more time that way. Do I need to separate the read and the write into two functions? How can I read multiple files at one time and write the keywords from different files into different csv files?

Thank you very much.

import csv
import time
from multiprocessing import Process

def readwritevalue():
    for file in gettxtpath():    ##gettxtpath will return a list of files
        file1 = file + ".csv"
        ##Identify some variables
        ##Read the file
        with open(file) as fp:
            for line in fp:
                #Process the data
                data1 = xxx
                data2 = xxx
                ....
        ##Write it to different files
        with open(file1, "w") as fp1:
            print(data1, file=fp1)
            w = csv.writer(fp1)
            w.writerow(data2)
            ...
if __name__ == '__main__':
    p = Process(target=readwritevalue)
    t1 = time.time()
    p.start()
    p.join()

I want to edit my question. I have more functions that modify the csv files generated by the readwritevalue() function. So, if Pool.map() is fine, will it be OK to change all the remaining functions like this? However, it seems that this did not save much time.

def getFormated(file):  ##Merge each csv with a well-defined format csv and generate a final report by writing all the csv files to one output csv
    csvMerge('Format.csv', file, file1)
    getResult()

if __name__ == "__main__":
    pool = Pool(2)
    pool.map(readwritevalue, [file for file in gettxtpath()])
    pool.map(getFormated, [file for file in getcsvName()])
    pool.map(Otherfunction, file_list)
    t1 = time.time()
    pool.close()
    pool.join()

You can extract the body of the for loop into its own function, create a multiprocessing.Pool object, then call pool.map() like so (I've used more descriptive names):

import csv
import multiprocessing

def read_and_write_single_file(stem):
    data = None

    with open(stem, "r") as f:
        data = f.readlines()  # populate data somehow

    csv_file = stem + ".csv"

    with open(csv_file, "w", encoding="utf-8") as f:
        w = csv.writer(f)

        for row in data:
            w.writerow(row)

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map(read_and_write_single_file, get_list_of_files())

See the linked documentation for how to control the number of workers, tasks per worker, etc.
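
For example, the number of workers, worker recycling, and batch size can be set via the Pool constructor and the chunksize argument of map(). A minimal sketch, reusing the read_and_write_single_file() and get_list_of_files() placeholders from above, with illustrative numbers:

import multiprocessing

if __name__ == "__main__":
    # 4 worker processes, each recycled after handling 50 tasks
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=50)

    # hand each worker batches of 10 files instead of one file at a time
    result = pool.map(read_and_write_single_file, get_list_of_files(), chunksize=10)

    pool.close()
    pool.join()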

I may have found an answer myself. I'm not so sure if it is indeed a good answer, but the runtime is 6 times shorter than before.

import time
import multiprocessing as mp
from multiprocessing import Pool

def readwritevalue(file):
    with open(file, 'r', encoding='UTF-8') as fp:
        ##dataprocess
    file1 = file + ".csv"
    with open(file1, "w") as fp2:
        ##write data


if __name__ == "__main__":
    pool = Pool(processes=int(mp.cpu_count() * 0.7))
    pool.map(readwritevalue, [file for file in gettxtpath()])
    t1 = time.time()
    pool.close()
    pool.join()
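
Note that pool.map() blocks until every file has been processed, so if the goal is to measure the parallel stage, the timestamp has to be taken before the call. A minimal sketch under that assumption, reusing gettxtpath() and readwritevalue() from above and using the pool as a context manager:

import time
import multiprocessing as mp

if __name__ == "__main__":
    t0 = time.time()

    # map() returns only after all files are done; leaving the with-block
    # then shuts the worker processes down
    with mp.Pool(processes=int(mp.cpu_count() * 0.7)) as pool:
        pool.map(readwritevalue, gettxtpath())

    print("elapsed:", time.time() - t0, "seconds")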
