
Python how to read from and write to different files using multiprocessing

I have several files, and I would like to read those files, filter some keywords, and write them into different files. I use Process(), and it turns out that the readwrite function takes even more time that way. Do I need to separate the read and the write into two functions? How can I read multiple files at one time and write the keywords from different files to different csv files?

Thank you very much.

from multiprocessing import Process
import csv
import time

def readwritevalue():
    for file in gettxtpath():    ## gettxtpath will return a list of files
        file1 = file + ".csv"
        ## Identify some variables
        ## Read the file
        with open(file) as fp:
            for line in fp:
                # Process the data
                data1 = xxx
                data2 = xxx
                ...
        ## Write the results to different files
        with open(file1, "w") as fp1:
            print(data1, file=fp1)
            w = csv.writer(fp1)
            w.writerow(data2)
            ...

if __name__ == '__main__':
    p = Process(target=readwritevalue)
    t1 = time.time()
    p.start()
    p.join()
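(A single Process like this still runs the whole loop inside one worker, so nothing actually executes in parallel. A minimal sketch of spawning one Process per file instead, assuming the per-file readwritevalue(file) defined near the end of this page and the gettxtpath() helper above:)

from multiprocessing import Process

if __name__ == '__main__':
    ## One Process per file; gettxtpath() and a per-file
    ## readwritevalue(file) are assumptions taken from the other snippets.
    procs = [Process(target=readwritevalue, args=(f,)) for f in gettxtpath()]
    for p in procs:
        p.start()
    for p in procs:
        p.join()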

I want to edit my question. I have more functions that modify the csv files generated by readwritevalue(). So, if Pool.map() is fine, is it OK to change all the remaining functions like this? However, it does not seem to save much time.

from multiprocessing import Pool

def getFormated(file):
    ## Merge each csv with a well-defined formatted csv and generate a
    ## final report by writing all the csv files to one output csv
    csvMerge('Format.csv', file, file1)
    getResult()

if __name__ == "__main__":
    pool = Pool(2)
    pool.map(readwritevalue, [file for file in gettxtpath()])
    pool.map(getFormated, [file for file in getcsvName()])
    pool.map(Otherfunction, file_list)
    t1 = time.time()
    pool.close()
    pool.join()
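(Side note on why this may not save much time: three sequential pool.map() calls act as barriers, so every file must finish one stage before any file starts the next. A sketch of collapsing the per-file stages into a single worker, assuming getFormated and Otherfunction each take the csv produced for one input file:)

from multiprocessing import Pool

def process_one_file(file):
    ## Run the whole per-file pipeline inside one task, so each file
    ## flows through all stages without waiting for the other files.
    readwritevalue(file)
    csv_file = file + ".csv"   ## assumed naming, as in readwritevalue()
    getFormated(csv_file)
    Otherfunction(csv_file)

if __name__ == "__main__":
    with Pool(2) as pool:
        pool.map(process_one_file, [file for file in gettxtpath()])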

You can extract the body of the for loop into its own function, create a multiprocessing.Pool object, then call pool.map() like so (I've used more descriptive names):

import csv
import multiprocessing

def read_and_write_single_file(stem):
    data = None

    with open(stem, "r") as f:
        ...  # populate data somehow

    csv_file = stem + ".csv"

    with open(csv_file, "w", encoding="utf-8", newline="") as f:
        w = csv.writer(f)

        for row in data:
            w.writerow(row)

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map(read_and_write_single_file, get_list_of_files())

See the linked documentation for how to control the number of workers, tasks per worker, etc.
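For example, a sketch with arbitrary numbers (processes, maxtasksperchild, and chunksize are the relevant knobs):

import multiprocessing

if __name__ == "__main__":
    ## processes: number of worker processes (defaults to os.cpu_count())
    ## maxtasksperchild: recycle each worker after this many tasks
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=10)
    ## chunksize: how many items a worker grabs per dispatch
    result = pool.map(read_and_write_single_file, get_list_of_files(), chunksize=5)
    pool.close()
    pool.join()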

I may have found an answer myself. I am not sure whether it is actually a good answer, but the run is now six times faster than before.

import time
import multiprocessing as mp
from multiprocessing import Pool

def readwritevalue(file):
    with open(file, 'r', encoding='UTF-8') as fp:
        ...  ## process the data
    file1 = file + ".csv"
    with open(file1, "w") as fp2:
        ...  ## write the data


if __name__ == "__main__":
    pool = Pool(processes=int(mp.cpu_count() * 0.7))
    pool.map(readwritevalue, [file for file in gettxtpath()])
    t1 = time.time()
    pool.close()
    pool.join()
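(One caveat: t1 is assigned after pool.map() has already returned, so it cannot measure the run. A sketch of timing the pool correctly, assuming the same readwritevalue and gettxtpath as above:)

import time
import multiprocessing as mp
from multiprocessing import Pool

if __name__ == "__main__":
    t0 = time.time()   ## start the clock before the work begins
    with Pool(processes=int(mp.cpu_count() * 0.7)) as pool:
        pool.map(readwritevalue, [file for file in gettxtpath()])
    print("elapsed:", time.time() - t0, "seconds")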
