Python: how to read from and write to different files using multiprocessing
I have several files that I would like to read, filter for some keywords, and write out to different files. I use Process(), but it turns out the readwrite function takes more time than expected. Do I need to separate the reading and writing into two functions? How can I read multiple files at once and write the keywords from different files to different CSVs?
Thank you very much.
def readwritevalue():
    for file in gettxtpath():  ## gettxtpath() returns a list of files
        file1 = file + ".csv"
        ## identify some variables
        ## read the file
        with open(file) as fp:
            for line in fp:
                # process the data
                data1 = xxx
                data2 = xxx
                ....
        ## write it to different files
        with open(file1, "w") as fp1:
            print(data1, file=fp1)
            w = csv.writer(fp1)
            w.writerow(data2)
            ...

if __name__ == '__main__':
    p = Process(target=readwritevalue)
    t1 = time.time()
    p.start()
    p.join()
I want to edit my question. I have more functions that modify the CSVs generated by readwritevalue(). So, if Pool.map() is fine, will it be OK to change all the remaining functions like this? However, it seems that this did not save much time.
def getFormated(file):
    ## merge each csv with a well-defined formatted csv and generate a final
    ## report by writing all the csv files to one output csv
    csvMerge('Format.csv', file, file1)

getResult()

if __name__ == "__main__":
    t1 = time.time()
    pool = Pool(2)
    pool.map(readwritevalue, [file for file in gettxtpath()])
    pool.map(getFormated, [file for file in getcsvName()])
    pool.map(Otherfunction, file_list)
    pool.close()
    pool.join()
You can extract the body of the for loop into its own function, create a multiprocessing.Pool object, then call pool.map() like so (I've used more descriptive names):
import csv
import multiprocessing

def read_and_write_single_file(stem):
    data = None
    with open(stem, "r") as f:
        # populate data somehow
        ...
    csv_file = stem + ".csv"
    with open(csv_file, "w", encoding="utf-8", newline="") as f:
        w = csv.writer(f)
        for row in data:
            w.writerow(row)

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map(read_and_write_single_file, get_list_of_files())
See the linked documentation for how to control the number of workers, tasks per worker, etc.
I may have found an answer myself. I am not so sure whether it is indeed a good answer, but it runs six times faster than before.
def readwritevalue(file):
    with open(file, 'r', encoding='UTF-8') as fp:
        ## process the data
        ...
    file1 = file + ".csv"
    with open(file1, "w") as fp2:
        ## write the data
        ...

if __name__ == "__main__":
    t1 = time.time()
    pool = Pool(processes=int(mp.cpu_count() * 0.7))  ## mp is the multiprocessing module
    pool.map(readwritevalue, [file for file in gettxtpath()])
    pool.close()
    pool.join()