[英]using multiprocessing to compress large number of files
I am trying to compress around 95 files each of size 7 gigs using python multiprocessing module: 我正在尝试使用python多处理模块压缩大约7个演出的95个文件:
import os;
from shutil import copyfileobj;
import bz2;
import multiprocessing as mp
import pprint
from numpy.core.test_rational import numerator
''' Input / Output Path '''
ipath = 'E:/AutoConfirm/'
opath = 'E:/compressed-autoconfirm/'
''' Number of Processes '''
num_of_proc = 6
def compressFile(fileName,chunkSize=100000000):
global ipath
print 'Started Compressing %s to %s'%(fileName,opath)
inp = open(ipath+fileName,'rb')
output = bz2.BZ2File(opath+fileName.split('/')[-1].strip('.csv')+'.bz2','wb',compresslevel=9)
copyfileobj(inp,output,chunkSize)
print 'Finished Compressing %s to %s'%(fileName,opath)
def process_worker(fileList):
for x in fileList:
compressFile(x)
def split_list(tempList):
a , reList = 0, []
global num_of_proc
for x in range(num_of_proc+1):
reList.append([tempList[a:a+len(tempList)/num_of_proc]])
a = a + len(tempList)/num_of_proc
return reList
pool = mp.Pool(processes=num_of_proc)
''' Prepare a list of all the file names '''
tempList = [x for x in os.listdir(ipath)]
''' Split the list into sub-lists
For example : if I have 90 files and I am using 6 processes
each of the process will work on 15 files each '''
iterList = split_list(tempList)
''' print iterList >> [ [filename1, filename2] , [filename3,filename4], ... ] '''
''' Pass the list consisting of sub-lists to pool '''
pool.map(process_worker,iterList)
The above code ends up creating 90 processes instead of 6. Can anyone help me identify the defect in the code. 上面的代码最终创建了90个进程,而不是6个。有人可以帮助我确定代码中的缺陷吗?
Multiprocessing will re-import the module, so as everything is top level it does it all again, and again, and again. 多重处理将重新导入模块,因此,由于所有操作都是顶级操作,因此会一次又一次地执行所有操作。
You need to put the code in a function and call it. 您需要将代码放入函数中并调用它。
def main():
...
if __name__ == '__main__':
main()
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.