
Using multiprocessing to compress a large number of files

I am trying to compress around 95 files, each about 7 GB in size, using Python's multiprocessing module:

import os
from shutil import copyfileobj
import bz2
import multiprocessing as mp
import pprint

''' Input / Output Path '''

ipath = 'E:/AutoConfirm/'
opath = 'E:/compressed-autoconfirm/'

''' Number of Processes '''
num_of_proc = 6

def compressFile(fileName,chunkSize=100000000):
    global ipath
    print 'Started Compressing %s to %s'%(fileName,opath)
    inp = open(ipath+fileName,'rb')
    output = bz2.BZ2File(opath+fileName.split('/')[-1].strip('.csv')+'.bz2','wb',compresslevel=9)
    copyfileobj(inp,output,chunkSize)
    print 'Finished Compressing %s to %s'%(fileName,opath)

def process_worker(fileList):
    for x in fileList:
        compressFile(x)

def split_list(tempList):
    a , reList = 0, []
    global num_of_proc
    for x in range(num_of_proc+1):
        reList.append(tempList[a:a+len(tempList)/num_of_proc])
        a = a + len(tempList)/num_of_proc
    return reList

pool = mp.Pool(processes=num_of_proc)
''' Prepare a list of all the file names '''
tempList = [x for x in os.listdir(ipath)]

''' Split the list into sub-lists 
    For example : if I have 90 files and I am using 6 processes 
                  each of the process will work on 15 files each '''

iterList = split_list(tempList)

''' print iterList >> [ [filename1, filename2] , [filename3,filename4], ... ] '''    


''' Pass the list consisting of sub-lists to pool '''
pool.map(process_worker,iterList)

The above code ends up creating 90 processes instead of 6. Can anyone help me identify the defect in the code?

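Multiprocessing re-imports the main module in every worker process (this is how it starts workers on Windows). Because everything in your script is top-level code, each worker runs all of it again, including creating its own pool, and those workers do the same, again and again.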

You need to put the code in a function and call it only when the script is run directly:

def main():
    ...

if __name__ == '__main__':
    main()
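
For reference, here is a rough sketch of how the whole script might look with that guard in place. It is not the asker's code: it assumes Python 3 (bz2.open, print as a function), and the helper names output_name, compress_file and _worker, as well as the 16 MB chunk size, are my own choices for illustration. It also sidesteps two other things in the question: str.strip('.csv') removes characters rather than a suffix, and Pool.map already distributes the work, so the list does not need to be split by hand.

```python
import bz2
import multiprocessing as mp
import os
from shutil import copyfileobj

# Paths and process count copied from the question; adjust as needed.
IPATH = 'E:/AutoConfirm/'
OPATH = 'E:/compressed-autoconfirm/'
NUM_OF_PROC = 6


def output_name(file_name):
    """Return the .bz2 name for a file, dropping a trailing .csv if present."""
    base = os.path.basename(file_name)
    # str.strip('.csv') strips the characters '.', 'c', 's', 'v' from both
    # ends of the name, so remove the suffix explicitly instead.
    if base.endswith('.csv'):
        base = base[:-len('.csv')]
    return base + '.bz2'


def compress_file(src_path, dst_dir, chunk_size=16 * 1024 * 1024):
    """Compress one file into dst_dir and return the output path."""
    dst_path = os.path.join(dst_dir, output_name(src_path))
    # The with-block closes both files, which also flushes the bz2 stream.
    with open(src_path, 'rb') as inp, \
            bz2.open(dst_path, 'wb', compresslevel=9) as out:
        copyfileobj(inp, out, chunk_size)
    return dst_path


def _worker(job):
    # Pool.map passes one argument per item, so unpack the (src, dst) pair.
    return compress_file(*job)


def main(ipath=IPATH, opath=OPATH, num_of_proc=NUM_OF_PROC):
    if not os.path.isdir(ipath):
        print('Input directory not found: %s' % ipath)
        return []
    os.makedirs(opath, exist_ok=True)
    jobs = [(os.path.join(ipath, name), opath)
            for name in sorted(os.listdir(ipath))
            if os.path.isfile(os.path.join(ipath, name))]
    # The pool is created only inside main(), so a worker that re-imports
    # this module never starts a pool of its own.
    with mp.Pool(processes=num_of_proc) as pool:
        # map() hands the files out to the workers; no manual splitting.
        return pool.map(_worker, jobs)


if __name__ == '__main__':
    main()
```

With this layout only the parent process ever reaches main(), so you should see num_of_proc workers no matter how many files are in the input directory.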


 