Python, image compression and multiprocessing
I'm trying to wrap my head around multiprocessing in Python, but I simply can't. Notice that I was, am, and will probably forever be a noob at everything programming-related. Ah, anyways. Here it goes.
I'm writing a Python script that compresses images downloaded to a folder with ImageMagick, using predefined variables from the user, stored in an ini file. The script searches for folders matching a pattern in a download dir, checks whether they contain JPGs, PNGs or other image files and, if yes, recompresses and renames them, storing the results in a "compressed" folder.
Now, here's the thing: I'd love it if I were able to "parallelize" the whole compression thingy, but... I can't understand how I'm supposed to do that.
I don't want to tire you with the existing code since it simply sucks. It's just a simple "for file in directory" loop. THAT's what I'd love to parallelize - could somebody give me an example of how multiprocessing could be used with the files in a directory?
I mean, let's take this simple piece of code:

    for f in matching_directory:
        print("I'm going to process file:", f)
For those that DO have to peek at the code, here's the part where I guess the whole parallelization bit will go:
    for f in ImageFolders:
        print(splitter)
        print(f)
        print(splitter)
        PureName = CleanName(f)
        print(PureName)
        for root, dirs, files in os.walk(f):
            padding = int(round(math.log(len(files), 10))) + 1
            padding = max(minpadding, padding)
            filecounter = 0
            for filename in files:
                if filename.endswith(('.jpg', '.jpeg', '.gif', '.png')):
                    filecounter += 1
                    imagefile, ext = os.path.splitext(filename)
                    newfilename = "%s_%s%s" % (PureName, str(filecounter).rjust(padding, '0'), '.jpg')
                    startfilename = os.path.join(f, filename)
                    finalfilename = os.path.join(Dir_Images_To_Publish, PureName, newfilename)
                    print(filecounter, ':', startfilename, ' >>> ', finalfilename)
                    Original_Image_FileList.append(startfilename)
                    Processed_Image_FileList.append(finalfilename)
...and here I'd like to be able to add a piece of code where a worker takes the first file from Original_Image_FileList and compresses it to the first filename from Processed_Image_FileList, a second worker takes the one after that, blah-blah, up to a specific number of workers - depending on a user setting in the ini file.

Any ideas?
You can create a pool of workers using the Pool class, to which you can distribute the image compression. See the "Using a pool of workers" section of the multiprocessing documentation.
If your compression function is called compress(filename), for example, then you can use the Pool.map method to apply this function to an iterable that returns the filenames, i.e. your list matching_directory:
    from multiprocessing import Pool

    def compress_image(image):
        """Define how you'd like to compress `image`..."""
        pass

    def distribute_compression(images, pool_size=4):
        with Pool(processes=pool_size) as pool:
            pool.map(compress_image, images)
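One caveat worth adding: on Windows (and on macOS since Python 3.8) worker processes are started by importing the main module, so the pool must be created under an "if __name__ == '__main__'" guard, or you'll get a RuntimeError at startup. A self-contained sketch with a hypothetical stand-in worker:

```python
from multiprocessing import Pool

def fake_compress(name):
    # Stand-in for the real compression step; just tags the file name.
    return name + '.compressed'

if __name__ == '__main__':
    # The guard keeps child processes from re-running this block on import.
    with Pool(processes=2) as pool:
        results = pool.map(fake_compress, ['a.jpg', 'b.png'])
    print(results)
```

The worker function must be defined at module level so it can be pickled and sent to the child processes.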
There's a variety of map-like methods available; see map for starters. You may like to experiment with the pool size to see what works best.
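For the paired lists in the question (Original_Image_FileList and Processed_Image_FileList), Pool.starmap is a natural fit: zip the two lists together and each worker receives one (source, destination) pair. The ImageMagick command line below ("convert" with "-quality 85") is only a plausible guess at the compression step, not something taken from the question:

```python
import subprocess
from multiprocessing import Pool

def compress_pair(src, dst):
    # Shell out to ImageMagick; adjust the quality setting to taste.
    subprocess.run(['convert', src, '-quality', '85', dst], check=True)

def compress_all(sources, destinations, pool_size=4):
    # Each (src, dst) tuple produced by zip() is unpacked into
    # compress_pair's two arguments by starmap.
    with Pool(processes=pool_size) as pool:
        pool.starmap(compress_pair, zip(sources, destinations))
```

The pool_size argument can be read straight from the ini file, e.g. with configparser, to honor the user's worker-count setting.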