
Python, image compression and multiprocessing

I'm trying to wrap my head around multiprocessing in Python, but I simply can't. Note that I was, am, and probably forever will be a noob at everything programming-related. Anyway, here it goes.

I'm writing a Python script that uses ImageMagick to compress images downloaded to a folder, based on predefined settings the user stores in an ini file. The script searches a download directory for folders matching a pattern, checks whether they contain JPGs, PNGs or other image files and, if so, recompresses and renames them, storing the results in a "compressed" folder.

Now, here's the thing: I'd love it if I were able to "parallelize" the whole compression thingy, but... I can't understand how I'm supposed to do that.

I don't want to tire you with the existing code since it simply sucks. It's just a simple "for file in directory" loop. THAT's what I'd love to parallelize - could somebody give me an example of how multiprocessing could be used with files in a directory?

I mean, let's take this simple piece of code:

for f in matching_directory:
    print('I\'m going to process file:', f)

For those who DO want to peek at the code, here's the part where I guess the whole parallelization bit will fit:

for f in ImageFolders:
    print(splitter)
    print(f)
    print(splitter)
    PureName = CleanName(f)
    print(PureName)
    for root, dirs, files in os.walk(f):
        # pad the counter so the numbered filenames sort correctly (001, 002, ...)
        padding = int(round(math.log(len(files), 10))) + 1
        padding = max(minpadding, padding)
        filecounter = 0
        for filename in files:
            if filename.endswith(('.jpg', '.jpeg', '.gif', '.png')):
                filecounter += 1
                imagefile, ext = os.path.splitext(filename)
                newfilename = "%s_%s%s" % (PureName, str(filecounter).rjust(padding, '0'), '.jpg')
                startfilename = os.path.join(f, filename)
                finalfilename = os.path.join(Dir_Images_To_Publish, PureName, newfilename)
                print(filecounter, ':', startfilename, ' >>> ', finalfilename)
                Original_Image_FileList.append(startfilename)
                Processed_Image_FileList.append(finalfilename)

...and here I'd like to be able to add a piece of code where one worker takes the first file from Original_Image_FileList and compresses it to the first filename from Processed_Image_FileList, a second worker takes the next pair, and so on, up to a specific number of workers depending on a user setting in the ini file.

Any ideas?

You can create a pool of workers using the Pool class and distribute the image compression to it. See the "Using a pool of workers" section of the multiprocessing documentation.

If your compression function is called compress(filename), for example, then you can use the Pool.map method to apply it to an iterable of filenames, i.e. your list matching_directory:

from multiprocessing import Pool

def compress_image(image):
    """Define how you'd like to compress `image`..."""
    pass

def distribute_compression(images, pool_size=4):
    # each worker process applies compress_image to one item from `images`
    with Pool(processes=pool_size) as pool:
        pool.map(compress_image, images)

There are a variety of map-like methods available; see Pool.map for starters. You may like to experiment with the pool size to see what works best.
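Since your loop already builds two parallel lists (source and destination paths), here is a minimal sketch of how the pairs could be handed to a pool with Pool.starmap. It assumes ImageMagick's command-line tool is available as `convert`, and that your ini file has a `workers` option in a `[compression]` section; the function names, the quality value and those ini keys are just placeholders for whatever your script actually uses:

import configparser
import os
import subprocess
from multiprocessing import Pool

def compress_pair(src, dst):
    """Compress one source image to its destination path via ImageMagick."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # "-quality 85" is only an example setting; use whatever your ini defines
    subprocess.run(['convert', src, '-quality', '85', dst], check=True)

def compress_all(originals, processed, ini_path='settings.ini'):
    """Run compress_pair over matching source/destination pairs in parallel."""
    config = configparser.ConfigParser()
    config.read(ini_path)
    workers = config.getint('compression', 'workers', fallback=4)
    with Pool(processes=workers) as pool:
        # starmap unpacks each (src, dst) tuple into compress_pair's arguments
        pool.starmap(compress_pair, zip(originals, processed))

# e.g. compress_all(Original_Image_FileList, Processed_Image_FileList)

One thing to keep in mind: arguments passed to the workers have to be picklable and the worker function has to be defined at module level, so keep the compression logic out of the loop that builds the lists.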
