
Executing the same function in parallel inside a for loop

Using Python 2.7, I have created a sample dictionary and a couple of functions to subset that dictionary and then iterate through the subsets...

from itertools import islice
from multiprocessing import Process
from collections import OrderedDict

global pair_dict

pair_dict = {
    1: 'one',
    2: 'two',
    3: 'three',
    4: 'four',
    5: 'five',
    6: 'six',
    7: 'seven',
    8: 'eight'
}


global test_printer



def test_printer(start_chunk, end_chunk):

    fin_dict = OrderedDict(sorted(pair_dict.items()))
    sub_dict = dict(fin_dict.items()[start_chunk:end_chunk])

    for key, value in sub_dict.iteritems():

        print key, value

    print '-' * 50



def set_chunk_start_end_points():


    # Takes the dictionary and chunks for parallel execution.

    for i in range(2, 9, 2):

        start_chunk = i - 2
        end_chunk = i

        test_printer(start_chunk, end_chunk)

        #first = Process(target=test_printer, args=(start_chunk, end_chunk)).start()

set_chunk_start_end_points()

...I have seen examples of multiprocessing usage, but none seem to fit what I am trying to do. The sample code creates four subset dictionaries and processes them serially; I am looking to have them processed in parallel.

If you comment out the line test_printer(start_chunk, end_chunk) and uncomment the line below it, I expect to see the same output, just with Python using multiple processes to produce it. However, now nothing happens.

What am I doing wrong?

Thanks
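For reference, a minimal sketch of the Process-based pattern the question is attempting, assuming each worker is started and then joined so the parent waits for them all (the helper name run_chunks_in_parallel is only for illustration; it reuses test_printer from the code above):

from multiprocessing import Process

def run_chunks_in_parallel():
    processes = []
    for i in range(2, 9, 2):
        # .start() returns None, so keep the Process object itself
        # if you want to wait on it later.
        p = Process(target=test_printer, args=(i - 2, i))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # block until every chunk has been printed

if __name__ == '__main__':
    run_chunks_in_parallel()

On platforms where multiprocessing spawns rather than forks (e.g. Windows), the if __name__ == '__main__': guard is required, which can be one reason a bare Process(...).start() call appears to do nothing.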

I always find pool.map to be the easiest way to perform the same function in parallel. Maybe you'll find it helpful.

from itertools import islice
from multiprocessing import Pool as ProcessPool # easier to work with for this sort of thing
from collections import OrderedDict

# You were using globals wrong. But that's a separate topic.

pair_dict = {
    1: 'one',
    2: 'two',
    3: 'three',
    4: 'four',
    5: 'five',
    6: 'six',
    7: 'seven',
    8: 'eight'
}

# This only needs to be executed once. Not every time the function is called.
fin_dict = OrderedDict(sorted(pair_dict.items()))


def test_printer(chunk): # Going to make this take 1 argument. Just easier.
    start_chunk = chunk[0]
    end_chunk = chunk[1]  # All things considered this should be called chunk_end, not end_chunk

    # list for python3 compatibility
    sub_dict = dict(list(fin_dict.items())[start_chunk:end_chunk])


    # .items() for python3 compatibility
    for key, value in sub_dict.items():
        print(key, value)  # Looks like you're still using Python 2.7? Upgrade, friend; 2.7 gets little support these days.

    print('-' * 50)



def set_chunk_start_end_points():
    # Takes the dictionary and chunks for parallel execution.
    # comment: Does it? This function takes no arguments from what I can see.
    # Think through your comments carefully.

    # Let's calculate the chunks upfront:
    chunks = [(i - 2, i) for i in range(2, 9, 2)]

    with ProcessPool(4) as pool: # however many processes you want
        pool.map(test_printer, chunks)

if __name__ == '__main__':  # guard needed where multiprocessing spawns rather than forks (e.g. Windows)
    set_chunk_start_end_points()

Note that unless you need specific chunks, pool.map will chunk the iterable for you. In this case, it is actually chunking our list of chunks!
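For instance, a minimal sketch (assuming Python 3; the print_pair helper is only for illustration) that drops the manual chunk calculation entirely and lets pool.map split the items itself via its chunksize argument:

from multiprocessing import Pool

pair_dict = {
    1: 'one', 2: 'two', 3: 'three', 4: 'four',
    5: 'five', 6: 'six', 7: 'seven', 8: 'eight'
}

def print_pair(pair):
    # Each worker call receives a single (key, value) tuple.
    key, value = pair
    print(key, value)

if __name__ == '__main__':
    with Pool(4) as pool:
        # chunksize=2 roughly mirrors the manual (i - 2, i) windows above;
        # if omitted, pool.map picks a reasonable chunk size on its own.
        pool.map(print_pair, sorted(pair_dict.items()), chunksize=2)

As in the answer above, lines printed from different worker processes can interleave, since each process writes to stdout independently.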
