
Executing the same function in parallel inside a for loop

Using Python 2.7, I have created a sample dictionary and a couple of functions to subset that dictionary and then iterate through the subsets...

from itertools import islice
from multiprocessing import Process
from collections import OrderedDict

global pair_dict

pair_dict = {
    1: 'one',
    2: 'two',
    3: 'three',
    4: 'four',
    5: 'five',
    6: 'six',
    7: 'seven',
    8: 'eight'
}


global test_printer



def test_printer(start_chunk, end_chunk):

    fin_dict = OrderedDict(sorted(pair_dict.items()))
    sub_dict = dict(fin_dict.items()[start_chunk:end_chunk])

    for key, value in sub_dict.iteritems():

        print key, value

    print '-' * 50



def set_chunk_start_end_points():


    # Takes the dictionary and chunks for parallel execution.

    for i in range(2, 9, 2):

        start_chunk = i - 2
        end_chunk = i

        test_printer(start_chunk, end_chunk)

        #first = Process(target=test_printer, args=(start_chunk, end_chunk)).start()

set_chunk_start_end_points()

...I have seen examples of multiprocessing usage, but none seem to fit what I am trying to do. The sample code creates four subset dictionaries and processes them serially; I am looking to have them processed in parallel.

If you comment out the line test_printer(start_chunk, end_chunk) and uncomment the line below it, I expect to see the same output, just with Python using multiple processes to produce it. However, now nothing happens.

What am I doing wrong?

Thanks
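For reference, a minimal sketch of the Process-based pattern the question is attempting, assuming each worker is started and then joined so the parent waits for them all (the helper name run_chunks_in_parallel is only for illustration; it reuses test_printer from the code above):

from multiprocessing import Process

def run_chunks_in_parallel():
    processes = []
    for i in range(2, 9, 2):
        # .start() returns None, so keep the Process object itself
        # if you want to wait on it later.
        p = Process(target=test_printer, args=(i - 2, i))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # block until every chunk has been printed

if __name__ == '__main__':
    run_chunks_in_parallel()

On platforms where multiprocessing spawns rather than forks (e.g. Windows), the if __name__ == '__main__': guard is required, which can be one reason a bare Process(...).start() call appears to do nothing.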

I always find pool.map to be the easiest way to perform the same function in parallel. Maybe you'll find it helpful.

from itertools import islice
from multiprocessing import Pool as ProcessPool # easier to work with for this sort of thing
from collections import OrderedDict

# You were using globals wrong. But that's a separate topic.

pair_dict = {
    1: 'one',
    2: 'two',
    3: 'three',
    4: 'four',
    5: 'five',
    6: 'six',
    7: 'seven',
    8: 'eight'
}

# This only needs to be executed once. Not every time the function is called.
fin_dict = OrderedDict(sorted(pair_dict.items()))


def test_printer(chunk): # Going to make this take 1 argument. Just easier.
    start_chunk = chunk[0]
    end_chunk = chunk[1]  # All things considered this should be called chunk_end, not end_chunk

    # list for python3 compatibility
    sub_dict = dict(list(fin_dict.items())[start_chunk:end_chunk])


    # .items() for python3 compatibility
    for key, value in sub_dict.items():
        print(key, value)  # Looks like you're still using Python 2.7? Upgrade, friend; 2.7 gets little support these days.

    print('-' * 50)



def set_chunk_start_end_points():
    # Takes the dictionary and chunks for parallel execution.
    # comment: Does it? This function takes no arguments from what I can see.
    # Think through your comments carefully.

    # Let's calculate the chunks upfront:
    chunks = [(i - 2, i) for i in range(2, 9, 2)]

    with ProcessPool(4) as pool: # however many processes you want
        pool.map(test_printer, chunks)

if __name__ == '__main__':  # guard needed where multiprocessing spawns rather than forks (e.g. Windows)
    set_chunk_start_end_points()

Note that unless you need specific chunks, pool.map will chunk the iterable for you. In this case, it is actually chunking our list of chunks!
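For instance, a minimal sketch (assuming Python 3; the print_pair helper is only for illustration) that drops the manual chunk calculation entirely and lets pool.map split the items itself via its chunksize argument:

from multiprocessing import Pool

pair_dict = {
    1: 'one', 2: 'two', 3: 'three', 4: 'four',
    5: 'five', 6: 'six', 7: 'seven', 8: 'eight'
}

def print_pair(pair):
    # Each worker call receives a single (key, value) tuple.
    key, value = pair
    print(key, value)

if __name__ == '__main__':
    with Pool(4) as pool:
        # chunksize=2 roughly mirrors the manual (i - 2, i) windows above;
        # if omitted, pool.map picks a reasonable chunk size on its own.
        pool.map(print_pair, sorted(pair_dict.items()), chunksize=2)

As in the answer above, lines printed from different worker processes can interleave, since each process writes to stdout independently.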
