简体   繁体   中英

Tuple for multiprocessing.Array in python

I'm struggling with multiprocessing in python. I want to put list of tuple in multiprocessing.Array, but I can't find the typecode for tuple.

This is the code, and I want to know how to write "type_of_tuple" for arr in main function.

from multiprocessing import Pool, Array

def thread_func(time, array):
    time.sleep(time)
    if len(array) > 0:
        print(array.pop(0))

def main(cpu_number):
    list = [("a","b"), ("c","d"), ("e","f")]
    arr = Array( type_of_tuple """ how to write this?""", list)

    for i in range(cpu_number):
        r = pool.apply_async(thread_func, args=(1000, arr))
        thread_list.append(r)

    for thread in thread_list:
        thread.wait()

if __name__ == "__main__":
    main(3)

The reason you can't find it is because it doesn't exist. The whole point of Array is that it handles arrays of simple, homogenous types that can be stored as "unboxed" binary data.

A tuple is a compound type, which can hold any number of values of any kind. So you can't put it in an Array .

In fact, you can't put strings in arrays either, because strings have a variable number of characters; each one is a different size. (And, if this is Python 3, it's even worse than that, because characters can be 1, 2, or 4 bytes…)

On top of that, an array has a fixed length; you can't pop values off it anyway.

So, you will need to find a different way to share this data.

You could use shared_ctypes if you understand C well enough to map your tuple of strings to a struct of char* .

Or you could write a function to encode the tuples into fixed-size values (which you then slice into an Array of characters) on one side and decode them on the other.

But I suspect you'll find life a lot simpler if you do what the docs recommend and find a way to write your code in terms of message passing instead of shared memory.

Since the only shared mutation you need to here is to have each job pop a value off the end so that other jobs won't see the same value, the obvious answer is to use a Queue , because that's exactly what it does.

Or, even simpler, just use one of the higher-level methods like map instead of apply , to take care of managing the queue and making sure each job gets exactly one value, so you don't even have to think about it. For example:

def thread_func(time, value):
    time.sleep(time)
    print(value)

def main(cpu_number):
    values = [("a","b"), ("c","d"), ("e","f")]
    results = pool.imap_unordered(partial(thread_func, 1000), values[:cpu_number])
    for result in results:
        pass

if __name__ == "__main__":
    main(3)

(As a side note, I'm not sure why you're restricting the number of tasks to the number of CPUs. Normally, you create a Pool(cpu_number) and just queue up all of the tasks on that. If you only want to run exactly 3 tasks, you don't even really need a pool for that, just run each one on a Process .)

To answer your question, without caring about the logic, you should use a ctypes struct instead:

from ctypes import Structure, POINTER, byref, c_ubyte, cast, create_string_buffer
from multiprocessing import Pool, Array


class SomeTuple(Structure):
    _fields_ = [
        ('a', POINTER(c_ubyte)),
        ('b', POINTER(c_ubyte))
    ]


def thread_func(time, array_):
    time.sleep(time)
    if len(array_) > 0:
        print(array_.pop(0))

def main(cpu_number):
    thread_list = []
    value_list = [
        SomeTuple(cast(create_string_buffer(b"a"), POINTER(c_ubyte)), cast(create_string_buffer(b"b"), POINTER(c_ubyte))),
        SomeTuple(cast(create_string_buffer(b"c"), POINTER(c_ubyte)), cast(create_string_buffer(b"d"), POINTER(c_ubyte))),
        SomeTuple(cast(create_string_buffer(b"e"), POINTER(c_ubyte)), cast(create_string_buffer(b"f"), POINTER(c_ubyte)))
    ]
    arr = Array(SomeTuple, value_list)

    with Pool(processes=cpu_number) as pool:
        for _ in range(cpu_number):
            r = pool.apply_async(thread_func, args=(1000, arr))
            thread_list.append(r)

    for thread in thread_list:
        thread.wait()

if __name__ == "__main__":
    main(3)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM