Python: multiprocessing and array of c_char_p

Question

I'm launching 3 processes and I want them to put a string into a shared array, at the index corresponding to the process (i).

With the code below, the generated output is:

['test 0', None, None]
['test 1', 'test 1', None]
['test 2', 'test 2', 'test 2']

Why does 'test 0' get overwritten by 'test 1', and 'test 1' by 'test 2'?

What I want is (order is not important):

['test 0', None, None]
['test 0', 'test 1', None]
['test 0', 'test 1', 'test 2']

The code:

#!/usr/bin/env python

import multiprocessing
from multiprocessing import Value, Lock, Process, Array
import ctypes
from ctypes import c_int, c_char_p

class Consumer(multiprocessing.Process):
    def __init__(self, task_queue, result_queue, arr, lock):
            multiprocessing.Process.__init__(self)
            self.task_queue = task_queue
            self.result_queue = result_queue
            self.arr = arr
            self.lock = lock

    def run(self):
            proc_name = self.name
            while True:
                next_task = self.task_queue.get()
                if next_task is None:
                    self.task_queue.task_done()
                    break            
                answer = next_task(arr=self.arr, lock=self.lock)
                self.task_queue.task_done()
                self.result_queue.put(answer)
            return

class Task(object):
    def __init__(self, i):
        self.i = i

    def __call__(self, arr=None, lock=None):
        with lock:
            arr[self.i] = "test %d" % self.i
            print arr[:]

    def __str__(self):
        return 'ARC'

    def run(self):
        print 'IN'

if __name__ == '__main__':
   tasks = multiprocessing.JoinableQueue()
   results = multiprocessing.Queue()

   arr = Array(ctypes.c_char_p, 3)

   lock = multiprocessing.Lock()

   num_consumers = multiprocessing.cpu_count() * 2
   consumers = [Consumer(tasks, results, arr, lock) for i in xrange(num_consumers)]

   for w in consumers:
      w.start()

   for i in xrange(3):
      tasks.put(Task(i))

   for i in xrange(num_consumers):
      tasks.put(None)

I'm running Python 2.7.3 (Ubuntu).

Answer 1

This problem seems similar to this one. There, J.F. Sebastian speculated that assigning to arr[i] makes arr[i] point to a memory address that is meaningful only to the subprocess making the assignment. The other subprocesses retrieve garbage when they look at that address.
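
To make the point about addresses concrete, here is a minimal single-process sketch using a plain ctypes array rather than the shared Array from the question. It assumes Python 3 (hence the bytes literals), and the variable names are illustrative only. It suggests that each c_char_p slot holds nothing but a pointer-sized address, whatever the length of the string assigned to it:

```python
import ctypes

# Three pointer slots: each one holds an address, not the characters themselves.
PtrArray = ctypes.c_char_p * 3
ptrs = PtrArray()

ptrs[0] = b"test 0"
ptrs[1] = b"a much longer string than the first one"

pointer_size = ctypes.sizeof(ctypes.c_char_p)
array_size = ctypes.sizeof(ptrs)  # stays 3 * pointer_size whatever is assigned

# Reinterpret the slots as raw addresses to see what is actually stored.
addresses = ctypes.cast(ptrs, ctypes.POINTER(ctypes.c_void_p))
first_address = addresses[0]   # an integer address inside this process
unset_address = addresses[2]   # None, because slot 2 is still a NULL pointer

print(pointer_size, array_size, first_address, unset_address)
```

The printed address only has meaning inside the process that produced it, which is consistent with the explanation above.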

There are at least two ways to avoid this problem. One is to use a multiprocessing.Manager list:

import multiprocessing as mp

class Consumer(mp.Process):
    def __init__(self, task_queue, result_queue, lock, lst):
            mp.Process.__init__(self)
            self.task_queue = task_queue
            self.result_queue = result_queue
            self.lock = lock
            self.lst = lst

    def run(self):
            proc_name = self.name
            while True:
                next_task = self.task_queue.get()
                if next_task is None:
                    self.task_queue.task_done()
                    break            
                answer = next_task(lock = self.lock, lst = self.lst)
                self.task_queue.task_done()
                self.result_queue.put(answer)
            return

class Task(object):
    def __init__(self, i):
        self.i = i

    def __call__(self, lock, lst):
        with lock:
            lst[self.i] = "test {}".format(self.i)
            print([lst[i] for i in range(3)])

if __name__ == '__main__':
   tasks = mp.JoinableQueue()
   results = mp.Queue()
   manager = mp.Manager()
   lst = manager.list(['']*3)

   lock = mp.Lock()
   num_consumers = mp.cpu_count() * 2
   consumers = [Consumer(tasks, results, lock, lst) for i in xrange(num_consumers)]

   for w in consumers:
      w.start()

   for i in xrange(3):
      tasks.put(Task(i))

   for i in xrange(num_consumers):
      tasks.put(None)

   tasks.join()

Another way is to use a shared array with a fixed size, such as mp.Array('c', 10).

import multiprocessing as mp

class Consumer(mp.Process):
    def __init__(self, task_queue, result_queue, arr, lock):
            mp.Process.__init__(self)
            self.task_queue = task_queue
            self.result_queue = result_queue
            self.arr = arr
            self.lock = lock

    def run(self):
            proc_name = self.name
            while True:
                next_task = self.task_queue.get()
                if next_task is None:
                    self.task_queue.task_done()
                    break            
                answer = next_task(arr = self.arr, lock = self.lock)
                self.task_queue.task_done()
                self.result_queue.put(answer)
            return

class Task(object):
    def __init__(self, i):
        self.i = i

    def __call__(self, arr, lock):
        with lock:
            arr[self.i].value = "test {}".format(self.i)
            print([a.value for a in arr])

if __name__ == '__main__':
   tasks = mp.JoinableQueue()
   results = mp.Queue()
   arr = [mp.Array('c', 10) for i in range(3)]

   lock = mp.Lock()
   num_consumers = mp.cpu_count() * 2
   consumers = [Consumer(tasks, results, arr, lock) for i in xrange(num_consumers)]

   for w in consumers:
      w.start()

   for i in xrange(3):
      tasks.put(Task(i))

   for i in xrange(num_consumers):
      tasks.put(None)

   tasks.join()
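
The listing above targets Python 2. As a hedged sketch of the same fixed-size idea on Python 3, where a 'c' array holds bytes rather than str, something like the following should work. The names write_slot, run_demo and SLOT_SIZE are made up for this illustration, and the code assumes a Unix-like system where the "fork" start method is available:

```python
import multiprocessing as mp

SLOT_SIZE = 10  # bytes per slot, mirroring mp.Array('c', 10)


def write_slot(slots, lock, i):
    """Store b'test <i>' in slot i; runs in a child process."""
    with lock:
        # On Python 3 a 'c' array takes bytes, so encode the text first.
        slots[i].value = "test {}".format(i).encode("ascii")


def run_demo(n=3):
    # Assumption: the "fork" start method exists (Linux and most Unix systems).
    ctx = mp.get_context("fork")
    lock = ctx.Lock()
    slots = [ctx.Array("c", SLOT_SIZE) for _ in range(n)]
    workers = [ctx.Process(target=write_slot, args=(slots, lock, i))
               for i in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # The parent reads back what each child wrote into shared memory.
    return [s.value for s in slots]


if __name__ == "__main__":
    print(run_demo())
```

Each child writes into its own slot and the parent reads all of them back, so no slot depends on an address that belongs to another process.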

I speculate that this works where mp.Array(ctypes.c_char_p, 3) does not because mp.Array('c', 10) is a fixed-size buffer that holds the characters themselves in shared memory, so its address never changes. mp.Array(ctypes.c_char_p, 3), by contrast, holds only pointers. The string each pointer refers to has a variable size and lives in the memory of the process that made the assignment, so the address stored in arr[i] can change with every assignment and means nothing to the other processes.
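
A small single-process sketch may help illustrate the fixed-size behaviour. It assumes Python 3 (bytes values), and the variable names are illustrative. The assigned bytes are copied into the 10-byte buffer itself, and a value that does not fit is rejected instead of being stored elsewhere:

```python
import multiprocessing as mp

buf = mp.Array("c", 10)   # 10 bytes of shared memory, initially zeroed
buf.value = b"test 0"     # the bytes are copied into the shared buffer

stored = buf.value        # bytes up to the first NUL byte
raw = buf.raw             # all 10 bytes, zero padding included

try:
    buf.value = b"x" * 11  # one byte more than the buffer can hold
    overflow_rejected = False
except ValueError:
    overflow_rejected = True

print(stored, len(raw), overflow_rejected)
```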

Perhaps this is what the docs are warning about when they state:

Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash.

Solution 1
5 Accepted 2013-01-08 19:46:14