Python multiprocessing: object identifier unique across processes

Question

Suppose you want to run several processes in parallel (using multiprocessing, possibly on multiple separate machines, as in a cluster), where each process creates a list of new instances of a particular class. Then you send all these lists back to the parent process and want to combine them. Can we index these instances by their object id? Can I expect the ids to uniquely identify the objects, given that each object was generated in a separate process (possibly on a separate machine)?

In other words, does the id of an object survive the pickling required to send data between processes, or does the interpreter assign a fresh, unique id to each object when unpickling it?

Answer 1

You asked whether the id of the object survives pickling. The answer is no. The object is pickled and sent to another process, and a new object is created in that process with a new id. Results are sent back to the original process the same way. The id does not survive; they are different objects. The id usually doesn't survive pickling even within the same process: try obj2 = pickle.loads(pickle.dumps(obj)) and check whether obj2 is obj. It usually is not.

>>> import dill
>>>     
>>> class A(object):
...   pass
... 
>>> b = A()
>>> 
>>> id(b)
4473714832
>>> id(dill.loads(dill.dumps(b)))  
4486366032
>>>
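The same check can be sketched with the standard pickle module instead of dill. This is a minimal illustration, not part of the original answer: the class A here takes a hypothetical value argument so that state can be compared, and the script is assumed to run as a top-level module so pickle can look the class up by name.

```python
import pickle


class A:
    """Hypothetical stand-in for the class from the question."""

    def __init__(self, value):
        self.value = value


original = A(42)

# Round-trip through pickle, as multiprocessing does when it ships
# objects between processes.
copy = pickle.loads(pickle.dumps(original))

same_object = copy is original              # identity is not preserved
same_state = copy.value == original.value   # state is preserved

# Exceptions within a single process: singletons such as None, and
# objects pickled by reference such as classes, come back as the very
# same object.
none_roundtrip = pickle.loads(pickle.dumps(None))
class_roundtrip = pickle.loads(pickle.dumps(A))

print(same_object, same_state)
print(none_roundtrip is None, class_roundtrip is A)
```

The instance comes back as an equal-looking but distinct object, which is why its id() differs; the class and None are the kind of case where identity happens to survive in the same process.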

However, if you want to maintain an "id" so that you can tell which object is which, you can. Just add an id attribute that stores some identifying information (a simple number such as the process "rank" (order), a randomly generated hash, or something else; you pick). If you create this attribute ahead of time and store the "id" there, it is maintained across a pickle. Be careful with attributes added dynamically, though: for objects that pickle serializes by reference, such as classes and functions, pickle will "forget" that the attribute was added, and the object deserialized in another process will not have it. (An attribute set on an ordinary class instance lives in the instance's __dict__ and is pickled with it.) Alternatively, if you use an "advanced" serializer like dill, you can pickle a dynamically added attribute on a class instance or almost any object.
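One way to sketch the stored-identifier idea is shown below. Everything named here (Item, uid, make_items, ship) is hypothetical and chosen for illustration; uuid4 is just one possible choice of "randomly generated" identifier, and the pickle round trip stands in for the transfer that multiprocessing would perform between real worker processes.

```python
import pickle
import uuid


class Item:
    """Hypothetical class whose instances carry their own identifier."""

    def __init__(self, payload):
        # uid is ordinary instance state, so it is pickled with the object.
        # uuid4 is random: collisions across processes or machines are
        # extremely unlikely, though not strictly impossible.
        self.uid = uuid.uuid4().hex
        self.payload = payload


def make_items(count):
    """Work a child process might do: build a list of new instances."""
    return [Item(n) for n in range(count)]


def ship(obj):
    """Simulate the pickle round trip between child and parent."""
    return pickle.loads(pickle.dumps(obj))


# Pretend two workers each produced a list and sent it to the parent.
produced = [make_items(3), make_items(3)]
received = [ship(batch) for batch in produced]

# Merge the lists and index by the stored uid rather than by id().
index = {item.uid: item for batch in received for item in batch}

sent_uids = {item.uid for batch in produced for item in batch}
print(len(index), set(index) == sent_uids)
```

Indexing by uid works because the attribute travels with the object's state, whereas id() only describes where an object lives in one particular interpreter process.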


Answer 1
1 accepted 2014-10-10 22:20:10
