
python multiprocessing pool assign object to worker

I have some objects that need to be processed. I wonder if there is a way to assign work (a process) to an object based on a unique key.
The first time the code sees an object, it should be randomly assigned a worker, but if the object appears again it should be assigned to the worker which processed it before. Thank you

for example:
workers A, B, C | first bunch of objects: 1,2,3,4 | second bunch of objects: 1,3
first bunch of objects:
worker A <--- 1,3
worker B <--- 2
worker C <--- 4
second bunch of objects:
worker A <--- 1,3
worker B <---
worker C <---

A very simple way to implement "sticky sessions" is to make your own version of multiprocessing.Pool which doesn't eagerly assign work items, but assigns them deterministically. Here's an incomplete but runnable solution:

import multiprocessing
import os
import time

def work(job):
    time.sleep(1)
    print("I am process", os.getpid(), "processing job", job)

class StickyPool:
    def __init__(self, processes):
        # One input queue per worker, so each worker has its own job stream.
        self._inqueues = [multiprocessing.Queue() for ii in range(processes)]
        self._pool = [multiprocessing.Process(target=self._run, args=(self._inqueues[ii],))
                      for ii in range(processes)]
        for process in self._pool:
            process.start()

    def map(self, fn, args):
        # Route each argument to a worker chosen by its hash, so the same
        # argument always lands on the same worker.
        for arg in args:
            ii = hash(arg) % len(self._inqueues)
            self._inqueues[ii].put((fn, arg))

    def _run(self, queue):
        # Worker loop: pull (function, argument) pairs and execute them.
        while True:
            fn, arg = queue.get()
            fn(arg)

if __name__ == "__main__":
    pool = StickyPool(3)
    #pool = multiprocessing.Pool(3)

    pool.map(work, [1,2,3,4,1,2,3,4,1,2,3,4])
    time.sleep(4)  # crude wait so the workers have time to finish the demo jobs

When using the above StickyPool, jobs are assigned based on the hash of their arguments. This means the same arguments go to the same process every time. It's not smart enough to evenly distribute jobs if there are many unique values whose hashes collide, but oh well--room for future improvement. I also didn't bother with shutdown logic, so the program doesn't stop running if you use StickyPool, but it does if you use multiprocessing.Pool. Fixing those issues and implementing more of the Pool interface (like apply(), and returning results from map()) is left as an exercise.
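For the shutdown part, one common approach (not from the original answer; the subclass name and the close() method are invented here for illustration) is to push a sentinel value into each worker's queue and have the worker loop exit when it sees it. A minimal sketch, assuming the StickyPool class above and the fork start method its demo relies on:

class StickyPoolWithShutdown(StickyPool):
    # Hypothetical extension of the answer's StickyPool with shutdown logic.

    def close(self):
        # Send one sentinel per worker; queues are FIFO, so every job
        # submitted before close() is still processed first.
        for queue in self._inqueues:
            queue.put((None, None))
        for process in self._pool:
            process.join()

    def _run(self, queue):
        while True:
            fn, arg = queue.get()
            if fn is None:  # sentinel received: this worker is done
                break
            fn(arg)

With this variant, the demo could end with pool.close() instead of the time.sleep(4) workaround, and the program would exit on its own once every queued job has finished.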
