Can't get multiprocessing to run processes concurrently

Question

The code below doesn't seem to run concurrently, and I'm not sure why:

import multiprocessing

def run_normalizers(config, debug, num_threads, name=None):

    def _run():
        print('Started process for normalizer')
        sqla_engine = init_sqla_from_config(config)
        image_vfs = create_s3vfs_from_config(config, config.AWS_S3_IMAGE_BUCKET)
        storage_vfs = create_s3vfs_from_config(config, config.AWS_S3_STORAGE_BUCKET)

        pp = PipedPiper(config, image_vfs, storage_vfs, debug=debug)

        if name:
            pp.run_pipeline_normalizers(name)
        else:
            pp.run_all_normalizers()
        print('Normalizer process complete')

    threads = []
    for i in range(num_threads):
        threads.append(multiprocessing.Process(target=_run))
    [t.start() for t in threads]
    [t.join() for t in threads]


run_normalizers(...)

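The config variable is just a dictionary defined outside of the _run() function. All of the processes seem to be created, but it doesn't finish any faster than it does with a single process. Basically, what happens in the run_**_normalizers() functions is: read from a queue table in a database (SQLAlchemy), make a few HTTP requests, run a "pipeline" of normalizers to modify the data, and then save it back to the database. I'm coming from JVM land, where threads are "heavy" and often used for parallelism. I'm a bit confused, because I thought the multiprocessing module was supposed to get around the limitations of Python's GIL.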

Answer 1

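Fixed my multiprocessing problem, and actually switched to threads. I'm not sure what actually fixed it, though. I just re-architected everything, set up workers and tasks, and things are flying now. Here are the basics of what I did: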

import abc
from Queue import Empty, Queue  # Python 2; the module is named "queue" in Python 3
from threading import Thread

class AbstractTask(object):
    """
        The base task
    """
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def run_task(self):
        pass

class TaskRunner(object):

    def __init__(self, queue_size, num_threads=1, stop_on_exception=False):
        super(TaskRunner, self).__init__()
        self.queue              = Queue(queue_size)
        self.execute_tasks      = True
        self.stop_on_exception  = stop_on_exception

        # create a worker
        def _worker():
            while self.execute_tasks:

                # get a task, blocking for up to 1 second; with
                # block=False the timeout is ignored and the loop spins
                task = None
                try:
                    task = self.queue.get(True, 1)
                except Empty:
                    continue

                # execute the task
                failed = True
                try:
                    task.run_task()
                    failed = False
                finally:
                    if failed and self.stop_on_exception:
                        print('Stopping due to exception')
                        self.execute_tasks = False
                    self.queue.task_done()

        # start threads
        for i in range(0, int(num_threads)):
            t = Thread(target=_worker)
            t.daemon = True
            t.start()


    def add_task(self, task, block=True, timeout=None):
        """
            Adds a task
        """
        if not self.execute_tasks:
            raise Exception('TaskRunner is not accepting tasks')
        self.queue.put(task, block, timeout)


    def wait_for_tasks(self):
        """
            Waits for tasks to complete
        """
        if not self.execute_tasks:
            raise Exception('TaskRunner is not accepting tasks')
        self.queue.join()

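All I do is create a TaskRunner, add tasks to it (thousands of them), and then call wait_for_tasks(). So, obviously, in the re-architecture I did I "fixed" some other problem that I had. Odd, though.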
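For comparison, roughly the same add-tasks-then-wait pattern can be sketched with the standard library's thread pool instead of a hand-rolled runner. This is only an illustration: run_task and run_all are made-up names, and the doubling stands in for whatever a real task would do.

```python
from multiprocessing.pool import ThreadPool


def run_task(task_id):
    # Hypothetical stand-in for one task's work, e.g. an HTTP request
    # followed by a database write.
    return task_id * 2


def run_all(task_ids, num_threads=4):
    # ThreadPool starts num_threads worker threads that pull work off an
    # internal queue, similar in spirit to the _worker loop above.
    pool = ThreadPool(num_threads)
    try:
        # map() blocks until every task has finished, much like
        # wait_for_tasks(), and returns results in input order.
        return pool.map(run_task, task_ids)
    finally:
        pool.close()
        pool.join()
```

Threads tend to help here because the work described in the question (HTTP requests, database reads and writes) spends most of its time waiting on I/O, during which the GIL is released.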

Answer 2

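If you are still looking for a multiprocessing solution, you might first want to look at how to use a pool of workers; then you wouldn't have to manage the num_threads processes yourself: http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers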
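A minimal sketch of the pool approach, assuming the work can be split into independent items. The names normalize and run_pool are hypothetical, and squaring a number stands in for the real normalizer work.

```python
import multiprocessing


def normalize(item):
    # Hypothetical stand-in for one unit of normalizer work. It lives at
    # module level so that worker processes can import it.
    return item * item


def run_pool(items, num_processes=4):
    pool = multiprocessing.Pool(processes=num_processes)
    try:
        # map() splits items across the workers and returns the results
        # in input order.
        return pool.map(normalize, items)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    print(run_pool(range(10)))
```

The pool handles starting, feeding, and joining the worker processes, so the caller only chooses how many to run.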

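As for the slowdown, have you tried passing the config object as an argument to the _run function? I don't know whether or how this would change anything internally, but my guess is that it might make a difference.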
