python 3 asyncio and MotorClient: how to use motor with multithreading and multiple event loops

I am back with a question about asyncio. I find it very useful (especially given how the GIL limits threads) and I am trying to boost the performance of some pieces of code.

My application is doing the following:

  • One background daemon thread "A" receives events from connected clients and reacts by populating a SetQueue (simply an event queue that removes duplicate ids) and by doing some insertions in a DB. I get this daemon from another module (basically, I control a callback that fires when an event is received). In my sample code below I substituted it with a thread I spawn myself, which simply populates the queue with 20 items and mimics DB inserts before exiting.
  • One background daemon thread "B" is launched (loop_start); it just loops, running to completion a coroutine that:

    • Fetches all the items in the queue (if it is not empty; otherwise it releases control for x seconds and the coroutine is then re-launched)
    • For each id in the queue it launches a chained coroutine that:

      • Creates and awaits a task that just fetches all the relevant information for that id from the DB. I am using MotorClient, which supports asyncio, so I can await inside the task itself.

      • Uses a process pool executor to launch one process per id, which uses the DB data to do some CPU-intensive processing.

  • The main thread just initializes the db_client and accepts loop_start and stop commands.

That is basically it.

Now I am trying to boost performance as much as possible.

My current issue is in using motor.motor_asyncio.AsyncIOMotorClient() in this way:

  1. It gets initialized in the main thread, where I want to create indexes (see the sketch after this list)
  2. Thread "A" needs to perform DB insertions
  3. Thread "B" needs to perform DB finds/reads
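
To make point 1 concrete, here is roughly what I have in mind for index creation on the main thread's loop. This is only a sketch: the test_db/events/event_id names are placeholders, not my real schema.

import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient()
loop = asyncio.get_event_loop()
# Motor's create_index is asynchronous, so it has to be run on the loop
loop.run_until_complete(client.test_db.events.create_index('event_id'))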

How can I do this? Motor states that it is meant for a single-threaded application where you obviously use a single event loop. Here I found myself forced to have two event loops, one in thread "A" and one in thread "B". This is not optimal, but I didn't manage to get the same behavior with a single event loop and call_soon_threadsafe... and I think that, performance-wise, I am still gaining a lot with two event loops that release control over the GIL-bound CPU core.

Should I use three different AsyncIOMotorClient instances (one per thread) and use them as stated above? I failed with different errors while trying.
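
To show what I mean by "one client per thread", here is a minimal sketch of the pattern, under the assumption that a client created after set_event_loop picks up the current thread's loop (db/collection names are again placeholders):

import asyncio
import threading
from motor.motor_asyncio import AsyncIOMotorClient

def db_thread_main():
    # Private loop for this thread; the client is created *after*
    # set_event_loop so it should bind to this loop, not the main one.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    client = AsyncIOMotorClient()
    loop.run_until_complete(do_inserts(client))
    loop.close()

async def do_inserts(client):
    await client.test_db.events.insert_one({'_id': 1})

threading.Thread(target=db_thread_main, daemon=True).start()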

Here is my sample code; the only thing it leaves out is the MotorClient initialization in Asynchro's __init__:

import threading
import asyncio
import concurrent.futures
import functools
import os
import time
import logging
from random import randint
from queue import Queue





# create logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('{}.log'.format(__name__))
fh.setLevel(logging.DEBUG)
# create console handler with a higher log level
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(name)s - %(processName)s - %(threadName)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)


class SetQueue(Queue):
    """Queue that avoids duplicate entries while keeping an order."""
    def _init(self, maxsize):
        self.maxsize = maxsize
        self.queue = set()

    def _put(self, item):
        if type(item) is not int:
            raise TypeError
        self.queue.add(item)

    def _get(self):
        # Always return all items at once, in a thread-safe manner
        ret = self.queue.copy()
        self.queue.clear()
        return ret


class Asynchro:
    def __init__(self, event_queue):
        self.__daemon = None
        self.__daemon_terminate = False
        self.__queue = event_queue

    def fake_populate(self, size):
        t = threading.Thread(target=self.worker, args=(size,))
        t.daemon = True
        t.start()

    def worker(self, size):
        run = True
        populate_event_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(populate_event_loop)
        cors = [self.worker_cor(i, populate_event_loop) for i in range(size)]
        done, pending = populate_event_loop.run_until_complete(asyncio.wait(cors))
        logger.debug('Finished populating the event queue with result done={}, pending={}.'.format(done, pending))
        while run:
            # Keep it alive to simulate something still alive (minor traffic)
            time.sleep(5)
            rand = randint(100, 200)
            populate_event_loop.run_until_complete(self.worker_cor(rand, populate_event_loop))
            if self.__daemon_terminate:
                logger.debug('Closed the populate_event_loop.')
                populate_event_loop.close()
                run = False

    async def worker_cor(self, i, loop):
        time.sleep(0.5)
        self.__queue.put(i)
        logger.debug('Wrote {} to the event queue, which now has size {}.'.format(i, self.__queue.qsize()))
        # Launch fake DB Insertions
        #db_task = loop.create_task(self.fake_db_insert(i))
        db_data = await self.fake_db_insert(i)
        logger.info('Finished populating with id {}'.format(i))
        return db_data

    @staticmethod
    async def fake_db_insert(item):
        # Fake some DB insert
        logger.debug('Starting fake db insertion with id {}'.format(item))
        st = randint(1, 101) / 100
        await asyncio.sleep(st)
        logger.debug('Finished db insertion with id {}, sleep {}'.format(item, st))
        return item

    def loop_start(self):
        logger.info('Starting the loop.')
        if self.__daemon is not None:
            raise Exception
        self.__daemon_terminate = False
        self.__daemon = threading.Thread(target=self.__daemon_main)
        self.__daemon.daemon = True
        self.__daemon.start()

    def loop_stop(self):
        logger.info('Stopping the loop.')
        if self.__daemon is None:
            raise Exception
        self.__daemon_terminate = True
        if threading.current_thread() != self.__daemon:
            self.__daemon.join()
            self.__daemon = None
            logger.debug('Stopped the loop and closed the event_loop.')

    def __daemon_main(self):
        logger.info('Background daemon started (inside __daemon_main).')
        event_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(event_loop)
        run, rc = True, 0
        while run:
            logger.info('Inside \"while run\".')
            event_loop.run_until_complete(self.__cor_main())
            if self.__daemon_terminate:
                event_loop.close()
                run = False
                rc = 1
        return rc

    async def __cor_main(self):
        # If nothing in the queue release control for a bit
        if self.__queue.qsize() == 0:
            logger.info('Event queue is empty, going to sleep (inside __cor_main).')
            await asyncio.sleep(10)
            return
        # Extract all items from event queue
        items = self.__queue.get()
        # Run asynchronously DB extraction and processing on the ids (using pool of processes)
        with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
            cors = [self.__cor_process(item, executor) for item in items]
            logger.debug('Launching {} coroutines to process queue items (inside __cor_main).'.format(len(items)))
            done, pending = await asyncio.wait(cors)
            logger.debug('Finished executing __cor_main with result {}, pending {}'
                         .format([t.result() for t in done], pending))

    async def __cor_process(self, item, executor):
        # Extract corresponding DB data
        event_loop = asyncio.get_event_loop()
        db_task = event_loop.create_task(self.fake_db_access(item))
        db_data = await db_task
        # Heavy processing of data done in different processes
        logger.debug('Dispatching db_data to a worker process.')
        res = await event_loop.run_in_executor(executor, functools.partial(self.fake_processing, db_data, None))
        return res

    @staticmethod
    async def fake_db_access(item):
        # Fake some db access
        logger.debug('Starting fake db access with id {}'.format(item))
        st = randint(1, 301) / 100
        await asyncio.sleep(st)
        logger.debug('Finished db access with id {}, sleep {}'.format(item, st))
        return item

    @staticmethod
    def fake_processing(db_data, _):
        # fake some CPU processing
        logger.debug('Starting fake processing with data {}'.format(db_data))
        st = randint(1, 101) / 10
        time.sleep(st)
        logger.debug('Finished fake processing with data {}, sleep {}, process id {}'.format(db_data, st, os.getpid()))
        return db_data


def main():
    # Event queue
    queue = SetQueue()
    return Asynchro(event_queue=queue)


if __name__ == '__main__':
    a = main()
    a.fake_populate(20)
    time.sleep(5)
    a.loop_start()
    time.sleep(20)
    a.loop_stop()

What's the reason for running multiple event loops?

I suggest just using a single loop in the main thread; it's the native mode for asyncio.

asyncio might run a loop in a non-main thread in very rare scenarios, but it doesn't look like your case.
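
Here is a minimal sketch of what I mean, assuming one loop in the main thread and a single shared Motor client (insert_event and the db/collection names are placeholders):

import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

loop = asyncio.get_event_loop()
client = AsyncIOMotorClient()  # one client, bound to the main-thread loop

async def insert_event(event_id):
    await client.test_db.events.insert_one({'_id': event_id})

def on_event(event_id):
    # Called from your background thread "A": hand the coroutine to the
    # main loop instead of spinning up a second loop in that thread.
    future = asyncio.run_coroutine_threadsafe(insert_event(event_id), loop)
    return future  # future.result() would block the caller until done

# The main thread just runs the loop; the CPU-heavy work can still go to
# the process pool via run_in_executor, as in your __cor_process.
loop.run_forever()

This keeps Motor on the single loop it expects, while your threads only ever schedule work onto that loop, which is exactly what run_coroutine_threadsafe is for.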
