peewee and peewee-async: why is async slower

I am trying to wrap my head around Tornado and async connections to PostgreSQL. I found a library that can do this at http://peewee-async.readthedocs.io/en/latest/ .

I devised a little test to compare traditional Peewee and peewee-async, but somehow the async version is slower.

This is my app:

import peewee
import tornado.web
import logging
import asyncio
import peewee_async
import tornado.gen
import tornado.httpclient
from tornado.platform.asyncio import AsyncIOMainLoop

AsyncIOMainLoop().install()
app = tornado.web.Application(debug=True)
app.listen(port=8888)

# ===========
# Defining Async model
async_db = peewee_async.PooledPostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
app.objects = peewee_async.Manager(async_db)
class AsyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = async_db
        db_table = 'chats_human'


# ==========
# Defining Sync model
sync_db = peewee.PostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
class SyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = sync_db
        db_table = 'chats_human'

# defining two handlers - async and sync
class AsyncHandler(tornado.web.RequestHandler):

    async def get(self):
        """
        An asynchronous way to create an object and return its ID
        """
        obj = await self.application.objects.create(
            AsyncHuman, messenger_id='12345')
        self.write(
            {'id': obj.id,
             'messenger_id': obj.messenger_id}
        )


class SyncHandler(tornado.web.RequestHandler):

    def get(self):
        """
        A traditional synchronous way
        """
        obj = SyncHuman.create(messenger_id='12345')
        self.write({
            'id': obj.id,
            'messenger_id': obj.messenger_id
        })


app.add_handlers('', [
    (r"/receive_async", AsyncHandler),
    (r"/receive_sync", SyncHandler),
])

# Run loop
loop = asyncio.get_event_loop()
try:
    loop.run_forever()
except KeyboardInterrupt:
    print(" server stopped")

And this is what I get from Apache Benchmark:

ab -n 100 -c 100 http://127.0.0.1:8888/receive_async

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    4   1.5      5       7
Processing:   621 1049 256.6   1054    1486
Waiting:      621 1048 256.6   1053    1485
Total:        628 1053 255.3   1058    1492

Percentage of the requests served within a certain time (ms)
  50%   1058
  66%   1196
  75%   1274
  80%   1324
  90%   1409
  95%   1452
  98%   1485
  99%   1492
 100%   1492 (longest request)




ab -n 100 -c 100 http://127.0.0.1:8888/receive_sync
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    5   1.9      5       8
Processing:     8  476 277.7    479    1052
Waiting:        7  476 277.7    478    1052
Total:         15  481 276.2    483    1060

Percentage of the requests served within a certain time (ms)
  50%    483
  66%    629
  75%    714
  80%    759
  90%    853
  95%    899
  98%   1051
  99%   1060
 100%   1060 (longest request)

Why is sync faster? Where is the bottleneck I'm missing?

For a long explanation:

http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/

For a short explanation: synchronous Python code is simple, and most of the work happens in the standard library's socket module, which is implemented in C. Async Python code is more complex than synchronous code. Each request requires several executions of the main event loop code, which is written in Python (in the asyncio case here) and therefore has a lot of overhead compared to C code.

Benchmarks like yours show async's overhead dramatically, because there's no network latency between your application and your database, and you're doing a large number of very small database operations. Since every other aspect of the benchmark is fast, these many executions of the event loop logic account for a large proportion of the total runtime.
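To get a rough feel for that per-operation cost, here is a minimal sketch that times the same trivial work done as plain function calls and as coroutines that each yield to the asyncio event loop once. There is no database involved; tiny_sync_op, tiny_async_op and N are made-up names for illustration, and the absolute numbers will differ from machine to machine.

```python
import asyncio
import time

N = 20_000  # number of tiny "operations"; an arbitrary illustrative value


def tiny_sync_op():
    """Hypothetical stand-in for a very fast blocking call (no real I/O)."""
    return 1


async def tiny_async_op():
    """Same work, but yields to the event loop once, as an awaited I/O call would."""
    await asyncio.sleep(0)
    return 1


def run_sync():
    # Time N plain function calls.
    start = time.perf_counter()
    total = sum(tiny_sync_op() for _ in range(N))
    return total, time.perf_counter() - start


async def _run_async():
    # Time N awaits, each of which takes a trip through the event loop.
    start = time.perf_counter()
    total = 0
    for _ in range(N):
        total += await tiny_async_op()
    return total, time.perf_counter() - start


def run_async():
    return asyncio.run(_run_async())


if __name__ == "__main__":
    _, sync_time = run_sync()
    _, async_time = run_async()
    print(f"sync:  {sync_time:.4f}s")
    print(f"async: {async_time:.4f}s")
```

When the operation itself costs almost nothing, whatever gap shows up between the two timings is mostly event loop bookkeeping, which is the situation a localhost database benchmark comes close to.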

Mike Bayer's argument, linked above, is that low-latency scenarios like this are typical for database applications, and therefore database operations shouldn't be run on the event loop.

Async is best for high-latency scenarios, like websockets and web crawlers, where the application spends most of its time waiting for the peer, rather than spending most of its time executing Python.
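As a contrast, here is a small sketch of the high-latency case, where asyncio.sleep stands in for waiting on a slow peer. PEER_LATENCY, REQUESTS and slow_peer are hypothetical names chosen for the example, not anything from the question's code.

```python
import asyncio
import time

PEER_LATENCY = 0.05  # seconds; hypothetical delay of a slow remote peer
REQUESTS = 10


async def slow_peer(i):
    """Simulates waiting on a slow peer; asyncio.sleep stands in for network I/O."""
    await asyncio.sleep(PEER_LATENCY)
    return i


async def one_at_a_time():
    # Each wait finishes before the next one starts.
    start = time.perf_counter()
    results = [await slow_peer(i) for i in range(REQUESTS)]
    return results, time.perf_counter() - start


async def all_at_once():
    # All waits overlap on a single event loop.
    start = time.perf_counter()
    results = await asyncio.gather(*(slow_peer(i) for i in range(REQUESTS)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = asyncio.run(one_at_a_time())
    _, con_time = asyncio.run(all_at_once())
    print(f"sequential: {seq_time:.3f}s")
    print(f"concurrent: {con_time:.3f}s")
```

Here the waiting dominates, so overlapping the waits should matter far more than the event loop's own overhead.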

In conclusion: if your application has a good reason to be async (it deals with slow peers), having an async database driver is a good idea for the sake of consistent code, but expect some overhead.

If you don't need async for another reason, don't do async database calls, because they're a bit slower.

Database ORMs introduce many complexities for async architectures. There are several places within an ORM where blocking may take place, and converting them all to an async form can be overwhelming. The places where blocking takes place can also vary depending on the database. My guess as to why your results are so slow is that there are a lot of unoptimized calls to and from the event loop (I could be severely wrong, I mostly use SQLAlchemy or raw SQL these days). In my experience, it's generally quicker to execute database code in a thread and yield the result when it's available. I can't really speak for Peewee, but SQLAlchemy is well suited to run in multiple threads and there aren't too many downsides (but the ones that do exist are very VERY annoying).

I'd recommend you try your experiment using ThreadPoolExecutor and the synchronous Peewee module, and run the database functions in a thread. You will have to make changes to your main code, but it would be worth it if you ask me. For example, let's say you opt to use callback code, then your ORM queries might look like this:

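from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=10)

def queryByName(name):
    # db_model.findOne is a placeholder for your own blocking ORM query
    query = executor.submit(db_model.findOne, name=name)
    # processResult is called with the future once the query finishes
    query.add_done_callback(processResult)

def processResult(query):
    # result() returns the query's return value, or re-raises its exception
    orm_obj = query.result()
    # do stuff with the results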

You could use yield from or await in coroutines, but it was a bit problematic for me. Also, I'm not well versed in coroutines yet. This snippet should work well with Tornado so long as devs are careful about deadlocks, db sessions, and transactions. These factors can really slow down your application if something goes wrong in the thread.
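For the coroutine route, one possible shape is asyncio's loop.run_in_executor, which wraps the thread pool call in something you can await. This is only a sketch: blocking_create is a made-up stand-in for a blocking ORM call such as SyncHuman.create(), and time.sleep plays the part of the database round trip.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=10)


def blocking_create(messenger_id):
    """Hypothetical stand-in for a blocking ORM call such as SyncHuman.create()."""
    time.sleep(0.01)  # pretend to wait on the database
    return {"id": 1, "messenger_id": messenger_id}


async def create_human(messenger_id):
    loop = asyncio.get_running_loop()
    # Run the blocking call in a worker thread; the event loop stays free meanwhile.
    return await loop.run_in_executor(executor, blocking_create, messenger_id)


async def main():
    # Several "requests" in flight at once, each served by a pool thread.
    return await asyncio.gather(*(create_human(str(n)) for n in range(5)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Inside an async Tornado handler running on the asyncio loop, the same await on run_in_executor should work, with the usual care around per-thread connections and transactions.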

If you're feeling very adventurous, MagicStack (the company behind uvloop) has a project called asyncpg, and it's supposed to be very fast! I've been meaning to try it, but haven't found the time :(
