Wrapping Python async for synchronous execution

I'm trying to load data from a local Postgres database as quickly as possible, and it appears that the most performant Python package is [asyncpg][1]. My code is synchronous, and I repeatedly need to load chunks of data. I'm not interested in having the async keyword propagate to every function I've written, so I'm trying to wrap the async code in a synchronous function.

The code below works, but is incredibly ugly:

import asyncio
import asyncpg

def connect_to_postgres(user, password, database, host):
    async def wrapped():
        # Use the function's arguments rather than hard-coded values.
        return await asyncpg.connect(user=user, password=password,
                                     database=database, host=host)
    loop = asyncio.get_event_loop()
    db_connection = loop.run_until_complete(wrapped())
    return db_connection

# `keys` is assumed to be a dict of credentials defined elsewhere.
db_connection = connect_to_postgres(keys['user'], keys['password'],
                                    'markets', '127.0.0.1')

def fetch_from_postgres(query, db_connection):
    async def wrapped():
        return await db_connection.fetch(query)
    loop = asyncio.get_event_loop()    
    values = loop.run_until_complete(wrapped())
    return values

fetch_from_postgres("SELECT * from db LIMIT 5", db_connection)

In Julia I would do something like

f() = @async 5
g() = fetch(f())
g()

But in Python it seems I have to do the rather clunky

async def f():
  return 5
def g():
  loop = asyncio.get_event_loop()    
  return loop.run_until_complete(f())

Just wondering if there's a better way?

Edit: the latter Python example can of course be written using

def fetch(x):
    loop = asyncio.get_event_loop()    
    return loop.run_until_complete(x)

Edit 2: I do care about performance, but wish to use a synchronous programming approach. asyncpg is 3x faster than psycopg2 because its core implementation is in Cython rather than Python; this is explained in more detail at https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/ . Hence my desire to wrap this asynchronous code.

That said, I still need to create an async wrapped function, unless I'm missing something.

[1]: https://github.com/MagicStack/asyncpg

This is not difficult to do if you set up your program structure at the beginning. You create a second thread in which your async code will run, and start its event loop. When your main thread, which remains entirely synchronous, wants the result of an async call (a coroutine), you use the function asyncio.run_coroutine_threadsafe. That function returns a concurrent.futures.Future object. You obtain the returned value by calling its method result(), which blocks until the result is available.

It's almost as if you called the async method like a subroutine. There is minimal overhead because you created only one secondary thread. Here is a simple example:

import asyncio
import threading
from datetime import datetime

async def demo(t):
    await asyncio.sleep(t)
    print(f"Demo function {t} {datetime.now()}")
    return t

def main():
    def thr(loop):
        asyncio.set_event_loop(loop)
        loop.run_forever()
        
    loop = asyncio.new_event_loop()
    t = threading.Thread(target=thr, args=(loop, ), daemon=True)
    t.start()

    print("Main", datetime.now())
    t1 = asyncio.run_coroutine_threadsafe(demo(1.0), loop).result()
    t2 = asyncio.run_coroutine_threadsafe(demo(2.0), loop).result()
    print(t1, t2)

if __name__ == "__main__":
    main()

# >>> Main 2021-12-06 19:14:14.135206
# >>> Demo function 1.0 2021-12-06 19:14:15.146803
# >>> Demo function 2.0 2021-12-06 19:14:17.155898
# >>> 1.0 2.0

Your main program experiences a 1-second delay on the first invocation of demo(), and a 2-second delay on the second invocation. That's because your main thread has no event loop of its own and blocks in result() until each call finishes, so the two delays cannot overlap. But that's exactly what you implied you wanted when you said that you wanted a synchronous program that uses a third-party async package.
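If the main thread ever does want two calls to overlap, one option is to submit both coroutines before blocking on either result. The following is a minimal sketch of that idea, not part of the original answer; the helper names start_background_loop and run_both are made up for illustration, and demo is a trimmed copy of the coroutine above.

```python
import asyncio
import threading
import time

async def demo(t):
    # Stand-in coroutine, same idea as demo() in the example above.
    await asyncio.sleep(t)
    return t

def start_background_loop():
    """Start an event loop in a daemon thread and return the loop (hypothetical helper)."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop

def run_both(loop, delay_a, delay_b):
    """Submit both coroutines first, then block, so the delays overlap."""
    started = time.monotonic()
    fut_a = asyncio.run_coroutine_threadsafe(demo(delay_a), loop)
    fut_b = asyncio.run_coroutine_threadsafe(demo(delay_b), loop)
    results = (fut_a.result(), fut_b.result())
    return results, time.monotonic() - started

if __name__ == "__main__":
    loop = start_background_loop()
    results, elapsed = run_both(loop, 1.0, 2.0)
    # Elapsed time should be close to the longer delay, not the sum.
    print(results, f"{elapsed:.1f}s")
```

The calling code stays synchronous; only the order of submitting and waiting changes.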

This is a similar answer, although the question is slightly different:

How can I have a synchronous facade over asyncpg APIs with Python asyncio?
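For the asyncpg case in the question, the thread-plus-loop pattern from the example could be packaged into a small helper so that the rest of the program never touches asyncio directly. This is only a sketch under assumptions: the class name BackgroundLoop and the stand-in coroutine fake_fetch are invented here so the block runs without a database.

```python
import asyncio
import threading

class BackgroundLoop:
    """Hypothetical helper: owns one event loop running in a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        """Run a coroutine on the background loop and block for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def stop(self):
        """Stop the loop, wait for the thread to exit, then close the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()

async def fake_fetch(query):
    # Stand-in for an awaitable database call; not a real asyncpg function.
    await asyncio.sleep(0.01)
    return [query.upper()]

if __name__ == "__main__":
    runner = BackgroundLoop()
    print(runner.run(fake_fetch("select 1")))
    runner.stop()
```

With asyncpg, the calls from the question would presumably become runner.run(asyncpg.connect(...)) followed by runner.run(db_connection.fetch(query)). The connection would then live on the background loop, so every call on it should go through the same runner.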
