
Python async transactions psycopg2

It is possible to do async I/O with psycopg2 (which can be read here), however I'm not sure how to do async transactions. Consider this sequence of events:

  • Green thread 1 (GT1) starts transaction T
  • GT1 issues an update
  • GT2 issues one transactional update
  • GT1 issues an update
  • GT1 commits transaction T

I assume that GT1's updates conflict with GT2's updates.

Now, according to the docs:

Cursors created from the same connection are not isolated, i.e., any changes done to the database by a cursor are immediately visible by the other cursors.

So we can't implement the flow above with cursors. We could implement it on separate connections, but since we are doing async, spawning (potentially) thousands of DB connections would be bad (not to mention that Postgres can't handle that many out of the box).

The other option is to have a pool of connections and reuse them. But then, if we issue X parallel transactions, all other green threads are blocked until a connection becomes available. Thus the actual number of useful green threads is ~X (assuming the app is heavily DB-bound), which raises the question: why use async to begin with?
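The capping effect described above can be sketched without psycopg2 at all: a bounded queue of stand-in connection tokens plays the role of the pool, and plain threads play the green threads. All names here are illustrative; only the blocking behavior is the point.

```python
import threading
import queue
import time

# Stand-in "connection pool": a bounded queue of X connection tokens.
POOL_SIZE = 3
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put("conn-%d" % i)

in_use = 0
peak = 0
lock = threading.Lock()

def transaction(duration):
    """Borrow a connection for `duration` seconds, tracking peak usage."""
    global in_use, peak
    conn = pool.get()          # blocks once all POOL_SIZE tokens are borrowed
    with lock:
        in_use += 1
        peak = max(peak, in_use)
    time.sleep(duration)       # stand-in for BEGIN; ...; COMMIT;
    with lock:
        in_use -= 1
    pool.put(conn)

# Ten "green threads" contend for three connections.
threads = [threading.Thread(target=transaction, args=(0.05,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many workers are spawned, `peak` never exceeds `POOL_SIZE`: the pool, not the scheduler, bounds the effective concurrency.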

Now, this question can actually be generalized to DB API 2.0. Maybe the real answer is that DB API 2.0 is not suited for async programming? How would we do async I/O on PostgreSQL then? Maybe some other library?

Or maybe it is because the PostgreSQL protocol is actually synchronous? It would be perfect to be able to "write" to any transaction at any time (per connection). PostgreSQL would have to expose the transaction's ID for that. Is it doable? Maybe two-phase commit is the answer?
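For reference, psycopg2 connections do expose the DB API 2.0 two-phase-commit methods (`xid`, `tpc_begin`, `tpc_prepare`, `tpc_commit`), which PostgreSQL maps onto PREPARE TRANSACTION / COMMIT PREPARED. A minimal sketch follows; the identifiers are illustrative, and note that the transaction still occupies its connection for its whole duration, so this does not remove the one-transaction-per-connection constraint discussed above.

```python
def two_phase_update(conn, sql, params, gtrid="gt1-txn"):
    """Run one statement under two-phase commit (sketch).

    `conn` is expected to provide the DB API 2.0 TPC methods that
    psycopg2 connections implement. `gtrid` and the branch qualifier
    below are illustrative placeholders.
    """
    xid = conn.xid(0, gtrid, "branch-1")
    conn.tpc_begin(xid)
    cur = conn.cursor()
    cur.execute(sql, params)
    conn.tpc_prepare()   # first phase: the transaction survives a crash
    conn.tpc_commit()    # second phase: COMMIT PREPARED
```

Two-phase commit addresses atomicity across resources and durability of prepared work, not interleaving several transactions over one connection.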

Or am I missing something here? 或者我在这里遗漏了什么?

EDIT: This seems to be a general problem with SQL, since BEGIN; COMMIT; semantics just can't be used asynchronously in an efficient way.

Actually, you can use BEGIN; and COMMIT; with async. What you need is a connection pool setup, making sure each green thread gets its own connection (just like a real thread would in a multithreaded application).
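A minimal sketch of that setup, assuming psycopg2's `ThreadedConnectionPool` (the import is deferred into the factory so the borrowing logic itself has no hard dependency; the DSN and function names are illustrative):

```python
def run_in_transaction(pool, statements):
    """Borrow a connection from `pool`, run `statements` as one
    transaction, and return the connection to the pool.

    `pool` needs getconn()/putconn(), as psycopg2.pool's pool classes
    provide. Each green thread calls this, so no two in-flight
    transactions ever share a connection.
    """
    conn = pool.getconn()
    try:
        cur = conn.cursor()
        for sql, params in statements:
            cur.execute(sql, params)
        conn.commit()          # psycopg2 opened the transaction implicitly
    except Exception:
        conn.rollback()
        raise
    finally:
        pool.putconn(conn)

def make_pool(dsn, maxconn=20):
    """Build a real pool (requires psycopg2; supply your own DSN)."""
    from psycopg2.pool import ThreadedConnectionPool
    return ThreadedConnectionPool(1, maxconn, dsn)
```

Usage would be, e.g., `run_in_transaction(make_pool("dbname=test"), [("UPDATE t SET v = %s", (1,))])`, with one such call per green thread.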

You cannot use psycopg2's built-in transaction handling, though: asynchronous connections are always in autocommit mode, so the transaction has to be delimited by hand.
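With an asynchronous connection, that means issuing explicit BEGIN/COMMIT statements and driving the connection with the classic poll/select loop. A sketch, with the `POLL_*` constants inlined for illustration (real code should use `psycopg2.extensions.POLL_OK` etc., whose values these mirror):

```python
import select

# Poll states, mirroring psycopg2.extensions.POLL_OK / POLL_READ / POLL_WRITE.
POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2

def wait(conn):
    """Loop until an asynchronous psycopg2 connection finishes its
    current operation (the poll/select pattern from the psycopg2 docs)."""
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        elif state == POLL_READ:
            select.select([conn.fileno()], [], [])
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [])
        else:
            raise RuntimeError("bad poll() state: %r" % state)

def async_transaction(conn, statements):
    """Async connections are always in autocommit, so the transaction
    is delimited by hand with explicit BEGIN and COMMIT statements."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    wait(conn)
    for sql, params in statements:
        cur.execute(sql, params)
        wait(conn)
    cur.execute("COMMIT")
    wait(conn)
```

In a gevent-style setup, the `select.select` calls are what yield control to other green threads; the connection itself stays dedicated to this transaction until COMMIT completes.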
