

Concurrent use of the same JDBC connection by multiple threads

I'm trying to better understand what happens if multiple threads concurrently execute different SQL queries using the same JDBC connection.

  • Will the outcome be functionally correct?

  • What are the performance implications?

  • Will thread A have to wait for thread B to be completely done with its query?

  • Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?


I see that Apache DBCP uses synchronization to ensure that a connection obtained from the pool is removed from the pool, and made unavailable to other callers, until it is closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" by simply creating a static list of open connections and handing them out in round-robin order.

I don't mind occasional performance degradation, and not having to close the connection after every use seems very appealing. Are there any downsides to doing this?
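A minimal sketch of the "static list + round-robin" scheme described in the question, to make its key property concrete. The class name `RoundRobinDistributor` is hypothetical, and it is generic over `T` so it runs without a database; in the proposed scheme `T` would be `java.sql.Connection`. Unlike a DBCP-style pool, nothing is removed from the list when it is handed out.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the "static list + round-robin" idea from the question.
 * Generic so it runs without a database; in the real scheme T would be java.sql.Connection.
 */
final class RoundRobinDistributor<T> {
    private final List<T> items;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinDistributor(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("need at least one item");
        }
        this.items = List.copyOf(items); // immutable snapshot, safe to read from many threads
    }

    /**
     * Returns the next item in rotation. Nothing is checked out or removed, so the
     * same item can be returned to several threads that are still using it.
     */
    T next() {
        // floorMod keeps the index valid even after the counter overflows past Integer.MAX_VALUE
        int index = Math.floorMod(counter.getAndIncrement(), items.size());
        return items.get(index);
    }
}
```

With three connections and a fourth concurrent caller, the fourth call returns the same connection as the first, which may still be busy with a query; the answers below describe what that means in practice.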

I ran the following tests using an AWS RDS Postgres database and Java 11:

  1. Create a table with 11M rows, each containing a single TEXT column populated with a random 100-character string.

  2. Pick a random 5-character string and search the table for partial matches of it.

  3. Time how long the query takes to return results. In my case, it takes ~23 seconds. Because very few rows are returned, we can conclude that most of those 23 seconds are spent waiting for the DB to run the full table scan, not sending the request/response packets.

  4. Run multiple queries in parallel (with different keywords) using different connections. In my case, they all complete in ~23 seconds, i.e., the queries are parallelized efficiently.

  5. Run multiple queries on parallel threads using the same connection. Now the first result comes back in ~23 seconds, the second in ~46 seconds, the third in ~1 minute, and so on. All results are functionally correct, in that each matches the keyword queried by its thread.
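Steps 4 and 5 can be reproduced in miniature without a database. This is a simulation, not the actual test harness: `FakeConnection` and `ParallelQueryDemo` are hypothetical names, `Thread.sleep` stands in for the ~23 second table scan, and the assumption that a connection processes one statement at a time is modeled by making `query` `synchronized` (roughly what the Connector/J answer below describes, and consistent with the Postgres timings above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy stand-in for a JDBC connection (simulation only, no database involved). */
final class FakeConnection {
    private final long queryMillis;

    FakeConnection(long queryMillis) {
        this.queryMillis = queryMillis;
    }

    /** Assumption: one statement at a time per connection, modeled with 'synchronized'. */
    synchronized String query(String keyword) throws InterruptedException {
        Thread.sleep(queryMillis); // stands in for the server-side full table scan
        return "rows matching " + keyword;
    }
}

final class ParallelQueryDemo {
    /**
     * Runs one query per keyword on its own thread and returns each query's completion
     * time in milliseconds, measured from a common start, in keyword order.
     */
    static List<Long> completionTimes(List<String> keywords, boolean shareOneConnection,
                                      long queryMillis) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(keywords.size());
        FakeConnection shared = new FakeConnection(queryMillis);
        long start = System.nanoTime();
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String keyword : keywords) {
                // Step 4: a fresh connection per query. Step 5: every query uses 'shared'.
                FakeConnection conn = shareOneConnection ? shared : new FakeConnection(queryMillis);
                futures.add(executor.submit(() -> {
                    conn.query(keyword);
                    return (System.nanoTime() - start) / 1_000_000;
                }));
            }
            List<Long> times = new ArrayList<>();
            for (Future<Long> future : futures) {
                times.add(future.get());
            }
            return times;
        } finally {
            executor.shutdown();
        }
    }
}
```

With three keywords and `queryMillis = 200`, the separate-connection run should finish all queries at roughly 200 ms, while the shared-connection run should, once the times are sorted, climb in steps of about 200 ms: the same staircase as the 23 s / 46 s / ~1 min results in step 5.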

To add to what Joni mentioned, his conclusion matches the behavior I'm seeing on Postgres. If multiple queries are sent on the same connection at the same time, correctness appears to be preserved, but all the benefits of parallelism are lost.

Since the JDBC spec doesn't guarantee concurrent execution, this question can only be answered by testing the drivers you're interested in or by reading their source code.

In the case of MySQL Connector/J, all methods that execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the same connection are blocked until it finishes.

Doing things the wrong way will have undefined results. If someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it with another JDBC driver or database version, or hits a different set of race conditions, or tries another platform or JVM implementation, and a different undefined result happens.

If two threads modify the same state at the same time, anything could happen depending on the timing. Maybe the second overwrites the first one's query, and then both run the same query. Maybe the library detects your error and throws an exception. I don't know and wouldn't bother testing (or maybe someone already knows, or it should be obvious what would happen), so this isn't "the answer", just some advice: use a connection pool, or use a synchronized block to ensure these problems don't happen.
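For comparison with the round-robin idea, here is a minimal sketch of the checkout model a pool such as DBCP follows: an item is taken out of the pool while leased and put back when the lease is closed, so a connection is never in two threads' hands at once. `CheckoutPool` and `Lease` are hypothetical names, and the sketch omits features a production pool provides (connection validation, eviction, and so on), so in practice an established pool is the better choice.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical minimal pool: an item is removed from the pool while it is leased. */
final class CheckoutPool<T> {
    private final BlockingQueue<T> idle;

    CheckoutPool(List<T> items) {
        this.idle = new ArrayBlockingQueue<>(items.size(), false, items);
    }

    /** A leased item; closing the lease returns the item. Intended for use by one thread. */
    final class Lease implements AutoCloseable {
        private final T item;
        private boolean returned;

        private Lease(T item) {
            this.item = item;
        }

        T get() {
            if (returned) {
                throw new IllegalStateException("lease already closed");
            }
            return item;
        }

        @Override
        public void close() {
            if (!returned) { // guard against returning the same item twice
                returned = true;
                idle.add(item);
            }
        }
    }

    /** Waits up to the timeout for an idle item; returns null if none became free. */
    Lease borrow(long timeout, TimeUnit unit) throws InterruptedException {
        T item = idle.poll(timeout, unit);
        return item == null ? null : new Lease(item);
    }
}
```

With `T` set to `java.sql.Connection`, writing `try (CheckoutPool<Connection>.Lease lease = pool.borrow(5, TimeUnit.SECONDS)) { ... }` returns the connection automatically (the body should still check for a `null` lease on timeout), so "closing after every use" costs one try-with-resources line.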

We had to disable the statement cache on WebSphere because it was throwing ArrayIndexOutOfBoundsException at the PreparedStatement level. The issue was that someone thought it was smart to share a connection among multiple threads. He said it was to save connections, but there is no point in multithreading queries on one connection, because the DB won't run them in parallel.

There was also an issue with Java Runnables that were blocking each other because they used the same connection.

So this is simply something not to do; there is nothing to gain.

There is an option in WebSphere to detect this kind of multithreaded access. I implemented my own, since we use Jetty in development.
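The answer doesn't show its detector, but one way to build something similar is a dynamic proxy that fails fast when a second thread calls into an object while another thread is still inside one of its methods. This is a hypothetical sketch (`ConcurrentUseDetector` is not a WebSphere or Jetty API). It only catches overlapping calls on the wrapped interface; it does not guard `Statement`s or `ResultSet`s created from a wrapped connection, nor a connection passed between threads one after another.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: wraps an object so overlapping calls from two threads fail fast. */
final class ConcurrentUseDetector {
    private ConcurrentUseDetector() {}

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        // Thread currently inside a method of the wrapped object, or null if idle.
        AtomicReference<Thread> owner = new AtomicReference<>();
        InvocationHandler handler = (proxy, method, args) -> {
            Thread me = Thread.currentThread();
            // Atomically claim the object; 'holder' is the previous owner (null if it was idle).
            Thread holder = owner.compareAndExchange(null, me);
            boolean claimed = holder == null;
            if (!claimed && holder != me) {
                throw new IllegalStateException(method.getName() + "() called from "
                        + me.getName() + " while " + holder.getName() + " is using this object");
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            } finally {
                if (claimed) {
                    owner.set(null); // release only if this call did the claiming
                }
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

Wrapping a connection would look like `Connection checked = ConcurrentUseDetector.wrap(conn, Connection.class);`, with application code then using only `checked`.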

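Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.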

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM