
Non-blocking vs blocking Java server with JDBC calls

Original post: 2022-12-08 18:18:38 · tags: java / asynchronous / jdbc / nonblocking

Our gRPC server needs to handle 1000 QPS. Each request requires a series of sequential operations, one of which reads data from the DB using JDBC. Handling a single request takes at most 50 ms.

Our application can be written in two ways:

Option 1 - Classic one blocking thread per request: we can create a large thread pool (~200), assign one thread per request, and have that thread block while it waits for the DB.
Option 2 - Handling each request in a truly non-blocking fashion. This would require a non-blocking MySQL client. I don't know whether one exists, but for now let's assume it does.
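
As a rough sizing check for Option 1, Little's law (average requests in flight = arrival rate × average time per request) can be applied to the numbers above. This is only an estimate, assuming steady arrivals and that every request takes the worst-case 50 ms; the class and method names are made up for illustration.

```java
final class PoolSizing {
    /** Little's law: average requests in flight = arrival rate (req/s) x latency (s). */
    static long requestsInFlight(long requestsPerSecond, long latencyMillis) {
        return requestsPerSecond * latencyMillis / 1000;
    }

    public static void main(String[] args) {
        long inFlight = requestsInFlight(1000, 50);
        System.out.println("Average requests in flight: " + inFlight);                    // 50
        System.out.println("Headroom with a 200-thread pool: " + 200.0 / inFlight + "x"); // 4.0x
    }
}
```

Under these assumptions about 50 threads are busy at any moment, so a pool of ~200 leaves room for bursts and slower requests.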

My understanding is that the non-blocking approach has these pros and cons:

Pros: It reduces the number of threads required, and as such reduces the memory footprint.
Pros: It saves some OS overhead, since the OS doesn't need to give CPU time to threads waiting for IO.
Cons: For a large application (where each task subscribes a callback to the previous task), a single request has to be split across multiple threads, creating a different kind of overhead. And if the same request gets executed on multiple physical cores, that potentially adds overhead, as data might not be available in the core's L1/L2 cache.
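
The thread hand-off described in the last point can be observed with plain `CompletableFuture`. This is a minimal sketch, not gRPC or a real database client: the two single-thread pools and their names are hypothetical stand-ins for "I/O completion" and "request handling" threads.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CallbackHops {
    /** Runs a two-stage callback chain and returns the name of the thread that ran each stage. */
    static List<String> runChain() {
        ExecutorService ioPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-thread"));
        ExecutorService requestPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "request-thread"));
        List<String> threads = new CopyOnWriteArrayList<>();
        try {
            CompletableFuture
                // Stage 1: pretend the DB result arrives on an I/O thread.
                .supplyAsync(() -> { threads.add(Thread.currentThread().getName()); return "row"; }, ioPool)
                // Stage 2: the callback continues the same request on a different thread.
                .thenApplyAsync(row -> { threads.add(Thread.currentThread().getName()); return row.toUpperCase(); }, requestPool)
                .join();
        } finally {
            ioPool.shutdown();
            requestPool.shutdown();
        }
        return threads;
    }

    public static void main(String[] args) {
        System.out.println(runChain()); // [io-thread, request-thread]
    }
}
```

One logical request ends up touching two threads, which is the hand-off (and possible cache miss) cost mentioned above.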

Question 1: Even though non-blocking applications seem to be the new cool thing, my understanding is that for an application that isn't memory bound and where creating more threads isn't a problem, it's not clear that writing a non-blocking application is actually more CPU efficient than writing a blocking one. Is there any reason to believe otherwise?

Question 2: My understanding is also that if we use JDBC, the connection is actually blocking, so even if we make the rest of our application non-blocking, the JDBC client makes us lose all the benefit, and in that case Option 1 is most likely better?
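
To make the concern in Question 2 concrete: wrapping a blocking call in a `CompletableFuture` keeps the caller free, but a pool thread still sits blocked for the whole call. In the sketch below `Thread.sleep` is a stand-in for a JDBC query, and `blockingQuery`, `peakConcurrency`, and the pool size are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class WrappedBlockingCall {
    private static final AtomicInteger inFlight = new AtomicInteger();
    private static final AtomicInteger peak = new AtomicInteger();

    /** Stand-in for a blocking JDBC query: the calling thread is parked for 50 ms. */
    static String blockingQuery(int id) {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            inFlight.decrementAndGet();
        }
        return "row-" + id;
    }

    /** Submits wrapped blocking calls to a fixed pool and returns the peak number running at once. */
    static int peakConcurrency(int poolSize, int queries) {
        peak.set(0);
        ExecutorService dbPool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < queries; i++) {
                final int id = i;
                // The submitting thread returns immediately; a dbPool thread does the blocking.
                futures.add(CompletableFuture.supplyAsync(() -> blockingQuery(id), dbPool));
            }
            futures.forEach(CompletableFuture::join);
        } finally {
            dbPool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrent queries: " + peakConcurrency(2, 8));
    }
}
```

The number of queries that can run at once never exceeds the pool size, so the blocking has only been moved to another pool, not removed.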

2 solutions

For question 1, you are correct: non-blocking is not inherently better (and with the arrival of Virtual Threads, it's about to become a lot worse in comparison to good old thread-per-request). At best, you could look at the tools you are working with and do some performance testing with a small-scale example. But frankly, that is down to the tool, not the strategy (at least, until Virtual Threads get here).

For question 2, I would strongly encourage you to choose the solution that works best with your tool/framework. Staying within your ecosystem will allow you to make more flexible moves when the time comes to optimize.

But all things being equal, I would strongly encourage you to stick with thread-per-request, since you are working with Java. Ignoring Virtual Threads, thread-per-request allows you to work with and manage simple, blocking, synchronous code. You don't have to deal with callbacks or trace the logic through confusing and piecemeal logs. Simply make a thread per request, let it block where it does, and let your scheduler decide which thread should have the CPU core at any given time.
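
A minimal sketch of that thread-per-request style, assuming a fixed pool like the ~200 threads from the question. `loadUser`, `handleRequest`, and `serve` are hypothetical names, and `Thread.sleep` stands in for the blocking JDBC call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadPerRequest {
    /** Hypothetical blocking DB read; a real version would run a JDBC query here. */
    static String loadUser(int id) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "user-" + id;
    }

    /** The whole request is plain sequential code on a single thread. */
    static String handleRequest(int id) {
        String user = loadUser(id); // blocks this thread only
        return "hello " + user;     // next step runs on the same thread
    }

    /** Handles the given number of requests on a fixed pool and returns the responses in order. */
    static List<String> serve(int poolSize, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                Callable<String> task = () -> handleRequest(id);
                pending.add(pool.submit(task));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> f : pending) {
                responses.add(f.get());
            }
            return responses;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(serve(200, 3)); // [hello user-0, hello user-1, hello user-2]
    }
}
```

A stack trace or log line from `handleRequest` shows the full request path on one thread, which is the debugging benefit described above.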

Pros: It saves some OS overhead, since the OS doesn't need to give CPU time to threads waiting for IO.

It's not just the CPU time for waiting threads, but also the overhead of switching between threads competing for the CPU. As you have more threads, more of them will be in a runnable state, and the CPU time must be spread between them. This requires a lot of memory management for switching (saving and restoring each thread's state).

Cons: For a large application (where each task subscribes a callback to the previous task), a single request has to be split across multiple threads, creating a different kind of overhead. And if the same request gets executed on multiple physical cores, that potentially adds overhead, as data might not be available in the core's L1/L2 cache.

This also happens with the “classic” approach, since blocking calls cause the CPU to switch to a different thread and, as stated before, the CPU even has to switch between runnable threads to share CPU time as their number increases.

Question 1: […] for an application that isn't memory bound and where creating more threads isn't a problem

In the current state of Java, creating more threads is always going to be a problem at some point. With the thread-per-request model, it depends on how many requests you have in parallel. 1000, probably OK; 10000, maybe not.
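
One way to get a feel for those numbers is to estimate how much address space the thread stacks alone can reserve. This is a back-of-the-envelope sketch: the 1 MiB stack size is an assumption (a common default on 64-bit JVMs, adjustable with `-Xss`), it is an upper bound rather than memory actually committed, and it ignores scheduling cost entirely.

```java
final class ThreadStackEstimate {
    /** Upper bound on address space reserved for thread stacks, in MiB. */
    static long reservedStackMiB(long threads, long stackSizeKiB) {
        return threads * stackSizeKiB / 1024;
    }

    public static void main(String[] args) {
        long assumedStackKiB = 1024; // assumption: 1 MiB per thread; tune with -Xss
        System.out.println("1000 threads:  " + reservedStackMiB(1_000, assumedStackKiB) + " MiB");  // 1000 MiB
        System.out.println("10000 threads: " + reservedStackMiB(10_000, assumedStackKiB) + " MiB"); // 10000 MiB
    }
}
```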

it's not clear that writing a non-blocking application is actually more CPU efficient than writing a blocking one. Is there any reason to believe otherwise?

It is not just a question of efficiency, but also of scalability. As for performance itself, that would require proper load testing. You may also want to check "Is non-blocking I/O really faster than multi-threaded blocking I/O? How?"

Question 2: My understanding is also that if we use JDBC, the connection is actually blocking, so even if we make the rest of our application non-blocking, the JDBC client makes us lose all the benefit, and in that case Option 1 is most likely better?

JDBC is indeed a synchronous API. Oracle was working on ADBA as an asynchronous equivalent, but they discontinued it, considering that Project Loom will make it irrelevant. R2DBC provides an alternative which supports MySQL. Spring even supports reactive transactions.