
Threads sitting idle = bad?

Posted 2014-03-04 21:04:54. Tags: java, multithreading, asynchronous, nio, c10k

I want to support around 10,000 simultaneous HTTP clients on a small cluster of machines (as small as possible). I'd like to keep a connection to each client alive while the user is using the application, to allow the server to push updates.

I believe that async I/O is often recommended for these kinds of long-lived connections, to avoid having lots of threads sitting idle. But what are the issues in having threads sitting idle? I find the threaded model mentally easier to work with, but I don't want to do something that is going to cause me major headaches. I guess I'll have to experiment, but I wondered if anyone knows of any previous experiments along these lines?

3 answers

Asynchronous I/O basically means that your application does most of the scheduling of work onto threads itself. Instead of letting the OS suspend your thread at an arbitrary point and schedule another one, you have only as many threads as there are CPU cores, and each one yields to other tasks at the most appropriate point: when it reaches an I/O operation that will take some time.
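To make the "few threads, many connections" idea concrete, here is a minimal sketch of a single-threaded event loop using java.nio's Selector. It assumes a trivial echo protocol, a 1024-byte buffer per connection and no handling of partial writes; it is an illustration, not production code. The one thread waits only in select() and touches a connection only when the OS reports it readable.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Minimal single-threaded echo server: one thread multiplexes all connections.
final class SelectorEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port)); // port 0 = pick any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        // Interrupting this thread wakes select() and ends the loop.
        while (!Thread.currentThread().isInterrupted()) {
            try {
                selector.select(); // the only place this thread waits
            } catch (IOException e) {
                return;
            }
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                try {
                    if (key.isAcceptable()) {
                        accept();
                    } else if (key.isReadable()) {
                        echo(key);
                    }
                } catch (IOException e) {
                    if (key.channel() != server) {
                        closeQuietly(key); // drop only the misbehaving client
                    }
                }
            }
        }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept();
        if (client != null) {
            client.configureBlocking(false);
            // Each connection costs a small buffer instead of a whole thread stack.
            client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
        }
    }

    private static void echo(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        if (client.read(buf) < 0) { // peer closed the connection
            closeQuietly(key);
            return;
        }
        buf.flip();
        client.write(buf); // sketch: a real server would register OP_WRITE for partial writes
        buf.compact();
    }

    private static void closeQuietly(SelectionKey key) {
        try {
            key.channel().close(); // also cancels the key
        } catch (IOException ignored) {
            // nothing useful to do here
        }
    }
}
```

Started with something like `new Thread(new SelectorEchoServer(8080)).start()`, this one thread serves every connected client.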

The above seems like a clear win from the performance standpoint, but the asynchronous programming model is much more complex in several regards:

1. it can't be expressed as a single function call, so the workflow is not obvious, especially once transfer of control flow due to exceptions is considered;
2. without specifically targeted support from the programming language, the idioms are very messy: spaghetti code and/or an extremely poor signal-to-noise ratio are the norm;
3. mostly due to point 1 above, debugging is much more difficult, as the stack trace does not represent the progress within a unit of work as a whole;
4. execution jumps from thread to thread within a pool (or even across several pools, where each layer of abstraction has its own), so profiling and monitoring with the usual tools is rendered virtually useless.
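The first points are easiest to see side by side. Below, the same hypothetical two-step workflow (look up a user, then that user's orders; fetchUser and fetchOrders are invented stand-ins for real I/O) is written once as plain blocking calls and once as a CompletableFuture chain. In the blocking version one try/catch covers the whole unit of work and a stack trace shows both steps; in the async version each step is a separate callback that may run on a different pool thread, and error handling has to be attached to the chain.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

final class WorkflowStyles {
    // Hypothetical blocking lookups standing in for real I/O.
    static String fetchUser(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("no such user: " + id);
        }
        return "user" + id;
    }

    static List<String> fetchOrders(String user) {
        return List.of(user + "-order1", user + "-order2");
    }

    // Synchronous style: control flow and exception handling read top to bottom.
    static int countOrdersBlocking(int id) {
        try {
            String user = fetchUser(id);
            return fetchOrders(user).size();
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    // Asynchronous style: each step is a callback that may run on a different pool thread.
    static CompletableFuture<Integer> countOrdersAsync(int id, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> fetchUser(id), pool)
                .thenApplyAsync(WorkflowStyles::fetchOrders, pool)
                .thenApply(List::size)
                // The failure usually arrives here wrapped in a CompletionException,
                // far from the code that threw it.
                .exceptionally(e -> -1);
    }
}
```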

On the other hand, modern OSes have received many improvements and optimizations that mostly eliminate the performance downsides of synchronous I/O programming:

1. the address space (on 64-bit systems) is huge, so the space reserved for stacks isn't a problem;
2. the actual physical RAM used by call stacks is not very large, because only the part of the stack a thread actually uses is committed to RAM, and a call stack doesn't normally exceed 64 KB;
3. context switching, which used to be prohibitively expensive at large thread counts, has improved to the point where its overhead is negligible for all practical purposes.
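One quick way to check the "idle threads are cheap" claim on your own machine: start a few thousand threads that do nothing but block on a latch, then release them. The sketch below uses the four-argument Thread constructor to request a small stack; the 256 KB value in the usage note is an arbitrary assumption, and the JVM may ignore the stackSize hint on some platforms. Watch the process's resident memory (for example with top or ps) while the threads are parked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class IdleThreads {
    // Starts `count` threads that block on a latch, counts how many are alive at once,
    // then releases them and waits for all of them to finish.
    static int parkAndRelease(int count, long stackSizeBytes) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch started = new CountDownLatch(count);
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Runnable idle = () -> {
                started.countDown();
                try {
                    release.await(); // blocked: uses no CPU until the latch opens
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            // stackSize is only a hint; some JVMs/platforms ignore it.
            Thread t = new Thread(null, idle, "idle-" + i, stackSizeBytes);
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        started.await(); // every thread has started and none has been released yet
        int alive = 0;
        for (Thread t : threads) {
            if (t.isAlive()) {
                alive++;
            }
        }
        release.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return alive;
    }
}
```

For example, `IdleThreads.parkAndRelease(10_000, 256 * 1024)` tries the question's connection count; if it fails with an OutOfMemoryError about native threads, the limit you hit is an OS or ulimit setting rather than the JVM heap.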

A classic paper that goes through much of the above, plus some other points, is a good complement to what I am saying here (von Behren, Condit and Brewer, "Why Events Are A Bad Idea (for High-concurrency Servers)", HotOS 2003):

https://www.usenix.org/legacy/events/hotos03/tech/full_papers/vonbehren/vonbehren_html/index.html

There are already some good pointers in the comments on your question.

The reason for not using 10K threads is that they cost memory, and memory costs energy. The programming model is not an argument either way, because the thread sitting on the client connection need not be the same thread that wants to post the event.
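The point about the programming model can be sketched with a hand-off queue: a per-client thread blocks waiting for events and writes them to its connection, while any other thread can publish by adding to that client's queue. The ClientPusher class, and the Consumer<String> standing in for a real socket writer, are hypothetical names for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class ClientPusher {
    // Sentinel compared by identity: only close() ever enqueues this exact instance.
    private static final String POISON = new String("stop");
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final Thread connectionThread;

    ClientPusher(Consumer<String> connectionWriter) {
        connectionThread = new Thread(() -> {
            try {
                while (true) {
                    String event = events.take(); // idles here until someone publishes
                    if (event == POISON) {
                        return;
                    }
                    connectionWriter.accept(event); // e.g. write to the client's socket
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        connectionThread.start();
    }

    // Called from any thread (request handlers, timers, ...), never needs the connection thread.
    void publish(String event) {
        events.add(event);
    }

    // Lets the connection thread drain already-published events, then stops it.
    void close() throws InterruptedException {
        events.add(POISON);
        connectionThread.join();
    }
}
```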

Please take a look at the WebSocket standard and at the asynchronous request processing model in the Servlet 3.0 standard. All recent Java web application servers implement these now (e.g. GlassFish and Tomcat), and they are the solution to your problem.

The question itself cannot be answered, since you haven't said which OS, JVM and application server you use. However, you can test it quite quickly yourself: create a servlet or JSP that calls Thread.sleep(9999999) and run siege -c 10000 ... against it.
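If you'd rather not set up a servlet container just for this experiment, a rough stand-in can be built with the JDK's built-in com.sun.net.httpserver.HttpServer (module jdk.httpserver). This is swapped in for the servlet/JSP the answer suggests, not the same thing: a cached thread pool gives every in-flight request its own thread, and the handler sleeps to mimic a held connection. SleepyServer is a hypothetical name; the port, hold time and backlog are values you choose.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

final class SleepyServer {
    // Every request holds its thread for holdMillis, like Thread.sleep in a servlet.
    static HttpServer start(int port, long holdMillis) throws IOException {
        // Large accept backlog so many simultaneous connects aren't refused (the OS may cap it).
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 10_000);
        server.createContext("/", exchange -> {
            try {
                Thread.sleep(holdMillis); // the request's thread sits idle here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "done\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Cached pool: a new thread for every concurrent request (thread-per-request model).
        // Daemon threads so idle pool threads don't keep the JVM alive after stop().
        server.setExecutor(Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        }));
        server.start();
        return server;
    }
}
```

For example, start it with `SleepyServer.start(8080, 9_999_999)` and point siege -c 10000 ... at it, as suggested above, while watching the JVM's thread count and memory.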

10,000 simultaneous HTTP clients...what are the issues in having threads sitting idle?

It seems that the cost of an idle thread is only the memory allocated for kernel structures (a few KB) and for the thread's stack (from 512 KB to several MB). But...
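A back-of-the-envelope estimate for the question's 10,000 connections, using the ranges above. The sample values (512 KB of stack and 16 KB of kernel structures per thread) are assumptions picked from those ranges, not measurements, and the result is reserved memory: as the first answer points out, only the part of a stack that is actually used gets committed to RAM.

```java
final class ThreadMemoryEstimate {
    static final long KIB = 1024;

    // Upper bound on memory reserved by `threads` idle threads, in bytes
    // (per-thread stack reservation plus kernel bookkeeping).
    static long reservedBytes(int threads, long stackBytes, long kernelBytes) {
        return (long) threads * (stackBytes + kernelBytes);
    }

    // The same estimate expressed in MiB.
    static double reservedMiB(int threads, long stackBytes, long kernelBytes) {
        return reservedBytes(threads, stackBytes, kernelBytes) / (1024.0 * 1024.0);
    }
}
```

With the assumed numbers, `reservedMiB(10_000, 512 * KIB, 16 * KIB)` is 5156.25 MiB, roughly 5 GiB of reserved space; with 1 MiB stacks it roughly doubles.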

Obviously, you are going to wake up each of your hundreds of threads from time to time, right? That is the moment when you pay the cost of context switching, which may not be so small (time spent calling the system scheduler, more cache misses, etc.). See, for instance: http://www.cs.rochester.edu/u/cli/research/switch.pdf
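To get a feel for the wake-up cost on your own hardware, a crude ping-pong between two threads forces a blocking hand-off on every message. This is a rough sketch using SynchronousQueue, not a rigorous benchmark like the one in the linked paper: JIT warm-up, CPU frequency scaling and whether both threads share a core all skew the number, and SynchronousQueue may spin briefly before parking, so it can underestimate the cost of a full sleep/wake cycle.

```java
import java.util.concurrent.SynchronousQueue;

final class PingPong {
    // Average nanoseconds per one-way hand-off between two threads.
    static double nanosPerHandoff(int rounds) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take()); // wait to be woken, then wake the other side
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            ping.put(i);
            pong.take();
        }
        long elapsed = System.nanoTime() - start;
        echo.join();
        return elapsed / (2.0 * rounds); // two hand-offs per round trip
    }
}
```

Running it a few times with a large round count (say 1,000,000) and discarding the first result gives a more stable figure.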

You will also have to pin your threads very carefully so that they don't interfere with the system threads. As a result, a thread-per-connection architecture (on blocking I/O) can increase the latency of the system compared to async I/O. But it can still work in your case if almost all threads are parked most of the time.
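For comparison with the async model, the thread-per-connection architecture on blocking I/O can be sketched as one accept loop plus a dedicated thread per client that spends most of its life blocked in a read. It is an echo server for illustration; the line-based protocol and the ThreadPerConnectionServer name are assumptions. CPU pinning has no standard Java API and is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ThreadPerConnectionServer {
    // Accepts connections until the returned socket is closed, one thread per client.
    static ServerSocket start(int port) throws IOException {
        ServerSocket listener = new ServerSocket(port, 10_000); // requested backlog; the OS may cap it
        Thread acceptor = new Thread(() -> {
            while (!listener.isClosed()) {
                try {
                    Socket client = listener.accept();
                    Thread handler = new Thread(() -> serve(client), "conn-" + client.getPort());
                    handler.setDaemon(true);
                    handler.start();
                } catch (IOException e) {
                    // e.g. the listener was closed; the loop condition then exits
                }
            }
        }, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
        return listener;
    }

    // Runs on the connection's own thread; blocks in readLine() while the client is idle.
    private static void serve(Socket client) {
        try (client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            // client went away; the thread simply ends
        }
    }
}
```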

One final word: we don't know how long your threads are going to be blocked on read(), how much work they need to do to process the received data, or what hardware, OS and network interfaces are going to be used. So, test a prototype of your system.