
ThreadPoolExecutor#execute. How to reuse running threads?

I have used ThreadPoolExecutor for years, and one of the main reasons is that it is designed to process many requests 'faster' thanks to parallelism and 'ready-to-go' threads (there are other reasons too).

Now I find myself reconsidering its well-known internal design.
Here is a snippet from the Java 8 ThreadPoolExecutor:

public void execute(Runnable command) {
    ...
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task.  The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     */
    ...
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
...

I'm interested in this very first step, because in most cases you do not want the thread pool executor to store 'unprocessed requests' in its internal queue; it is better to leave them in the external input (a Kafka topic, a JMS queue, etc.). So I usually design my performance/parallelism-oriented executors with zero internal capacity and a 'caller runs' rejection policy. You choose some sane, big number of parallel threads and a core pool timeout, not to scare others by showing how big the value is ;). I don't use an internal queue, and I want tasks to start being processed as early as possible, so effectively it has become a 'fixed thread pool executor'. Thus, in most cases, I'm under this 'first step' of the method's logic.

Here is the question: is it really the case that it will not 'reuse' existing threads, but will create a new one each time it is 'under core size' (most cases)? Would it not be better to 'add a new core thread only if all others are busy', rather than 'whenever we have a chance to stall for a while on another thread creation'? Am I missing anything?
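The behavior can be observed directly (a small sketch of my own, not taken from the JDK docs; the pool size of 4 is an arbitrary example value): with an idle worker and the pool still under corePoolSize, a second submission starts a second thread rather than reusing the idle one.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreSizeDemo {

    // Submits two short tasks one after another and reports the pool size.
    static int poolSizeAfterTwoSubmits() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());

        pool.execute(() -> { });   // starts worker #1
        Thread.sleep(100);         // let the task finish; worker #1 is now idle
        pool.execute(() -> { });   // still under corePoolSize, so a NEW worker #2
                                   // is started even though worker #1 is idle
        Thread.sleep(100);
        int size = pool.getPoolSize();
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(poolSizeAfterTwoSubmits()); // prints 2
    }
}
```

Core threads do not time out by default, so both workers stay alive and getPoolSize() reports 2 even though one task would have sufficed to keep a single thread busy.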

The doc describes the relationship between the corePoolSize, maxPoolSize, and the task queue, and what happens when a task is submitted.

...but will create new one [thread] each time it is 'under core size...'

Yes. From the doc:

When a new task is submitted in method execute(Runnable), and fewer than corePoolSize threads are running, a new thread is created to handle the request, even if other worker threads are idle.

Would it be not better to add new core thread only if all others are busy...

Since you don't want to use the internal queue, this seems reasonable. So set corePoolSize and maxPoolSize to the same value. Once the ramp-up of creating the threads is complete, there won't be any more thread creation.
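As a sketch of that configuration (the pool size of 1 and the latch here are my own, chosen just to make the rejection observable): with a SynchronousQueue and CallerRunsPolicy, a submission that finds every worker busy runs synchronously on the submitting thread, which naturally throttles whatever is reading the external queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {

    // Returns true if the second task ran on the submitting (caller) thread.
    static boolean secondTaskRanOnCaller() throws InterruptedException {
        // corePoolSize == maxPoolSize (here 1), zero-capacity queue, caller-runs.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch release = new CountDownLatch(1);
        pool.execute(() -> {                       // occupies the single worker
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        Thread.sleep(100);                         // make sure the worker picked it up

        String[] ranOn = new String[1];
        pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        // The queue offer failed and the pool is at max, so the task was
        // rejected and executed synchronously by CallerRunsPolicy.
        boolean onCaller = Thread.currentThread().getName().equals(ranOn[0]);
        release.countDown();
        pool.shutdown();
        return onCaller;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(secondTaskRanOnCaller()); // prints true
    }
}
```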

However, using CallerRunsPolicy would seem to hurt performance if the external queue grows faster than it can be processed.

Here is the question: is this really the case that it will not 'reuse' existing threads but will create new one each time it is 'under core size' (most cases)?

Yes, that is how the code is documented and written.

Am I missing anything?

Yes, I think you are missing the whole point of "core" threads. Core threads are defined in the Executors docs as:

... threads to keep in the pool, even if they are idle.

That's the definition. Thread startup is a non-trivial process, so if you have 10 core threads in a pool, the first 10 requests to the pool each start a thread until all of the core threads are live. This spreads the startup load across the first X requests. This is not about getting the tasks done; it is about initializing the TPE and spreading the thread-creation load out. You can call prestartAllCoreThreads() if you don't want this behavior.
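A minimal sketch of the difference (the pool size of 4 is an arbitrary example value): without prestart, threads appear lazily as tasks arrive; prestartAllCoreThreads() forces all of them up front.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PrestartDemo {

    // Returns {pool size before prestart, pool size after prestart}.
    static int[] poolSizes() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        int before = pool.getPoolSize();   // 0: threads are created lazily
        pool.prestartAllCoreThreads();     // start all 4 core threads now
        int after = pool.getPoolSize();    // 4
        pool.shutdown();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] sizes = poolSizes();
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints 0 -> 4
    }
}
```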

The whole purpose of the core threads is to have threads already started and running, available to work on tasks immediately. If we had to start a thread each time we needed one, there would be unnecessary resource-allocation time, and the thread start/stop overhead would consume compute and OS resources. If you don't want the core threads, you can let them time out and pay the startup cost each time.

I used to use ThreadPoolExecutors for years and one of the main reasons - it is designed to 'faster' process many requests because of parallelism and 'ready-to-go' threads (there are other though).

TPE is not necessarily "faster". We use it because manually managing and communicating with a number of threads is hard and easy to get wrong. That's why the TPE code is so powerful. It is the OS threads that give us parallelism.

I don't use internal queue and I want tasks to start to be processed the earlier the better,

The entire point of a threaded program is to maximize throughput. If you run 100 threads on a 4-core system and the tasks are CPU-intensive, you are going to pay for the increased context switching, and the overall time to process a large number of requests is going to increase. Your application is also most likely competing for resources on the server with other programs, and you don't want it to slow to a crawl if hundreds of jobs try to run in a thread pool at the same time.

The whole point of limiting your core threads (i.e. not making them a "sane big amount") is that there is an optimal number of concurrent threads that will maximize the overall throughput of your application. It can be really hard to find the optimal core-thread size, but experimentation, if possible, helps.

It depends highly on the ratio of CPU to IO in a task. If the tasks are making remote RPC calls to a slow service, then it might make sense to have a large number of core threads in your pool. If they are predominantly CPU tasks, however, you are going to want a number closer to the number of CPUs/cores and to queue the rest of the tasks. Again, it is all about overall throughput.
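A common rule-of-thumb sketch for that trade-off (my own illustration, not from the answer; the 90% wait fraction is a hypothetical value): size CPU-bound pools near the core count, and scale IO-bound pools up by the fraction of time a task spends blocked.

```java
public class PoolSizing {

    // Rough heuristic: threads ~ cores / (1 - fraction of time spent waiting).
    static int suggestedPoolSize(int cores, double waitFraction) {
        return (int) Math.round(cores / (1 - waitFraction));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound: " + suggestedPoolSize(cores, 0.0)); // == cores
        System.out.println("IO-bound:  " + suggestedPoolSize(cores, 0.9)); // == cores * 10
    }
}
```

Treat the result as a starting point for the experimentation mentioned above, not as a final answer.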

To reuse threads, one needs to somehow transfer the task to an existing thread.
This pushed me towards a synchronous queue and a zero core pool size.

// BasicThreadFactory is from Apache Commons Lang (org.apache.commons.lang3.concurrent)
return new ThreadPoolExecutor(0, maxThreadsCount,
        10L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(),
        new BasicThreadFactory.Builder().namingPattern("processor-%d").build());

This really reduced the number of 500-1500 ms 'peaks' on my 'main flow'.
But this only works for a zero-sized queue. For a non-zero-sized queue the question is still open.
