
Sending Async HTTP requests in Java using Spring Boot

I am working on an application which needs to continuously test thousands of proxy servers. The application is based on Spring Boot.

The current approach I am using is an @Async-decorated method which takes a proxy server and returns the result.
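Roughly, it looks like the sketch below (the class and method names are made up for illustration); each call occupies a pool thread for the whole duration of the blocking HTTP round trip:

    import java.util.concurrent.CompletableFuture;

    import org.springframework.scheduling.annotation.Async;
    import org.springframework.stereotype.Service;

    @Service
    public class ProxyCheckService {

        @Async("proxyTaskExecutor")
        public CompletableFuture<Boolean> checkProxy(String host, int port) {
            boolean alive = testConnection(host, port); // blocking I/O happens here
            return CompletableFuture.completedFuture(alive);
        }

        private boolean testConnection(String host, int port) {
            // open a connection through the proxy, send a test request, time the response
            return true;
        }
    }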

I am often getting OutOfMemory errors and the processing is very slow. I assume that is because each async method is executed in a separate thread which blocks on I/O?

Everywhere I read about async in Java, people mix parallel execution on threads with non-blocking I/O. In the Python world, there are async libraries which execute I/O requests in a single thread: while one method is waiting for a response from the server, it starts executing another method.

I think in my case I need something like this, because Spring's @Async is not suitable for me. Can someone please help clear up my confusion and suggest how I should go about this challenge?

I want to check hundreds of proxies simultaneously without putting excessive load on the machine. I have read about the Apache Async HTTP Client but I don't know if it is suitable.
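Something like the following is what I had in mind with it (just a sketch, assuming the 4.x HttpAsyncClients API; the URL is only an example):

    import java.util.concurrent.Future;

    import org.apache.http.HttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.concurrent.FutureCallback;
    import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
    import org.apache.http.impl.nio.client.HttpAsyncClients;

    public class AsyncProbe {
        public static void main(String[] args) throws Exception {
            try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
                client.start();
                HttpGet request = new HttpGet("http://example.com/");
                Future<HttpResponse> future = client.execute(request, new FutureCallback<HttpResponse>() {
                    public void completed(HttpResponse response) {
                        System.out.println("Status: " + response.getStatusLine().getStatusCode());
                    }
                    public void failed(Exception ex) {
                        System.out.println("Failed: " + ex.getMessage());
                    }
                    public void cancelled() {
                        System.out.println("Cancelled");
                    }
                });
                future.get(); // wait for this single probe to complete
            }
        }
    }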

This is the thread pool configuration I am using:

    public Executor proxyTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(Runtime.getRuntime().availableProcessors() * 2 - 1);
        executor.setMaxPoolSize(100);
        executor.setDaemon(true);
        return executor;
    }

I am often getting OutOfMemory errors and the processing is very slow. I assume that is because each async method is executed in a separate thread which blocks on I/O?

For the OOME, I explain it in the second point below.
About the slowness, it is indeed related to the I/O performed while processing the requests/responses.
The problem comes from the number of threads actually running in parallel.
With your current configuration, the max pool size is never reached (I explain why below). Suppose corePoolSize == 10 in your case: it means that 10 threads run in parallel, and suppose each thread takes about 3 seconds to test a site.
On average that means one site is tested about every 0.3 seconds, so testing 1000 sites takes about 300 seconds.
That is quite slow, and an important part of that time is waiting time: the I/O to send the request to and receive the response from the site currently being tested.
To increase the overall speed, you should probably start many more threads in parallel than your core capacity. That way, I/O waiting time is less of a problem: scheduling between the threads is frequent, so while some threads are paused waiting on I/O, others can make progress.


That should handle the OOME issue and probably improve the execution time significantly, but there is no guarantee that you will get a very short total time.
To achieve that, you would probably have to work the multi-threading logic more finely and rely on APIs/libraries with non-blocking I/O.
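For instance (just a sketch, not part of your current setup): with Java 11+, the JDK's own HttpClient can send requests asynchronously without dedicating one blocked thread per request; the URLs below are placeholders.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.stream.Collectors;

    public class NonBlockingProbe {

        private static final HttpClient CLIENT = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        public static void main(String[] args) {
            List<String> targets = List.of("https://example.com", "https://example.org");

            // Fire all probes; the client multiplexes the I/O on a small internal pool
            List<CompletableFuture<Void>> probes = targets.stream()
                    .map(url -> CLIENT.sendAsync(
                                    HttpRequest.newBuilder(URI.create(url)).build(),
                                    HttpResponse.BodyHandlers.discarding())
                            .thenAccept(resp -> System.out.println(url + " -> " + resp.statusCode()))
                            .exceptionally(ex -> {
                                System.out.println(url + " -> failed: " + ex.getMessage());
                                return null;
                            }))
                    .collect(Collectors.toList());

            probes.forEach(CompletableFuture::join); // wait for all probes to finish
        }
    }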

Here is some information from the official documentation that should be helpful.
This part explains the overall logic when a task is submitted (emphasis is mine):

The configuration of the thread pool should also be considered in light of the executor's queue capacity. For the full description of the relationship between pool size and queue capacity, see the documentation for ThreadPoolExecutor. The main idea is that, when a task is submitted, the executor first tries to use a free thread if the number of active threads is currently less than the core size. If the core size has been reached, the task is added to the queue, as long as its capacity has not yet been reached. Only then, if the queue's capacity has been reached, does the executor create a new thread beyond the core size. If the max size has also been reached, then the executor rejects the task.

And this explains the consequences on the queue size (emphasis is still mine):

By default, the queue is unbounded, but this is rarely the desired configuration, because it can lead to OutOfMemoryErrors if enough tasks are added to that queue while all pool threads are busy. Furthermore, if the queue is unbounded, the max size has no effect at all. Since the executor always tries the queue before creating a new thread beyond the core size, a queue must have a finite capacity for the thread pool to grow beyond the core size (this is why a fixed-size pool is the only sensible case when using an unbounded queue).

Long story short: you didn't set the queue size, which by default is unbounded (Integer.MAX_VALUE). So you fill the queue with several hundreds of tasks that will only be popped much later. These tasks hold a lot of memory, and that is when the OOME is raised.

Besides, as explained in the documentation, this setting is useless with an unbounded queue, because a new thread beyond the core size is created only when the queue is full:

executor.setMaxPoolSize(100);

Setting both of them to relevant values makes more sense:

public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(Runtime.getRuntime().availableProcessors() * 2 - 1);
    executor.setMaxPoolSize(100);
    // bounded queue: once it is full, the pool can grow up to maxPoolSize
    executor.setQueueCapacity(100);
    executor.setDaemon(true);
    return executor;
}

Or, as an alternative, use a fixed-size pool with the same value for the core and max pool size:

Rather than only a single size, an executor's thread pool can have different values for the core and the max size. If you provide a single value, the executor has a fixed-size thread pool (the core and max sizes are the same).

public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(100);
    executor.setMaxPoolSize(100);
    executor.setDaemon(true);
    return executor;
}
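Whichever variant you choose, the executor also has to be exposed as a Spring bean so that @Async actually uses it. A minimal wiring sketch (one common way to do it, assuming an @EnableAsync configuration and that the @Async methods reference the bean by name):

    import java.util.concurrent.Executor;

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.scheduling.annotation.EnableAsync;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    @Configuration
    @EnableAsync
    public class AsyncConfig {

        // Exposed as a named bean so @Async("proxyTaskExecutor") methods use this pool
        @Bean("proxyTaskExecutor")
        public Executor proxyTaskExecutor() {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            executor.setCorePoolSize(100);
            executor.setMaxPoolSize(100);
            executor.setDaemon(true);
            return executor;
        }
    }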

Note also that invoking the async service 1000 times without any pause seems harmful in terms of memory, since the executor cannot handle all of them straight away. You should probably split these invocations into smaller batches (2, 3, or more) with a Thread.sleep() between them.
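For example (a rough sketch; the batch size, the pause duration, and the asyncCheck callback are arbitrary placeholders):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    public class BatchSubmitter {

        // Submit the proxies in batches and pause between batches so that
        // the executor queue never holds the whole workload at once.
        public static void submitInBatches(List<String> proxies, int batchSize,
                                           Consumer<String> asyncCheck) throws InterruptedException {
            for (int from = 0; from < proxies.size(); from += batchSize) {
                int to = Math.min(from + batchSize, proxies.size());
                List<String> batch = new ArrayList<>(proxies.subList(from, to));
                batch.forEach(asyncCheck);   // e.g. proxyCheckService::checkProxy
                Thread.sleep(2_000);         // let the pool drain before the next batch
            }
        }
    }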
