Highly Concurrent Apache Async HTTP Client IOReactor issues
Application description:
According to the above sections, here's the tuning for my fiber HTTP client (of which, of course, I'm using a single instance):
PoolingNHttpClientConnectionManager connectionManager =
        new PoolingNHttpClientConnectionManager(
                new DefaultConnectingIOReactor(
                        IOReactorConfig.custom()
                                .setIoThreadCount(16)
                                .setSoKeepAlive(false)
                                .setSoLinger(0)
                                .setSoReuseAddress(false)
                                .setSelectInterval(10)
                                .build()));
connectionManager.setDefaultMaxPerRoute(32768);
connectionManager.setMaxTotal(131072);

FiberHttpClientBuilder fiberClientBuilder = FiberHttpClientBuilder.create()
        .setDefaultRequestConfig(
                RequestConfig.custom()
                        .setSocketTimeout(1500)
                        .setConnectTimeout(1000)
                        .build())
        .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
        .setConnectionManager(connectionManager)
        .build();
ulimits for open files are set very high (131072 for both soft and hard values).
kernel.printk = 8 4 1 7
kernel.printk_ratelimit_burst = 10
kernel.printk_ratelimit = 5
net.ipv4.ip_local_port_range = 8192 65535
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 100000
net.ipv4.tcp_max_syn_backlog = 100000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 1
Problem description
At around 25K leased connections, actual data is no longer sent over the socket connections, and the Pending stat climbs to a sky-rocketing 30K pending connection requests as well. lsof-ing the java process, I can see it has tens of thousands of file descriptors, almost all of them in CLOSE_WAIT (which makes sense, as the I/O reactor threads die/stop functioning and never get to actually close them).

Questions
Forgot to answer this, but I figured out what was going on roughly a week after posting the question:
There was some sort of misconfiguration that caused the IO reactor to spawn with only 2 threads.
Even after providing more reactor threads, the issue persisted. It turns out that our outgoing requests were mostly SSL. Apache's SSL connection handling delegates the core work to the JVM's SSL facilities, which simply are not efficient enough to handle thousands of SSL connection requests per second. More specifically, some methods inside SSLEngine (if I recall correctly) are synchronized. Taking thread dumps under high load shows the IOReactor threads blocking each other while trying to open SSL connections.
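As a minimal illustration of the code path involved (not the author's code): each outgoing TLS connection gets its own javax.net.ssl.SSLEngine from a shared SSLContext, and it is this engine/session machinery that showed up as the contention point in the thread dumps. A sketch, using the JVM-default context and a hypothetical peer:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class SslEngineDemo {
    public static void main(String[] args) throws Exception {
        // A shared context, much like the one the async client holds internally.
        SSLContext ctx = SSLContext.getDefault();

        // One engine is created per outgoing connection; at thousands of
        // connection attempts per second, synchronized sections inside the
        // engine/session machinery serialize the IOReactor threads.
        SSLEngine engine = ctx.createSSLEngine("example.com", 443); // hypothetical peer
        engine.setUseClientMode(true);

        System.out.println(engine.getPeerHost() + ":" + engine.getPeerPort());
    }
}
```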
Even trying to create a pressure-release valve in the form of a connection lease timeout didn't work, because the backlogs created were too large, rendering the application useless.
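For reference, the lease timeout mentioned here maps to HttpClient's connectionRequestTimeout, i.e. how long a caller may wait to lease a connection from the pool before failing. A sketch of that configuration (the 200 ms value is illustrative, not the author's setting):

```java
import org.apache.http.client.config.RequestConfig;

// Fail fast instead of queueing when the pool is exhausted.
RequestConfig config = RequestConfig.custom()
        .setConnectionRequestTimeout(200) // max wait (ms) to lease a pooled connection
        .setConnectTimeout(1000)
        .setSocketTimeout(1500)
        .build();
```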
Offloading outgoing SSL request handling to nginx performed even worse: because the remote endpoints terminate the requests preemptively, the SSL client session cache could not be used (the same goes for the JVM implementation).
Wound up putting a semaphore in front of the entire module, limiting the whole thing to ~6000 concurrent requests at any given moment, which solved the issue.
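That gate can be sketched with a plain java.util.concurrent.Semaphore; the class and method names below are illustrative, not the author's actual code, and the 6000 limit is the value quoted above:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Bounds the number of in-flight outgoing requests across the whole module.
public class OutboundRequestGate {
    private final Semaphore permits;

    public OutboundRequestGate(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks until a permit is free, runs the request, then releases the permit.
    public <T> T execute(Callable<T> request) throws Exception {
        permits.acquire();
        try {
            return request.call();
        } finally {
            permits.release();
        }
    }

    public int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        OutboundRequestGate gate = new OutboundRequestGate(6000);
        String result = gate.execute(() -> "ok"); // placeholder for an HTTP call
        System.out.println(result + ", permits left: " + gate.availablePermits());
    }
}
```

Callers that cannot acquire a permit simply wait, so bursts are smoothed out before they ever reach the connection pool or the SSL layer.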