
Why is the SSL connection between a Spring Boot app and Consul failing after a few minutes?

I'm in the process of upgrading an environment to new versions of Ubuntu, Consul and Spring Boot. At first glance, everything seems to work fine: the app connects to Consul, requests its configuration and boots up. After a few minutes, however, something breaks and this message is repeated approximately every 2 seconds:

com.ecwid.consul.transport.TransportException: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
    at com.ecwid.consul.transport.AbstractHttpTransport.executeRequest(AbstractHttpTransport.java:77)
    at com.ecwid.consul.transport.AbstractHttpTransport.makeGetRequest(AbstractHttpTransport.java:34)
    at com.ecwid.consul.v1.ConsulRawClient.makeGetRequest(ConsulRawClient.java:128)
    at com.ecwid.consul.v1.catalog.CatalogConsulClient.getCatalogServices(CatalogConsulClient.java:120)
    at com.ecwid.consul.v1.ConsulClient.getCatalogServices(ConsulClient.java:372)
    at org.springframework.cloud.consul.discovery.ConsulCatalogWatch.catalogServicesWatch(ConsulCatalogWatch.java:129)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
    at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1313)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1152)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1055)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:395)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)

I traced the 2-second repetition of the message to the app's health check. Once the error occurs for the first time, it recurs on each subsequent health check. I confirmed this by turning the health check off and rebooting, after which the error occurred exactly once, when the TTL on the Consul data was reached.

This is where my understanding of the problem ends. I've tried to trace it to several things, but none of them led to a solution:

  • Checking the certificates - The certificates are generated by Vault and signed by an intermediate certificate, which is in turn signed by a self-signed root certificate. They are placed in a PKCS12 bundle together with the key and provided to the application. This works on all TLS connections, including from the CLI tools and with curl. This seems like a dead end.
  • Network connectivity - Since the connection is being reset, I checked whether firewall or security-group issues were the cause. However, the relevant port (8501) is open for both TCP and UDP traffic, and all manual tests with nc show the port to be reachable.
  • IPv6 bug - I found a post saying this could be due to a bug with IPv6. I turned IPv6 off on the machine, rebooted everything and tried again. No luck, still the same error.
  • Versions of Consul - I ran the application in our old environment, where Consul 1.2.3 is running, and the error does not show up there. I'm still trying to find out whether there is a specific Consul version where the problem begins to occur, but haven't found it yet.
  • TLS bugs - Between Consul 1.2.3 and 1.7.2 there have been changes to Consul's TLS support as well as to the underlying Go TLS implementation. This came to light when testing with Consul 1.4.0, which produced a slightly different TLS error. Some suggestions on the internet were that the Go and OpenJDK implementations conflict. I tried forcing the Java application to use TLS 1.2, but again, no luck.
  • Handshake debugging - Based on a tip from the comments, I used -Djavax.net.debug=ssl:handshake to find out what happens during the handshake. During the first couple of minutes, the extra output shows what look to me like normal handshakes. Once the problem occurs, the handshake output stops right after "Produced Client Hello message" with "Remote host terminated the handshake". I haven't been able to do the same on the other side of the connection. Consul is a Golang application; if anyone knows how to get the same debugging information for a Golang app, please advise.
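
One way to narrow down the TLS-version suspicion from the list above is to attempt a handshake against Consul with one protocol version at a time. The following is only a minimal sketch: the class name TlsProbe and the default host are hypothetical, port 8501 is taken from the question, and it assumes the key store and trust store are supplied through the standard javax.net.ssl.keyStore and javax.net.ssl.trustStore system properties.

```java
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsProbe {

    // Attempts a single handshake restricted to one protocol version and
    // reports either the negotiated protocol or the exception that ended it.
    static String probe(String host, int port, String protocol) {
        try {
            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
                socket.setEnabledProtocols(new String[] {protocol});
                socket.startHandshake();
                return "ok: " + socket.getSession().getProtocol();
            }
        } catch (Exception e) {
            // Covers connection errors, unsupported protocol names and handshake failures.
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholders; pass the real Consul address as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8501;
        for (String protocol : new String[] {"TLSv1.2", "TLSv1.3"}) {
            System.out.println(protocol + " -> " + probe(host, port, protocol));
        }
    }
}
```

If one protocol version handshakes cleanly and the other fails, that points at a version-specific incompatibility rather than at the certificates or the network. Running it with -Djavax.net.debug=ssl:handshake gives the same per-message output as described above.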

I hope someone has an idea about how to find the cause of this problem or, better yet, has a solution for it.

After some more digging and trying other versions of things, I found that using GraalVM produces a different, slightly more descriptive error. When trying to connect to Consul, the connection immediately terminates with this message:

Caused by: javax.net.ssl.SSLHandshakeException: extension (5) should not be presented in certificate_request
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:263)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:254)
    at java.base/sun.security.ssl.SSLExtensions.<init>(SSLExtensions.java:90)
    at java.base/sun.security.ssl.CertificateRequest$T13CertificateRequestMessage.<init>(CertificateRequest.java:818)
    at java.base/sun.security.ssl.CertificateRequest$T13CertificateRequestConsumer.consume(CertificateRequest.java:922)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:177)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:164)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1151)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1062)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:402)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:220)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:164)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:139)
    at com.ecwid.consul.transport.AbstractHttpTransport.executeRequest(AbstractHttpTransport.java:61)

This led me to an issue on the Golang GitHub page: https://github.com/golang/go/issues/35722. It collects similar reports from various people, each with slightly different details. The thread mentions a discrepancy between the TLS 1.3 implementations of Go and Java. The OpenJDK maintainers also chip in and refer to this issue: https://bugs.openjdk.java.net/browse/JDK-8236039.

It has been fixed and closed, but the fix is not yet available in any of my regular binary distributions. I will try to check whether that version actually fixes the problem. In the meantime, there is a workaround: disable TLS 1.3 in Java by restricting the client to TLS 1.2. You can do this by adding -Djdk.tls.client.protocols=TLSv1.2 to the startup arguments.
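
If changing the startup arguments is inconvenient, the same restriction can probably be applied from code. The sketch below is an illustration, not a confirmed fix: the class name Tls12Workaround and the helper withoutTls13 are hypothetical, and setting the property programmatically is assumed to work only if it happens before the JVM initializes its first SSLContext.

```java
import java.util.Arrays;
import javax.net.ssl.SSLContext;

public class Tls12Workaround {

    // Per-socket alternative: drops TLSv1.3 from a list of protocol names,
    // e.g. socket.setEnabledProtocols(withoutTls13(socket.getEnabledProtocols())).
    static String[] withoutTls13(String[] protocols) {
        return Arrays.stream(protocols)
                .filter(p -> !"TLSv1.3".equals(p))
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws Exception {
        // Same property as the -D flag; must run before any TLS class is used.
        System.setProperty("jdk.tls.client.protocols", "TLSv1.2");

        // Print what the default client context now enables, to verify the setting took effect.
        String[] enabled = SSLContext.getDefault().getDefaultSSLParameters().getProtocols();
        System.out.println(Arrays.toString(enabled));
    }
}
```

In a Spring Boot app the property would have to be set at the very start of main, before SpringApplication.run, so the command-line flag remains the simpler and more predictable option.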
