My Jenkins controller runs on an EC2 instance (t3.medium) in a private VPC subnet, behind Nginx, and it loses the connection to its agents during long builds. The workers are EC2 instances of the same type, in the same region and subnet, running the same Java version.
Jenkins version: 2.319.3
Java: openjdk version "1.8.0_312"
OS: Ubuntu 20.04
Agents are connected over SSH.
What I have tried to resolve this issue:
I changed the EC2 instance type. The machine was short on memory, but even after resizing the issue persists.
I upgraded Java to Java 11, with no effect.
I changed the agent's SSHD configuration (added ClientAliveInterval 80).
I increased "Connection Timeout in Seconds" in the agent configuration (60 -> 6000).
I tried launching the agent by having it connect to the controller via a command; the connection still dropped.
I configured more aggressive TCP keepalive parameters:
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=8
sysctl -w net.ipv4.tcp_fin_timeout=30
I added hudson.slaves.ChannelPinger.pingIntervalSeconds=-1 to the Java options.
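For reference, the keepalive and SSHD changes above can be captured as file fragments like this (the drop-in filename is my own choice; the values are the ones listed, and `sysctl -w` settings are lost on reboot, so a file under /etc/sysctl.d makes them persistent):

```conf
# /etc/sysctl.d/99-tcp-keepalive.conf -- persistent version of the
# keepalive tuning tried above; reload with: sudo sysctl --system
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 8
net.ipv4.tcp_fin_timeout = 30

# /etc/ssh/sshd_config on the agent -- the ClientAliveInterval change;
# ClientAliveCountMax (default 3) bounds how many missed keepalive
# replies sshd tolerates before closing the session
ClientAliveInterval 80
```

On a Debian/Ubuntu package install, the -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=-1 property typically goes into JAVA_ARGS in /etc/default/jenkins; restart Jenkins and sshd after these changes.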
Any ideas what can be wrong here?
Error:
04:01:35 FATAL: command execution failed
04:01:36 java.io.EOFException
04:01:36 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2799)
04:01:36 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3274)
04:01:36 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934)
04:01:36 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396)
04:01:36 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
04:01:36 at hudson.remoting.Command.readFrom(Command.java:142)
04:01:36 at hudson.remoting.Command.readFrom(Command.java:128)
04:01:36 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
04:01:36 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
04:01:36 Caused: java.io.IOException: Unexpected termination of the channel
04:01:36 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
References:
Nginx conf:
upstream jenkins {
    server 127.0.0.1:8080;
}

server {
    listen 443 ssl;
    server_name XXX.CCC.net;

    ssl_certificate     /etc/nginx/valid_cert/XXX.pem;
    ssl_certificate_key /etc/nginx/valid_cert/XXX.CCC.net.key;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
    ssl_prefer_server_ciphers on;

    access_log /var/log/nginx/jenkins.access.log;

    ssl_session_cache shared:SSL:10m;
    ssl_stapling on;
    ssl_stapling_verify on;

    location / {
        try_files $uri @app;
    }

    location @app {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_next_upstream error;
        proxy_pass http://jenkins;
        proxy_redirect http:// https://;
        proxy_read_timeout 150;
    }
}
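One note on this config: an SSH-launched agent is a direct controller-to-agent connection on port 22, so it should not pass through Nginx at all; these timeouts would only matter for WebSocket agents or long-running UI/API requests. If anything long-lived is proxied, though, the 150-second proxy_read_timeout is short. An illustrative, untested variant of the @app block (the 3600-second values are assumptions, not recommendations):

```nginx
location @app {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    # allow WebSocket upgrades (needed if agents connect over WebSocket)
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_next_upstream error;
    proxy_pass http://jenkins;
    proxy_redirect http:// https://;
    # longer timeouts so idle long-running requests are not cut off
    proxy_read_timeout 3600;
    proxy_send_timeout 3600;
}
```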