
Jenkins inbound-agent can't reach master within Kubernetes

Last week I was experimenting with Jenkins, setting it up via the Helm chart with Kubernetes ephemeral agents, and I got it working. Then this weekend I did something wrong (not sure what), and now the agents are not able to come up. When I trigger the sample hello world pipeline, the agents try to connect but just keep bouncing in the cluster. I uninstalled Jenkins and set it up again, and I am still having the same issue.

Details:

  • Cluster: k3s (v1.19.4+k3s1)
  • Networking: Flannel
  • Jenkins (2.263.1) installed via Helm (agents in the jenkins-agents namespace, Jenkins in the jenkins namespace)

The Jenkins master logs show this again and again as the master tries to provision the agent:

Jan 04, 2021 4:35:34 AM WARNING org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
Error in provisioning; agent=KubernetesSlave name: default-4khvl, template=PodTemplate{id='3816c387-4b94-482d-bdc9-87901b3d402a', name='default', label='jenkins-jenkins-agent', serviceAccount='default', nodeUsageMode=NORMAL, podRetention='Never', containers=[ContainerTemplate{name='jnlp', image='*************/archive/jenkins/inbound-agent:4.6-1-alpine', workingDir='/home/jenkins', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='2', resourceRequestMemory='4Gi', resourceLimitCpu='2', resourceLimitMemory='4Gi', envVars=[KeyValueEnvVar [getValue()=http://jenkins.jenkins.svc.cluster.local:8080/jenkins, getKey()=JENKINS_URL]]}]}
Also:   java.lang.Throwable: launched here
    at hudson.slaves.SlaveComputer._connect(SlaveComputer.java:283)
    at hudson.model.Computer.connect(Computer.java:435)
    at hudson.slaves.CloudRetentionStrategy.start(CloudRetentionStrategy.java:73)
    at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.start(OnceRetentionStrategy.java:83)
    at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.start(OnceRetentionStrategy.java:46)
    at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:162)
    at hudson.model.AbstractCIBase.access$000(AbstractCIBase.java:44)
    at hudson.model.AbstractCIBase$2.run(AbstractCIBase.java:224)
    at hudson.model.Queue._withLock(Queue.java:1398)
    at hudson.model.Queue.withLock(Queue.java:1275)
    at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:207)
    at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1634)
    at jenkins.model.Nodes$2.run(Nodes.java:139)
    at hudson.model.Queue._withLock(Queue.java:1398)
    at hudson.model.Queue.withLock(Queue.java:1275)
    at jenkins.model.Nodes.addNode(Nodes.java:135)
    at jenkins.model.Jenkins.addNode(Jenkins.java:2157)
    at hudson.slaves.NodeProvisioner.lambda$update$6(NodeProvisioner.java:256)
    at hudson.model.Queue._withLock(Queue.java:1398)
    at hudson.model.Queue.withLock(Queue.java:1275)
    at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:225)
    at hudson.slaves.NodeProvisioner.access$900(NodeProvisioner.java:64)
    at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:821)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:91)
    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.IllegalStateException: Agent is not connected after 31 seconds, status: Failed
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:233)
    at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:294)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The pod logs show this when I retain the pods after they error out:

➜  kubernetes-jenkins git:(master) ✗ kubectl logs -n jenkins-agent pod/default-nd2k0
default-nd2k0: line 1: 18a950820798693f38009beef2323ecaf4acabcff0d0e5603bce62f8417d3e6c: not found

I have also tried to bring up a permanent agent, after setting up the agent on the master and bringing up the pod, but I've had no success there.

Permanent agent YAML:

---
apiVersion: "v1"
kind: "Pod"
metadata:
  annotations:
    app: "worker-agent"
  labels:
    worker: "worker-agent"
  name: "kube-1"
  namespace: "jenkins-agent"
spec:
  containers:
  - env:
    - name: "JENKINS_SECRET"
      value: "83a734ff2152633ed7f7ca0150b3fa28c2cbe370ca91c4f7ca513379613fb7bd"
    - name: "JENKINS_TUNNEL"
      value: "jenkins-agent.svc.cluster.local:50000"
    - name: "JENKINS_AGENT_NAME"
      value: "kube-1"
    - name: "JENKINS_AGENT_WORKDIR"
      value: "/home/jenkins/agent"
    - name: "JENKINS_URL"
      value: "http://jenkins.jenkins.svc.cluster.local:8080/jenkins"
    image: "jenkins/inbound-agent:4.6-1-alpine"
    imagePullPolicy: "Always"
    name: "jnlp"
    resources:
      limits:
        cpu: "2000m"
        memory: "2048Mi"
      requests:
        cpu: "500m"
        memory: "1024Mi"
    volumeMounts:
    - mountPath: "/home/jenkins/agent"
      name: "workspace-volume"
      readOnly: false
  nodeSelector:
    kubernetes.io/os: "linux"
  restartPolicy: "Never"
  volumes:
  - emptyDir:
      medium: ""
    name: "workspace-volume"

Logs of the permanent agent:

➜  kubernetes-jenkins git:(master) ✗ kubectl logs -n jenkins-agent pod/kube-1
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: kube-1
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jan 04, 2021 4:48:54 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.6
Jan 04, 2021 4:48:54 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Jan 04, 2021 4:48:54 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Jan 04, 2021 4:48:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local:8080/jenkins]
Jan 04, 2021 4:49:25 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: connect timed out
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: connect timed out
    at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
    at hudson.remoting.Engine.innerRun(Engine.java:689)
    at hudson.remoting.Engine.run(Engine.java:514)
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
    at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:211)
    ... 2 more

All services and pods are up, and DNS seems to work:

➜  kubernetes-jenkins git:(master) ✗ kubectl exec -ti -n jenkins-agent dnsutils -- nslookup jenkins.jenkins.svc.cluster.local
Server:     100.100.64.10
Address:    100.100.64.10#53

Name:   jenkins.jenkins.svc.cluster.local
Address: 100.100.106.175

➜  kubernetes-jenkins git:(master) ✗ kubectl exec -ti -n jenkins-agent dnsutils -- nslookup jenkins-agent.jenkins.svc.cluster.local
Server:     100.100.64.10
Address:    100.100.64.10#53

Name:   jenkins-agent.jenkins.svc.cluster.local
Address: 100.100.77.168

➜  kubernetes-jenkins git:(master) ✗ kubectl get all -n jenkins
NAME            READY   STATUS    RESTARTS   AGE
pod/jenkins-0   2/2     Running   0          84m

NAME                    TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)     AGE
service/jenkins         ClusterIP   100.100.106.175   <none>        8080/TCP    84m
service/jenkins-agent   ClusterIP   100.100.77.168    <none>        50000/TCP   84m

NAME                       READY   AGE
statefulset.apps/jenkins   1/1     84m
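
The output above shows that name resolution succeeds while the agent log reports "connect timed out", so it can help to test the two steps separately. The following is a minimal sketch, not part of the original setup: the `probe` helper is a hypothetical name, and it assumes a Python interpreter is available in a pod in the agent namespace. It only distinguishes a DNS failure from a TCP-level timeout or refusal.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify reachability of host:port.

    Returns one of: 'dns_failed', 'timeout', 'refused', 'error', 'open'.
    (Hypothetical helper for illustration only.)
    """
    try:
        # Step 1: name resolution only, which is what nslookup verifies.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"
    try:
        # Step 2: a real TCP handshake, which is what the agent needs.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are being dropped somewhere on the path.
        return "timeout"
    except ConnectionRefusedError:
        # The address is reachable but nothing is listening on the port.
        return "refused"
    except OSError:
        return "error"
```

Called as `probe("jenkins.jenkins.svc.cluster.local", 8080)` and `probe("jenkins-agent.jenkins.svc.cluster.local", 50000)` from the agent namespace, a `timeout` result alongside working DNS would point at the pod network or service routing rather than at the Jenkins configuration.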

Any ideas about what I should check would be greatly appreciated.

I will try to pare my Jenkins Helm chart down to the bare minimum to get it working again, and I will keep this post up to date with my trials and errors.

I ran a util pod and saw that I could curl other pod services on the cluster, just not Jenkins. Then I brought up Jenkins on a pet cluster and saw that the curl command worked there.

After restarting each node in my cluster, the curl command worked. I'm not sure what the issue was; I wish I knew. I was then able to launch the agents successfully.

Pod command and output:

➜  kubernetes-jenkins git:(master) ✗ kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -n jenkins -- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.0# curl http://jenkins.jenkins:8080/jenkins/tcpSlaveAgentListener/


  Jenkins
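
The curl check above confirms only that the listener page answers. As a rough sketch, the same request can also report the response headers that, to my understanding, an inbound agent reads from this endpoint to find the TCP port. The `listener_info` function is a hypothetical helper, and the header names (`X-Jenkins-JNLP-Port`, `X-Jenkins-JNLP-Host`, `X-Jenkins`) are an assumption to verify against your Jenkins version.

```python
import urllib.request


def listener_info(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch <base_url>/tcpSlaveAgentListener/ and return the status plus
    the headers of interest (missing headers map to None).

    Hypothetical helper; the header names are assumptions, see the text above.
    """
    url = base_url.rstrip("/") + "/tcpSlaveAgentListener/"
    # Bypass any proxy configured in the environment: this is an
    # in-cluster check and should go straight to the service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as resp:
        return {
            "status": resp.status,
            "jnlp_port": resp.headers.get("X-Jenkins-JNLP-Port"),
            "jnlp_host": resp.headers.get("X-Jenkins-JNLP-Host"),
            "version": resp.headers.get("X-Jenkins"),
        }
```

For example, `listener_info("http://jenkins.jenkins:8080/jenkins")` run from the same shell pod would be expected to return status 200 once the network is healthy. A timeout here raises an exception, which matches the "connect timed out" failure in the agent log.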
