
Chaostoolkit experiment failing when run as a Job from within a k8s cluster

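I'm using chaostoolkit and can run chaos experiments successfully from the command line. However, when I run the same experiment as a Job in k8s, it throws a "Connection refused" error. What I find strange is that sometimes the steady-state hypothesis step succeeds and returns 200 OK, with the run only failing around the terminate-pod action, but many times it fails in the hypothesis step itself (before the terminate-pod action). By the way, I'm doing this on Google Cloud.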

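During some runs, I see that the hypothesis before the action and the pod termination both succeed, but the hypothesis after the action (the termination) gets a "Connection refused" error.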

Any help or hints are appreciated.

Here is the error message:

[2022-02-03 07:24:54 DEBUG] [caching:35] Cached 2 activities
[2022-02-03 07:24:54 INFO] [experiment:54] Validating the experiment's syntax
[2022-02-03 07:24:54 DEBUG] [configuration:47] Loading configuration...
[2022-02-03 07:24:54 DEBUG] [secret:74] Loading secrets...
[2022-02-03 07:24:54 DEBUG] [secret:89] Secrets loaded
[2022-02-03 07:25:12 INFO] [experiment:103] Experiment looks valid
[2022-02-03 07:25:12 DEBUG] [caching:42] Clearing activities cache
[2022-02-03 07:25:12 DEBUG] [caching:25] Building activity cache...
[2022-02-03 07:25:12 DEBUG] [caching:35] Cached 2 activities
[2022-02-03 07:25:12 INFO] [experiment:182] Running experiment: What happens if we terminate an instance of the application?
[2022-02-03 07:25:12 DEBUG] [configuration:47] Loading configuration...
[2022-02-03 07:25:12 DEBUG] [secret:74] Loading secrets...
[2022-02-03 07:25:12 DEBUG] [secret:89] Secrets loaded
[2022-02-03 07:25:12 DEBUG] [__init__:39] Initializing controls
[2022-02-03 07:25:12 DEBUG] [__init__:355] No controls to apply on 'experiment'
[2022-02-03 07:25:12 INFO] [hypothesis:184] Steady state hypothesis: The app is healthy
[2022-02-03 07:25:12 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 07:25:12 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 07:25:12 INFO] [activity:160] Probe: app-responds-to-requests
[2022-02-03 07:25:12 DEBUG] [activity:233] Activity failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 156, in _new_conn
        conn = connection.create_connection(
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 84, in create_connection
        raise err
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 74, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
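The failing probe is a plain HTTP GET to http://newapp with a custom Host header and a 10-second timeout. To check from inside the cluster whether the refusal happens at the TCP level, a minimal stand-alone sketch of the same request using only Python's standard library could look like the following. The function name probe_http is hypothetical, and this is not how chaostoolkit implements its http provider (the traceback shows the toolkit going through urllib3); it only reproduces the same request shape.

```python
import http.client
from typing import Optional


def probe_http(host: str, port: int, host_header: str, timeout: float = 10.0) -> Optional[int]:
    """Send a GET shaped like the experiment's http probe and return the status code.

    Returns None when the TCP connection is refused (the errno 111 case in the
    traceback), so a refusal can be told apart from a non-200 response.
    Hypothetical helper, not part of chaostoolkit.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Passing "Host" explicitly makes http.client use it instead of host:port,
        # mirroring the experiment's "Host: newapp.example.com" header.
        conn.request("GET", "/", headers={"Host": host_header})
        return conn.getresponse().status
    except ConnectionRefusedError:
        # Nothing accepted the connection on the resolved address and port.
        return None
    finally:
        conn.close()
```

For example, probe_http("newapp", 80, "newapp.example.com") run from a pod in the default namespace should return 200 when the Service answers and None when the connection is refused; running it in a loop makes intermittent refusals easy to spot.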

Here is the configuration I'm passing to the Job:

health-http.yaml: |
    version: 1.0.0
    title: What happens if we terminate an instance of the application?
    description: If an instance of the application is terminated, the applications as a whole should still be operational.
    tags:
    - k8s
    - pod
    steady-state-hypothesis:
      title: The app is healthy
      probes:
      - name: app-responds-to-requests
        type: probe
        tolerance: 200
        provider:
          type: http
          timeout: 10
          verify_tls: false
          url: http://newapp
          headers:
            Host: newapp.example.com
    method:
    - type: action
      name: terminate-app-pod
      provider:
        type: python
        module: chaosk8s.pod.actions
        func: terminate_pods
        arguments:
          label_selector: app=newapp
          rand: true
          ns: default
      pauses: 
        after: 2
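The experiment runs the steady-state hypothesis, then the method (terminate-app-pod plus a 2-second pause), then the hypothesis again. A simplified model of that flow helps separate the two failure patterns described above: a refusal in the first hypothesis stops the run before any pod is touched, while a refusal in the second one means the system deviated after the action. This sketch assumes an integer tolerance is compared to the HTTP status and models a refused connection as status None; run_experiment and within_tolerance are hypothetical names, not chaoslib's API, and the returned strings are only modeled on the statuses the toolkit reports.

```python
import time
from typing import Callable, Optional

# A probe returns an HTTP status, or None if the connection was refused.
Probe = Callable[[], Optional[int]]


def within_tolerance(tolerance: int, status: Optional[int]) -> bool:
    """Simplified check: an integer tolerance must equal the HTTP status.

    A refused connection (None) never satisfies the tolerance.
    """
    return status is not None and status == tolerance


def run_experiment(probe: Probe, action: Callable[[], None],
                   pause_after: float = 2.0, tolerance: int = 200) -> str:
    # 1. Steady-state hypothesis before the method; on failure the method never runs.
    if not within_tolerance(tolerance, probe()):
        return "failed"
    # 2. Method: the terminate-pod action followed by its "after" pause.
    action()
    time.sleep(pause_after)
    # 3. Steady-state hypothesis again after the method.
    if not within_tolerance(tolerance, probe()):
        return "deviated"
    return "completed"
```

Under this model, the first log corresponds to "failed" (refused before the action) and the second log to "deviated" (200 before, refused after).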

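I'm able to open a shell in a dummy nginx pod and run "curl newapp", and it returns the correct response, so the service is definitely up and working. Besides other permissions, I've also created a service account with permissions to get, list, and delete pods.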

Here is the experiment manifest:

apiVersion: batch/v1
kind: Job
metadata:
  name: newapp-chaos
spec:
  activeDeadlineSeconds: 600
  backoffLimit: 0
  template:
    metadata:
      labels:
        app: newapp
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      serviceAccountName: newapp-chaos
      restartPolicy: Never
      containers:
      - name: chaostoolkit
        image: vfarcic/chaostoolkit:1.4.1-2
        args:
        - --verbose
        - run
        - /experiment/health-http.yaml
        env:
        - name: CHAOSTOOLKIT_IN_POD
          value: "true"
        volumeMounts:
        - name: config
          mountPath: /experiment
          readOnly: true
        resources:
          limits:
            cpu: 20m
            memory: 64Mi
          requests:
            cpu: 20m
            memory: 64Mi
      volumes:
      - name: config
        configMap:
          name: newapp-config

Here is my app manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: newapp-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: newapp
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: newapp
        version: v2
    spec:
      containers:
      - image: rstarmer/hostname:v2
        imagePullPolicy: Always
        name: newapp
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: newapp
  name: newapp
spec:
  #externalTrafficPolicy: Cluster
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: newapp
  sessionAffinity: None
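Both the terminate_pods action (label_selector: app=newapp) and the Service (selector app: newapp) pick pods by label. A Kubernetes equality-based selector matches a pod when every listed key/value pair is present in the pod's labels. The sketch below models that rule in plain Python, assuming only comma-separated key=value terms (no "==", "!=" or set-based expressions); the pod names are illustrative, and the Job pod name in particular is hypothetical. By this rule, the Job's pod template above, which also carries app: newapp, is matched as well, which is worth keeping in mind when reading the "Found 3 pods labelled 'app=newapp'" line in the output that follows.

```python
from typing import Dict, List


def parse_equality_selector(selector: str) -> Dict[str, str]:
    """Parse a selector such as 'app=newapp' or 'app=newapp,version=v2'.

    Only single '=' terms are supported in this simplified sketch.
    """
    wanted: Dict[str, str] = {}
    for term in selector.split(","):
        key, _, value = term.strip().partition("=")
        wanted[key] = value
    return wanted


def matching_pods(selector: str, pods: Dict[str, Dict[str, str]]) -> List[str]:
    """Return, sorted, the names of pods whose labels contain every selector pair."""
    wanted = parse_equality_selector(selector)
    return sorted(
        name for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in wanted.items())
    )
```

For instance, with the Deployment pod labelled {"app": "newapp", "version": "v2"} and a Job pod labelled {"app": "newapp"}, matching_pods("app=newapp", ...) returns both names, while "app=newapp,version=v2" returns only the Deployment pod.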

Here is the output from a run where the termination also succeeds but an error occurs afterwards:

[2022-02-03 09:43:22 INFO] [hypothesis:184] Steady state hypothesis: The app is healthy
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 INFO] [activity:160] Probe: app-responds-to-requests
[2022-02-03 09:43:22 DEBUG] [activity:179]   => succeeded with '{'status': 200, 'headers': {'Server': 'nginx/1.15.4', 'Date': 'Thu, 03 Feb 2022 09:43:22 GMT', 'Content-Type': 'text/html', 'Content-Length': '208', 'Last-Modified': 'Thu, 03 Feb 2022 07:21:47 GMT', 'Connection': 'keep-alive', 'ETag': '"61fb828b-d0"', 'Accept-Ranges': 'bytes'}, 'body': "<HTML>\n<HEAD>\n<TITLE>This page is on newapp-v2-866f8798cd-8s424 and is version v2</TITLE>\n</HEAD><BODY>\n<H1>THIS IS HOST newapp-v2-866f8798cd-8s424</H1>\n<H2>And we're running version: v2</H2>\n</BODY>\n</HTML>\n"}'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 DEBUG] [hypothesis:212] allowed tolerance is 200
[2022-02-03 09:43:22 INFO] [hypothesis:222] Steady state hypothesis is met!
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'method'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 INFO] [activity:160] Action: terminate-app-pod
[2022-02-03 09:43:22 DEBUG] [python:34] Activity 'terminate-app-pod' loaded from '/usr/local/lib/python3.8/site-packages/chaosk8s/pod/actions.py'
[2022-02-03 09:43:23 DEBUG] [actions:193] Found 3 pods labelled 'app=newapp' in ns default
[2022-02-03 09:43:23 DEBUG] [activity:181]   => succeeded without any result value
[2022-02-03 09:43:23 INFO] [activity:197] Pausing after activity for 2s...
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'method'
[2022-02-03 09:43:25 INFO] [hypothesis:184] Steady state hypothesis: The app is healthy
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:25 INFO] [activity:160] Probe: app-responds-to-requests
[2022-02-03 09:43:25 DEBUG] [activity:233] Activity failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 156, in _new_conn
        conn = connection.create_connection(
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 84, in create_connection
        raise err
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 74, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused

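After struggling with this for more than a day, I tried reinstalling Istio, and it started working properly. So something must have been wrong with Istio.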
