Troubleshoot docker Exit on 137 when "OOMKilled": false

What I did

  1. Started the services on an AlmaLinux server with docker-compose up
  2. Noticed that the docker-compose logs output had not changed for a while
  3. Checked docker-compose ps
$ docker-compose ps
              Name                            Command                State     Ports
------------------------------------------------------------------------------------
mysupercoolsystem_api_1           python -m mysupercoolsyste ...   Exit 137
mysupercoolsystem_dev_1           sh -c jupyter lab --ip=0.0 ...   Exit 137
mysupercoolsystem_loader_1        /bin/sh -c python -m mysup ...   Exit 137
mysupercoolsystem_predictor_1     /bin/sh -c python -m mysup ...   Exit 137
mysupercoolsystem_trainer_1       /bin/sh -c python -m mysup ...   Exit 137


$ docker ps -a  # just to confirm
72708f3450   hub.nic.dk/nicecompany/mysupercoolsystem   "/bin/sh -c 'python …"   2 days ago    Exited (137) 2 days ago              mysupercoolsystem_trainer_1
3e286cabb0   jupyter/scipy-notebook:33add21fab64        "sh -c 'jupyter lab …"   2 days ago    Exited (137) 2 days ago              mysupercoolsystem_dev_1
246b87f0ac   hub.nic.dk/nicecompany/mysupercoolsystem   "/bin/sh -c 'python …"   2 days ago    Exited (137) 2 days ago              mysupercoolsystem_predictor_1
7d3297092c   hub.nic.dk/nicecompany/mysupercoolsystem   "python -m mysuperc …"   2 days ago    Exited (137) 2 days ago              mysupercoolsystem_api_1
2a07851f9c   hub.nic.dk/nicecompany/mysupercoolsystem   "/bin/sh -c 'python …"   2 days ago    Exited (137) 2 days ago              mysupercoolsystem_loader_1

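  4. Investigated whether the containers stopped because they ran out of memory
    • Talked to the server administrator: the RAM limit on the server was not exceeded
    • docker info | grep Memory returns Total Memory: 19.37GiB
    • docker inspect <container_id> gives the same "State" for every container, except that the "FinishedAt" field varies by ±0.05 seconds. Follow-up with the sysadmin: the server was not restarted, and nobody explicitly asked for any process to be killed.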
"State": {
  "Status": "exited",
  "Running": false,
  "Paused": false,
  "Restarting": false,
  "OOMKilled": false,
  "Dead": false,
  "Pid": 0,
  "ExitCode": 137,
  "Error": "",
  "StartedAt": "2021-11-13T10:33:04.785566471Z",
  "FinishedAt": "2021-11-13T10:33:57.1xxxxZ"
  5. Double-checked my docker-compose.yml
version: "3"
services:
  dev:
    image: jupyter/scipy-notebook:33add21fab64
    environment:
      - COMPONENT=develop
    volumes:
      - /opt/mysupercoolsystem:/home/jovyan/work
      - /media:/media
    ports:
      - "3333:3333"
    entrypoint: sh -c "jupyter lab --ip=0.0.0.0 --port=3333 --no-browser --allow-root"

  loader:
    image: hub.nic.dk/nicecompany/mysupercoolsystem
    working_dir: "/app"
    volumes:
      - /media:/media

  trainer:
    image: hub.nic.dk/nicecompany/mysupercoolsystem
    environment:
      - COMPONENT=train
    working_dir: "/app"
    volumes:
      - models:/models

  predictor:
    image: hub.nic.dk/nicecompany/mysupercoolsystem
    environment:
      - COMPONENT=pred
    working_dir: "/app"
    volumes:
      - models:/models

  api:
    image: hub.nic.dk/nicecompany/mysupercoolsystem
    environment:
      - COMPONENT=api
    working_dir: "/app"
    ports:
      - "69:69"
    entrypoint: python -m mysupercoolsystem.web_api

volumes:
  models:
  6. Checked similar questions (in that question the --abort-on-container-exit flag was the culprit; I am not using any flags).
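
As a reading aid for the docker inspect output in step 4: by shell convention an exit code above 128 means the process was ended by signal number (code - 128), so 137 corresponds to signal 9, SIGKILL. "OOMKilled": false only says that Docker did not record an out-of-memory kill for that container; it does not say who sent the signal. The sketch below decodes those two fields. The helper name describe_exit is made up for illustration, and the input is just the relevant fields copied from the output above.

```python
import signal


def describe_exit(state: dict) -> str:
    """Summarise the "State" object returned by `docker inspect`."""
    code = state["ExitCode"]
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        number = code - 128
        try:
            cause = f"killed by {signal.Signals(number).name}"
        except ValueError:
            # The signal number is not defined on this platform.
            cause = f"killed by signal {number}"
    elif code == 0:
        cause = "exited normally"
    else:
        cause = f"exited with status {code}"
    return f"{cause}; OOMKilled={state.get('OOMKilled', False)}"


# Relevant fields copied from the `docker inspect` output in the question.
state = {"Status": "exited", "OOMKilled": False, "ExitCode": 137}
print(describe_exit(state))
```

On Linux this reports SIGKILL with OOMKilled=False, which matches the situation described here: something sent SIGKILL to every container at nearly the same moment, and Docker did not attribute it to the OOM killer.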

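How to proceed

  • Why are the services exiting?
  • What can I do to resolve the error?
  • Should I be checking other logs?
  • If I add restart: unless-stopped to every service, is there any way to check for docker service exits other than logging them myself through docker logs?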
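
On the last point, one possible approach (a sketch, not a tested setup) is to record container exits from the Docker event stream instead of from the service logs. The docker events command can filter for die events and print each one as JSON. The function names below are made up for illustration, and the field layout (Actor.Attributes.exitCode and Actor.Attributes.name) is an assumption about the event format that should be checked against the Docker version on the server.

```python
import json
import subprocess


def parse_die_event(line: str) -> dict:
    """Extract container name, exit code and timestamp from one JSON event line."""
    event = json.loads(line)
    # Assumed layout: container details live under Actor.Attributes.
    attrs = event.get("Actor", {}).get("Attributes", {})
    exit_code = attrs.get("exitCode")
    return {
        "name": attrs.get("name"),
        "exit_code": int(exit_code) if exit_code is not None else None,
        "time": event.get("time"),
    }


def watch_die_events() -> None:
    """Print one summary per container exit until interrupted (needs the docker CLI)."""
    cmd = ["docker", "events", "--filter", "event=die", "--format", "{{json .}}"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if line.strip():
                print(parse_die_event(line))


# Call watch_die_events() on the Docker host to start watching.
```

Because restart: unless-stopped starts the containers again after they exit, a record like this keeps the exit code and time of each stop, which would otherwise have to be read from docker inspect before the restart.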

You can debug out-of-memory errors in Python with https://pythonspeed.com/fil/ (see https://pythonspeed.com/articles/crash-out-of-memory/).
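
Before reaching for a full profiler, a quick first check is to log the peak memory of each Python service and compare it with the 19.37GiB the server reports. The sketch below uses only the standard library resource module. It is a swapped-in, much coarser technique than Fil, not part of it, and the helper name peak_rss_mib is made up for illustration.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Stand-in workload; in a real service, log this periodically or at shutdown.
data = [0] * 1_000_000
print(f"peak RSS so far: {peak_rss_mib():.1f} MiB")
```

If the logged peak stays far below the available memory right up to the exit, memory pressure becomes a less likely explanation and the sender of the SIGKILL has to be looked for elsewhere.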
