
Automatically "stop" Sagemaker notebook instance after inactivity?

I have a Sagemaker Jupyter notebook instance that I accidentally left running overnight, which cost money for nothing...

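Is there any way to automatically stop the Sagemaker notebook instance when there has been no activity for, say, 1 hour? Or do I have to write a custom script?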

You can use lifecycle configurations to set up an automatic job that stops your instance after a period of inactivity.

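There is a GitHub repository with samples that you can use. In the repository there is an auto-stop-idle script that shuts down your instance once it has been idle for more than 1 hour.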

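What you need to do is

  1. create a lifecycle configuration using the script, and
  2. associate the configuration with the instance. You can do this when you edit or create a notebook instance.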
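If you would rather do the two steps from code than from the console, here is a minimal sketch using boto3. It assumes you already have the script text in hand; the function names are made up for illustration. As far as I know the instance has to be stopped before it can be updated, so treat this as a starting point rather than a finished tool.

```python
import base64


def build_lifecycle_config_request(config_name: str, script_text: str) -> dict:
    """Build keyword arguments for create_notebook_instance_lifecycle_config.

    The API expects the script content as a base64-encoded string.
    """
    encoded = base64.b64encode(script_text.encode("utf-8")).decode("ascii")
    return {
        "NotebookInstanceLifecycleConfigName": config_name,
        "OnStart": [{"Content": encoded}],  # run the script every time the instance starts
    }


def attach_lifecycle_config(client, instance_name: str, config_name: str, script_text: str) -> None:
    """Step 1: create the configuration. Step 2: associate it with the instance."""
    client.create_notebook_instance_lifecycle_config(
        **build_lifecycle_config_request(config_name, script_text)
    )
    # Assumption: the instance is already stopped when this call is made.
    client.update_notebook_instance(
        NotebookInstanceName=instance_name,
        LifecycleConfigName=config_name,
    )
```

To use it, pass `boto3.client("sagemaker")` as `client`, along with your own instance name, a configuration name of your choice, and the contents of the auto-stop-idle script.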

If you think 1 hour is too long, you can tweak the script. The timeout is a single value set on one line of it.
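To make it clearer what that value controls, here is a rough sketch of an idle check. It is not the code from the repository; it assumes the session list has the shape returned by the Jupyter Server `/api/sessions` endpoint (each session has a `kernel` with `execution_state` and `last_activity`), and the names `IDLE_TIME_SECONDS` and `is_idle` are my own.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIME_SECONDS = 3600  # one hour; lower this to stop the instance sooner


def parse_timestamp(value: str) -> datetime:
    """Parse a Jupyter timestamp such as '2024-01-01T12:00:00.000000Z' as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def is_idle(sessions: list, now: datetime, idle_seconds: int = IDLE_TIME_SECONDS) -> bool:
    """Return True when no kernel is busy and none was active within the threshold."""
    for session in sessions:
        kernel = session["kernel"]
        if kernel["execution_state"] == "busy":
            return False  # something is still running
        last_activity = parse_timestamp(kernel["last_activity"])
        if now - last_activity < timedelta(seconds=idle_seconds):
            return False  # recent activity, not idle yet
    return True
```

Note that this sketch treats an instance with no open sessions as idle, which may or may not be what you want.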

You can also use CloudWatch + Lambda to monitor Sagemaker and stop the instance when utilization hits a minimum. Here is the list of what is available in CW for SM: https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html

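For example, you could set a CW alarm to fire when CPU utilization stays below roughly 5% for 30 minutes, and have it trigger a Lambda that shuts down the notebook instance.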
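The Lambda side of that could look roughly like the sketch below. It assumes the alarm already exists and publishes to an SNS topic that invokes the function, and that the instance name is supplied through an environment variable called `NOTEBOOK_INSTANCE_NAME` (a name I made up). The extra `client` parameter exists only so the handler can be exercised without AWS.

```python
import json
import os


def alarm_is_firing(event: dict) -> bool:
    """Return True if any SNS record carries a CloudWatch alarm in the ALARM state."""
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            return True
    return False


def stop_if_running(client, instance_name: str) -> bool:
    """Stop the instance if it is in service; report whether a stop was requested."""
    response = client.describe_notebook_instance(NotebookInstanceName=instance_name)
    if response["NotebookInstanceStatus"] != "InService":
        return False
    client.stop_notebook_instance(NotebookInstanceName=instance_name)
    return True


def lambda_handler(event, context, client=None):
    """Entry point: stop the configured notebook when the low-CPU alarm fires."""
    if not alarm_is_firing(event):
        return {"stopped": False}
    if client is None:
        import boto3  # imported lazily so the helpers work without boto3 installed

        client = boto3.client("sagemaker")
    # Hypothetical environment variable set on the Lambda function.
    name = os.environ["NOTEBOOK_INSTANCE_NAME"]
    return {"stopped": stop_if_running(client, name)}
```

Checking the status first avoids calling stop on an instance that is already stopped or still starting up.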

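Unfortunately, SageMaker currently has no built-in way to stop a Notebook instance automatically when there is no activity. To avoid leaving instances running overnight, you can write a cron job that checks at night for any running notebook instances and stops them if needed.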
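A cron job like that could be as small as the following sketch, which uses boto3 to list every instance in the `InService` state and request a stop for each one. The function names are my own, and it stops everything it finds, so add a filter if some instances need to stay up.

```python
def running_notebook_names(client) -> list:
    """Collect the names of all notebook instances currently in service."""
    names = []
    paginator = client.get_paginator("list_notebook_instances")
    for page in paginator.paginate(StatusEquals="InService"):
        for instance in page["NotebookInstances"]:
            names.append(instance["NotebookInstanceName"])
    return names


def stop_all_running(client) -> list:
    """Request a stop for every running instance and return their names."""
    names = running_notebook_names(client)
    for name in names:
        client.stop_notebook_instance(NotebookInstanceName=name)
    return names


def main() -> None:
    import boto3  # imported lazily so the helpers work without boto3 installed

    stopped = stop_all_running(boto3.client("sagemaker"))
    print("Stop requested for:", stopped)
```

Call `main()` at the bottom of the file and schedule it with something like `0 22 * * * python3 /path/to/stop_notebooks.py` in your crontab (the path and the hour are placeholders).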

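After we burned quite a lot of money by forgetting to turn these machines off, I decided to write a script. It is based on AWS' script, but it also reports why the machine was or was not killed. It is pretty lightweight because it does not use any additional infrastructure such as Lambda.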

Here is the script and the installation guide. It is just a simple lifecycle configuration!

SageMaker Studio notebook kernels can be terminated by attaching the following lifecycle configuration script to the domain.

    #!/bin/bash
    # This script installs the idle notebook auto-checker server extension to SageMaker Studio
    # The original extension has a lab extension part where users can set the idle timeout via a Jupyter Lab widget.
    # In this version the script installs the server side of the extension only. The idle timeout
    # can be set via a command-line script which will be also created by this create and places into the
    # user's home folder
    #
    # Installing the server side extension does not require Internet connection (as all the dependencies are stored in the
    # install tarball) and can be done via VPCOnly mode.

    set -eux

    # timeout in minutes
    export TIMEOUT_IN_MINS=120

    # Should already be running in user home directory, but just to check:
    cd /home/sagemaker-user

    # By working in a directory starting with ".", we won't clutter up users' Jupyter file tree views
    mkdir -p .auto-shutdown

    # Create the command-line script for setting the idle timeout
    cat > .auto-shutdown/set-time-interval.sh << EOF
    #!/opt/conda/bin/python
    import json
    import requests
    TIMEOUT=${TIMEOUT_IN_MINS}
    session = requests.Session()
    # Getting the xsrf token first from Jupyter Server
    response = session.get("http://localhost:8888/jupyter/default/tree")
    # calls the idle_checker extension's interface to set the timeout value
    response = session.post("http://localhost:8888/jupyter/default/sagemaker-studio-autoshutdown/idle_checker",
                json={"idle_time": TIMEOUT, "keep_terminals": False},
                params={"_xsrf": response.headers['Set-Cookie'].split(";")[0].split("=")[1]})
    if response.status_code == 200:
        print("Succeeded: idle timeout set to {} minutes".format(TIMEOUT))
    else:
        print("Error.")
        print(response.status_code)
    EOF
    chmod +x .auto-shutdown/set-time-interval.sh

    # "wget" is not part of the base Jupyter Server image, you need to install it first if needed to download the tarball
    sudo yum install -y wget
    # You can download the tarball from GitHub or alternatively, if you're using VPCOnly mode, you can host on S3
    wget -O .auto-shutdown/extension.tar.gz https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension/raw/main/sagemaker_studio_autoshutdown-0.1.5.tar.gz

    # Or instead, could serve the tarball from an S3 bucket in which case "wget" would not be needed:
    # aws s3 --endpoint-url [S3 Interface Endpoint] cp s3://[tarball location] .auto-shutdown/extension.tar.gz

    # Installs the extension
    cd .auto-shutdown
    tar xzf extension.tar.gz
    cd sagemaker_studio_autoshutdown-0.1.5

    # Activate studio environment just for installing extension
    export AWS_SAGEMAKER_JUPYTERSERVER_IMAGE="${AWS_SAGEMAKER_JUPYTERSERVER_IMAGE:-'jupyter-server'}"
    if [ "$AWS_SAGEMAKER_JUPYTERSERVER_IMAGE" = "jupyter-server-3" ]; then
        eval "$(conda shell.bash hook)"
        conda activate studio
    fi
    pip install --no-dependencies --no-build-isolation -e .
    jupyter serverextension enable --py sagemaker_studio_autoshutdown
    if [ "$AWS_SAGEMAKER_JUPYTERSERVER_IMAGE" = "jupyter-server-3" ]; then
        conda deactivate
    fi

    # Restarts the jupyter server
    nohup supervisorctl -c /etc/supervisor/conf.d/supervisord.conf restart jupyterlabserver

    # Waiting for 30 seconds to make sure the Jupyter Server is up and running
    sleep 30

    # Calling the script to set the idle-timeout and active the extension
    /home/sagemaker-user/.auto-shutdown/set-time-interval.sh

Resources

  1. https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html
  2. https://github.com/aws-samples/sagemaker-studio-lifecycle-config-examples/blob/main/scripts/install-autoshutdown-server-extension/on-jupyter-server-start.sh
