How to prevent AWS EC2 server from running indefinitely?

I have a Django app through which users can submit video to be processed by a Python script running OpenCV on a separate EC2 instance. As this is a moderately expensive server to run (p2.xlarge, ~$3.00/h), it is only spun up when a video is submitted, and I want to ensure that it doesn't continue to run if there is some hiccup in the processing. If the program works fine, the instance is properly shut down.

The problem is that the Python script sometimes hangs (I can't seem to replicate this on its own, which is a separate problem). When the script doesn't fully execute, the server continues to run indefinitely. I have tried the solution provided here for self-terminating an AWS EC2 instance. The solution works if the server is idle but doesn't seem to work if the server is busy trying to process the video.

Is there a better way to make sure the server doesn't run longer than x minutes and stop it, even if the server is in the middle of a process?

The code I'm currently using:

import paramiko
import boto3
import sys
from botocore.exceptions import ClientError
import json
from time import sleep

import argparse

parser = argparse.ArgumentParser()
   
parser.add_argument('--username', required=False)
parser.add_argument('--date', required=False)


args = parser.parse_args()

uName = args.username
theDate = args.date


ec2 = boto3.client('ec2', region_name= 'us-east-1', aws_access_key_id=accessKey, aws_secret_access_key=secretKey, )
ec2_2 = boto3.resource('ec2', region_name= 'us-east-1', aws_access_key_id=accessKey, aws_secret_access_key=secretKey, )
client = boto3.client('ses',region_name= 'us-east-1', aws_access_key_id=accessKey, aws_secret_access_key=secretKey,)


s3_resource = boto3.client('s3', region_name= 'us-east-1', aws_access_key_id=accessKey, aws_secret_access_key=secretKey, ) 

s3_instance = boto3.resource('s3', region_name= 'us-east-1', aws_access_key_id=accessKey, aws_secret_access_key=secretKey, ) 

obj = s3_instance.Object('my_bucket', 'data/instances.txt')#load file of instances
body=obj.get()['Body'].read().decode('utf-8')


instance_ids.index(body.split()[-1:][0])#get index of last run instance
if instance_ids.index(body.split()[-1:][0]) != 4: #if it isn't the 5th instance run the next instance 
    instance_id=instance_ids[instance_ids.index(body.split()[-1:][0])+1]
else:
    instance_id=instance_ids[0]#if it is the last instance then run the first instance

body+='\n'+instance_id #add the instance run to the end of the file
obj.put(Body=body) #write the file back to S3

while True:
    try:
        # Dry run first to verify permissions
        ec2.start_instances(InstanceIds=[instance_id], DryRun=True)
    except ClientError as e:
        if 'DryRunOperation' not in str(e):
            raise
    try:
        ec2.start_instances(InstanceIds=[instance_id], DryRun=False)
        break
    except ClientError:
        # e.g. the instance is still stopping; wait briefly and retry
        sleep(2.5)
print('instance started')

while True:
    if not ec2_2.Instance(instance_id).state['Code']== 16:
        print(ec2_2.Instance(instance_id).state)
        sleep(2.5)
        continue
    else:
        print('state == running')
        break

while True:
    try:
        instance = ec2_2.Instance(instance_id).public_ip_address
        ip_add=instance
        break
    except:
        continue

prevent_bankruptcy = 'echo "sudo halt" | at now + 15 minutes'

move_frome_s3 = 'aws s3 cp s3://my-bucket/media/{0}/Sessions/{1}/Uploads/{2} ./python-scripts/data/'.format(uName,theDate, file)

move_about_file = 'aws s3 cp s3://my-bucket/media/{}/about.txt ./python-scripts/data/results/result-dicts/'.format(uName)
    
move_assessment_file = 'aws s3 cp s3://my-bucket/media/{}/ranges.txt ./python-scripts/data/results/result-dicts/'.format(uName)
    
convert_file= 'cd python-scripts && python3 convert_codec.py --username {0} --date {1}'.format(uName, theDate)

key_location = "/my/key/folder/MyKey.pem"

k = paramiko.RSAKey.from_private_key_file(key_location)
c = paramiko.SSHClient()
c.set_missing_host_key_policy(paramiko.AutoAddPolicy())


while True:
    try:
        c.connect( hostname = ip_add, username = "ubuntu", pkey = k, banner_timeout=60)
        break
    except:
        sleep(1.5)


commands = [prevent_bankruptcy, make_dir, move_frome_s3, move_about_file, convert_file, move_assessment_file, create_folder]


for command in commands:
    print("Executing {}".format(command))
    stdin, stdout, stderr = c.exec_command(command)
    err = stderr.read()  # read once; a second read() would return b''
    errList.append(err)
    print(stdout.read())
    print("Errors")
    print("***", err)
c.close()

try:
    # Dry run first to verify permissions
    ec2.stop_instances(InstanceIds=[instance_id], DryRun=True)

except ClientError as e:
    if 'DryRunOperation' not in str(e):
        raise

try:
    ec2.stop_instances(InstanceIds=[instance_id], DryRun=False)

except ClientError as e:
    print(e)

If I edit commands to run only prevent_bankruptcy, which schedules sudo halt through at, and let the server sit idle for 15 minutes, it shuts down automatically. However, if something goes wrong with convert_file, the instance continues to run indefinitely, which can lead to a surprise come billing time.

You can use a timeout function in Python based on signals and stop the instance from the external script you wrote above.

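import signal

def handler(signum, frame):
    raise TimeoutError('Timeout')

def loop():
    for command in commands:
        stdin, stdout, stderr = c.exec_command(command)
        stdout.read()  # exec_command returns immediately; read() waits for the command

signal.signal(signal.SIGALRM, handler)
signal.alarm(60)  # in seconds; use x * 60 for a limit of x minutes

try:
    loop()
except TimeoutError:
    ec2.stop_instances(InstanceIds=[instance_id])
finally:
    signal.alarm(0)  # cancel the pending alarm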

The timeout code is taken from the top answer on "Timeout on a function call".

A warning from the comments on that answer also applies: it works only on the main Python thread, and it cannot interrupt code that is blocked inside a C extension. You also have to cancel the alarm with signal.alarm(0), which I included.
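
Given those caveats, a signal-free variant may be worth considering: give the whole command list a time budget and pass whatever is left of it to each command as its timeout. The sketch below is only an illustration under assumptions. run_with_budget, process_then_stop, run_one and stop_instance are hypothetical names that do not appear in the question's code, and run_one is assumed to raise an exception when a command does not finish within the timeout it is given.

```python
import time


def run_with_budget(commands, run_one, budget_seconds, clock=time.monotonic):
    """Run commands in order, giving each one only the time left in the budget.

    run_one(command, timeout) is a caller-supplied (hypothetical) callable that
    is expected to raise if the command does not finish within `timeout` seconds.
    Returns the list of commands that completed.
    """
    deadline = clock() + budget_seconds
    completed = []
    for command in commands:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("budget exhausted before: {}".format(command))
        run_one(command, remaining)
        completed.append(command)
    return completed


def process_then_stop(commands, run_one, stop_instance, budget_seconds):
    """Call stop_instance() whether the commands finish, fail, or time out."""
    try:
        return run_with_budget(commands, run_one, budget_seconds)
    except Exception as exc:  # timeouts and command failures alike
        print("aborting:", exc)
        return None
    finally:
        stop_instance()
```

With the script from the question, run_one could wrap c.exec_command(command, timeout=timeout) followed by stdout.read(), and stop_instance could be a small function that calls ec2.stop_instances(InstanceIds=[instance_id]). Note that paramiko applies that timeout to reads on the channel, so it acts as an inactivity limit for each read rather than a hard wall-clock limit; a command that keeps producing output would need an additional check against the deadline.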
