
Automatic process monitoring/management with Python

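Right, so I have a Python process that runs constantly, possibly even under Supervisor. What is the best way to achieve the following monitoring?

  • Send an alert and restart the process if it crashes. I would like to receive a signal automatically every time the process crashes, and then have it restarted automatically.
  • Send an alert and restart the process if it goes stale, i.e. it has not processed anything for 1 minute.
  • Restart on demand.

I would like to achieve all of the above through Python. I know Supervisord will do most of it, but I want to see whether it can be done in Python itself.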

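I think what you are looking for is Supervisor Events: http://supervisord.org/events.html

Also take a look at Superlance, a package of plugin utilities for monitoring and controlling processes that run under Supervisor: https://superlance.readthedocs.org/en/latest/

You can configure things like crash emails, crash SMS messages, memory consumption alerts, HTTP hooks and so on.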
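If you would rather write the listener yourself than use Superlance, the Supervisor event protocol is simple enough to sketch in plain Python. The listener prints READY, reads a header line plus a payload of `len` characters from stdin, and acknowledges with a RESULT line. The example below is only a sketch: the names `parse_tokens`, `is_crash` and `listen`, and the `alert` callback, are my own and not part of Supervisor. It assumes the listener is registered in an `[eventlistener:x]` section subscribed to `PROCESS_STATE_EXITED`.

```python
import sys


def parse_tokens(line):
    """Parse a 'key:value key:value' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def is_crash(headers, payload):
    """True for an exit that Supervisor did not expect (expected:0)."""
    return (headers.get("eventname") == "PROCESS_STATE_EXITED"
            and payload.get("expected") == "0")


def listen(stdin=sys.stdin, stdout=sys.stdout, alert=None):
    """Handle events until stdin is closed, calling alert(name) on a crash."""
    while True:
        # Tell Supervisor we are ready for the next event.
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            break
        headers = parse_tokens(header_line)
        raw_payload = stdin.read(int(headers["len"]))
        # Only parse payloads of the event type we subscribed to.
        if headers.get("eventname") == "PROCESS_STATE_EXITED":
            payload = parse_tokens(raw_payload)
            if alert is not None and is_crash(headers, payload):
                alert(payload["processname"])
        # Acknowledge the event: "RESULT <length>\n<body>".
        stdout.write("RESULT 2\nOK")
        stdout.flush()
```

To use it, end the script with a call such as `listen(alert=my_alert_function)`, where `my_alert_function` is whatever sends your email or SMS. The restart itself can be left to Supervisor's `autorestart` setting for the program, so the listener only has to alert.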

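Well, if you want a homegrown solution, this is what I could come up with.

Keep two states for the process in Redis: the actual state and the expected state. You can then monitor it however you like, for example with a web interface that reads the actual state and changes the expected state.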
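To make the two-state idea concrete, here is a minimal sketch of a store that a web interface and the monitor script could share. `StateStore` and `DictClient` are hypothetical names of mine; `DictClient` is an in-memory stand-in so the sketch runs without a Redis server, and I am assuming that any client exposing `get` and `set` (such as `redis.Redis`) could be passed in instead. The key format matches the one used in check_status.py further down.

```python
class DictClient(object):
    """In-memory stand-in for a Redis client, for trying the sketch locally."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class StateStore(object):
    """Reads and writes process states through any client with get/set."""

    def __init__(self, client):
        self.client = client

    @staticmethod
    def key(process_name, state_type):
        # state_type is 'expected' or 'actual'.
        return "{0}_{1}_state".format(process_name, state_type)

    def get(self, process_name, state_type):
        raw = self.client.get(self.key(process_name, state_type))
        # A missing key comes back as None; stored values are converted to int.
        return None if raw is None else int(raw)

    def set(self, process_name, state_type, state):
        self.client.set(self.key(process_name, state_type), str(state))
```

The web interface would only ever call `store.set(name, 'expected', ...)` and `store.get(name, 'actual')`, while the monitor does the opposite, so neither side needs to talk to the process directly.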

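Run a Python script from crontab to check the state and take the appropriate action when required. Here I check every 3 seconds and use SES to alert the admins by email.

DISCLAIMER: This code has not been run or tested. I wrote it just now, so it is prone to errors.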

Open the crontab file:

$ crontab -e

Add this line at the end of it so that run_monitor.sh runs every minute.

#Runs this process every 1 minute.
*/1 * * * * bash ~/path/to/run_monitor.sh

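run_monitor.sh runs the Python script. It runs it in a for loop, once every 3 seconds.

This is done because crontab has a minimum interval of 1 minute. We want to check the process every 3 seconds, 20 times (3 seconds * 20 = 1 minute). So it runs for one minute, and then crontab starts it again.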
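One caveat with that arithmetic: it only adds up to a minute if each check takes no time at all. If a check takes, say, half a second, twenty rounds of "check, then sleep 3" run past the minute and overlap with the next cron invocation. A possible way around this, sketched below in Python rather than shell, is to schedule each check against the start time instead of sleeping a fixed amount. `run_checks` is a hypothetical helper of mine, and the injectable `clock` and `sleep` parameters exist only so the timing can be tested.

```python
import time


def run_checks(check, interval=3.0, window=60.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() every `interval` seconds; stop before `window` is reached.

    Returns the number of checks performed.
    """
    start = clock()
    runs = 0
    while True:
        check()
        runs += 1
        # Schedule against the start time so slow checks do not add up.
        next_due = start + runs * interval
        if next_due >= start + window:
            return runs
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
```

With the defaults this performs 20 checks, at 0, 3, ... 57 seconds after the start, and returns before the next cron run begins, provided a single check does not itself take longer than the remaining time.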

run_monitor.sh

# 20 checks, 3 seconds apart.
for count in {1..20}
do
    cd '/path/to/check_status'
    /usr/local/bin/python check_status.py "myprocessname" "python startcommand.py"
    sleep 3 #check every 3 seconds.
done

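Here I am assuming:

*state 0 = stop or stopped (expected vs. actual)

*state -1 = restart

*state 1 = run or running

You can add more states as needed; a stale process can also be a state.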
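As an illustration of adding a stale state, here is a minimal sketch based on a heartbeat. It assumes the monitored process records a timestamp (for example `time.time()`) somewhere the monitor can read every time it finishes a unit of work; that mechanism, the `STALE` code 2, and the function names are my own assumptions. The 60-second limit comes from the "1 minute" in the question.

```python
import time

# State codes from the list above, plus a hypothetical code for "stale".
STOPPED, RESTART, RUNNING = 0, -1, 1
STALE = 2


def is_stale(last_heartbeat, now=None, max_age=60.0):
    """True if the last heartbeat is more than max_age seconds old."""
    now = time.time() if now is None else now
    return (now - last_heartbeat) > max_age


def actual_state(process_alive, last_heartbeat, now=None, max_age=60.0):
    """Derive the actual state from liveness and the last heartbeat time."""
    if not process_alive:
        return STOPPED
    if is_stale(last_heartbeat, now=now, max_age=max_age):
        return STALE
    return RUNNING
```

The monitor could then treat `STALE` the same way as a crash: send an alert, stop the process and start it again.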

I have used the process name to kill, start or check the process; you can easily modify this to read a specific PID file instead.
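For the PID file variant, a rough sketch could look like the following. It assumes a POSIX system and that the monitored process writes its own PID to a file on startup; `read_pid` and `pid_is_running` are hypothetical helper names. Sending signal 0 with `os.kill` checks whether the process exists without actually delivering a signal.

```python
import errno
import os


def read_pid(pid_file):
    """Return the integer PID stored in pid_file, or None if it is unusable."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None


def pid_is_running(pid):
    """Check for a live process by sending signal 0 (nothing is delivered)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True
```

Compared with matching on the process name, this avoids accidentally matching unrelated processes, at the cost of having to handle a PID file left behind by a crashed process.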

check_status.py

import os
import subprocess
import sys
import time

import boto.ses
import redis

ADMINS = ["abc@admin.com", "xyz@admin.com"]


def send_mail(recipients, message_subject, message_body):
    """
    Uses AWS SES to send mail.
    """
    SENDER_MAIL = 'xxx@yyy.com'
    AWS_KEY = 'xxxxxxxxxxxxxxxxxxx'
    AWS_SECRET = 'xxxxxxxxxxxxxxxxxxx'
    AWS_REGION = 'xx-xxxx-x'

    mail_conn = boto.ses.connect_to_region(AWS_REGION,
                                           aws_access_key_id=AWS_KEY,
                                           aws_secret_access_key=AWS_SECRET
                                           )

    mail_conn.send_email(SENDER_MAIL, message_subject, message_body, recipients, format='html')
    return True


class Shell(object):
    '''
    Convenient wrapper over subprocess for short-lived commands.
    '''
    def __init__(self, command, raise_on_error=True):
        self.command = command
        self.raise_on_error = raise_on_error
        self.output = None
        self.error = None
        self.return_code = None

    def run(self):
        # universal_newlines=True makes the output text instead of bytes.
        process = subprocess.Popen(self.command, shell=True, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, universal_newlines=True)
        # communicate() reads both pipes and waits for exit, so a full pipe cannot deadlock.
        self.output, self.error = process.communicate()
        self.return_code = process.returncode
        if self.return_code and self.raise_on_error:
            print(self.error)
            raise Exception("Error while executing %s::%s" % (self.command, self.error))


redis_client = redis.Redis('xxxredis_hostxxx')


def _to_int(value):
    # Redis returns strings (or None for a missing key), so convert before comparing.
    return int(value) if value is not None else None


def get_state(process_name, state_type):  # state_type will be 'expected' or 'actual'.
    state = redis_client.get('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type))
    return _to_int(state)  # 0, -1 or 1


def set_state(process_name, state_type, state):  # state_type will be 'expected' or 'actual'.
    return redis_client.set('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type), state)


def get_stale_state(process_name):
    state = redis_client.get('{process_name}_stale_state'.format(process_name=process_name))
    return _to_int(state)  # 0 or 1


def find_process_ids(process_name):
    # List every PID with its command line and filter in Python. This avoids
    # matching a grep process, and skips this script, whose own arguments
    # contain the process name.
    shell = Shell(command="ps -e -o pid= -o args=", raise_on_error=False)
    shell.run()
    own_pid = str(os.getpid())
    process_ids = []
    for line in (shell.output or '').splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] != own_pid and process_name in parts[1]:
            process_ids.append(parts[0])
    return process_ids


def check_running_status(process_name):
    return bool(find_process_ids(process_name))


def start_process(start_command):
    # Launch without waiting. Going through Shell.run() would block for as long
    # as the long-running process keeps the output pipes open.
    with open(os.devnull, 'w') as devnull:
        subprocess.Popen(start_command, shell=True, stdout=devnull, stderr=devnull)
    time.sleep(1)  # give the process a moment to appear in the process list


def stop_process(process_name):
    for process_id in find_process_ids(process_name):
        command = 'kill {process_id}'.format(process_id=process_id)
        shell = Shell(command=command, raise_on_error=False)
        shell.run()


def check_process(process_name, start_command):
    expected_state = get_state(process_name, 'expected')
    if expected_state == 0:  # stop
        stop_process(process_name)
        set_state(process_name, 'actual', 0)

    elif expected_state == -1:  # restart
        stop_process(process_name)
        set_state(process_name, 'actual', 0)
        start_process(start_command)
        set_state(process_name, 'actual', 1)
        set_state(process_name, 'expected', 1)  # set expected back to 1 so we don't keep on restarting.

    elif expected_state == 1:
        running = check_running_status(process_name)
        if not running:
            set_state(process_name, 'actual', 0)
            send_mail(recipients=ADMINS, message_subject="Alert", message_body="Your process is down. Trying to restart it.")
            start_process(start_command)
            running = check_running_status(process_name)
            if running:
                send_mail(recipients=ADMINS, message_subject="Alert", message_body="Your process was restarted.")
                set_state(process_name, 'actual', 1)
            else:
                send_mail(recipients=ADMINS, message_subject="Alert", message_body="Your process could not be restarted.")


if __name__ == '__main__':
    args = sys.argv[1:]
    process_name = args[0]
    start_command = args[1]
    check_process(process_name, start_command)
