
Automatic process monitoring/management with Python

Right, so I have a Python process that runs constantly, possibly under Supervisor. What is the best way to achieve the following monitoring?

  • Send an alert and restart if the process has crashed. I'd like to be notified automatically every time the process crashes and have it restarted.
  • Send an alert and restart if the process has gone stale, i.e. hasn't processed anything for, say, 1 minute.
  • Restart on demand.

I'd like to achieve all of the above through Python. I know Supervisord will do most of it, but I want to see if it can be done through Python itself.

I think what you are looking for is Supervisor Events: http://supervisord.org/events.html

Also look at Superlance, a package of plugin utilities for monitoring and controlling processes that run under Supervisor: https://superlance.readthedocs.org/en/latest/

You can configure things like crash emails, crash SMS, memory consumption alerts, HTTP hooks, etc.
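To give a feel for what a Supervisor event listener written in Python might look like, here is a minimal sketch. It assumes the listener is registered in an [eventlistener:x] section subscribed to PROCESS_STATE_EXITED, and that payloads are plain ASCII so character and byte counts agree. The function names (parse_tokens, handle_event, listen) and the alert callback are hypothetical, made up for illustration; only the READY / RESULT handshake and the key:value header format come from the Supervisor events documentation.

```python
import sys


def parse_tokens(line):
    """Turn 'key:value key:value' into a dict (the format of event headers)."""
    return dict(token.split(":", 1) for token in line.split())


def handle_event(headers, payload, alert):
    """Call alert() for processes that exited unexpectedly; return True if alerted."""
    if headers.get("eventname") != "PROCESS_STATE_EXITED":
        return False
    body = parse_tokens(payload)
    if body.get("expected") == "1":
        return False  # the exit code was one of the configured exitcodes
    alert("process %s exited unexpectedly" % body.get("processname"))
    return True


def listen(stdin=sys.stdin, stdout=sys.stdout,
           alert=lambda msg: sys.stderr.write(msg + "\n")):
    """Listener loop: announce READY, read one event, acknowledge it."""
    while True:
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            break  # supervisord closed our stdin
        headers = parse_tokens(header_line)
        payload = stdin.read(int(headers["len"]))
        handle_event(headers, payload, alert)
        stdout.write("RESULT 2\nOK")  # "OK" is 2 bytes long
        stdout.flush()

# A real listener script would end with a call to listen().
```

The alert callback is where an email or SMS hook would go. The restart itself can be left to Supervisor's autorestart setting on the monitored program, so the listener only has to notify.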

Well, if you want a homegrown solution, this is what I could come up with.

Maintain the process state, both actual and expected, in Redis. You can monitor it however you like, for example by building a web interface that checks the actual state and changes the expected state.
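As a rough illustration of the "restart on demand" part, a web handler or admin script could simply flip the expected state and let the monitor act on its next pass. The sketch below assumes the key naming used in check_status.py further down ("<name>_<type>_state") and the convention that -1 means restart. The helper names and the FakeRedis stand-in are hypothetical; with a real server you would pass a redis.Redis client instead.

```python
def state_key(process_name, state_type):
    """Build the key name; state_type is 'expected' or 'actual'."""
    return "{0}_{1}_state".format(process_name, state_type)


def request_restart(client, process_name):
    """Ask the monitor to restart the process on its next pass (-1 = restart)."""
    client.set(state_key(process_name, "expected"), -1)


def read_state(client, process_name, state_type):
    """Return the stored state as an int, or None if it was never set."""
    raw = client.get(state_key(process_name, state_type))
    return None if raw is None else int(raw)


class FakeRedis(object):
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        # A real Redis client also hands values back as bytes by default.
        self.data[key] = str(value).encode()

    def get(self, key):
        return self.data.get(key)


client = FakeRedis()
request_restart(client, "myprocessname")
print(read_state(client, "myprocessname", "expected"))
```

Keeping "expected" and "actual" under separate keys means the web interface only ever writes the former and the monitor only ever writes the latter.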

Run the Python script from crontab to check the state and take appropriate action when required. Here I check every 3 seconds and use SES to alert admins via email.

DISCLAIMER: The code has not been run or tested. I just wrote it now, so it is prone to errors.

Open the crontab file:

$ crontab -e

Add this line at the end of it to make run_monitor.sh run every minute.

#Runs this process every 1 minute.
*/1 * * * * bash ~/path/to/run_monitor.sh

run_monitor.sh runs the Python script in a for loop, once every 3 seconds.

This is done because crontab's minimum time interval is 1 minute. We want to check the process every 3 seconds, 20 times (3 sec * 20 = 1 minute), so the loop runs for about a minute before crontab starts it again.

run_monitor.sh

# {1..20} gives exactly 20 iterations ({0..20} would give 21 and overrun the minute).
for count in {1..20}
do
    cd '/path/to/check_status'
    /usr/local/bin/python check_status.py "myprocessname" "python startcommand.py"
    sleep 3 # check every 3 seconds.
done

Here I have assumed:

* state 0 = stop or stopped (expected vs. actual)

* state -1 = restart

* state 1 = run or running

You can add more states as needed; a stale process can also be a state.
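The stale case is not handled by the script below, so here is one possible way to sketch it. The idea is that the worker records a timestamp each time it finishes a unit of work, and the monitor treats the process as stale when that timestamp is too old. The "<name>_heartbeat" key, the function names, and the 60 second threshold are all assumptions made for illustration; client is anything with get and set methods, such as a redis.Redis instance.

```python
import time

STALE_AFTER_SECONDS = 60  # assumed threshold, from the question's "say 1 minute"


def heartbeat_key(process_name):
    """Hypothetical key name for the last-activity timestamp."""
    return "{0}_heartbeat".format(process_name)


def record_heartbeat(client, process_name, now=None):
    """Called by the worker each time it finishes a unit of work."""
    now = time.time() if now is None else now
    client.set(heartbeat_key(process_name), now)


def is_stale(client, process_name, now=None, max_age=STALE_AFTER_SECONDS):
    """True if no heartbeat was ever recorded or the last one is too old."""
    now = time.time() if now is None else now
    raw = client.get(heartbeat_key(process_name))
    if raw is None:
        return True
    if isinstance(raw, bytes):
        raw = raw.decode()  # Redis clients return bytes by default
    return now - float(raw) > max_age
```

In the monitor, a stale result could be handled the same way as a crash: send the alert, stop the process, and start it again.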

I have used the process name to kill, start, or check processes; you can easily modify it to read specific PID files.
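A PID-file variant could look roughly like the sketch below. It assumes the monitored process writes its own PID to a file at startup (the file path is whatever you choose) and that the monitor runs on a POSIX system, where sending signal 0 only checks that the process exists. The function names are hypothetical.

```python
import errno
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None


def pid_is_running(pid):
    """Probe the process with signal 0, which delivers nothing to it."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True
```

This avoids the false matches that grepping ps output by name can produce, at the cost of trusting the PID file. A stale file whose PID has been reused by an unrelated process would still look alive.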

check_status.py

import sys
import subprocess

import redis
import boto.ses


def send_mail(recipients, message_subject, message_body):
    """
    Uses AWS SES to send mail. recipients is a list of email addresses.
    """
    SENDER_MAIL = 'xxx@yyy.com'
    AWS_KEY = 'xxxxxxxxxxxxxxxxxxx'
    AWS_SECRET = 'xxxxxxxxxxxxxxxxxxx'
    AWS_REGION = 'xx-xxxx-x'

    mail_conn = boto.ses.connect_to_region(AWS_REGION,
                                           aws_access_key_id=AWS_KEY,
                                           aws_secret_access_key=AWS_SECRET
                                           )

    mail_conn.send_email(SENDER_MAIL, message_subject, message_body, recipients, format='html')
    return True

class Shell(object):
    '''
    Convenience wrapper over subprocess.
    '''
    def __init__(self, command, raise_on_error=True):
        self.command = command
        self.raise_on_error = raise_on_error
        self.output = None
        self.error = None
        self.return_code = None

    def run(self):
        process = subprocess.Popen(self.command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # communicate() drains both pipes and waits for exit;
        # calling wait() first can deadlock when the output is large.
        self.output, self.error = process.communicate()
        self.return_code = process.returncode
        if self.return_code and self.raise_on_error:
            print(self.error)
            raise Exception("Error while executing %s::%s" % (self.command, self.error))


redis_client = redis.Redis('xxxredis_hostxxx')

def get_state(process_name, state_type): #state_type will be expected or actual.
    # Redis returns the stored value as a string (or None if unset), e.g. '0', '1' or '-1'.
    state = redis_client.get('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type))
    return state

def set_state(process_name, state_type, state): #state_type will be expected or actual.
    state = redis_client.set('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type), state)
    return state

def get_stale_state(process_name):
    state = redis_client.get('{process_name}_stale_state'.format(process_name=process_name)) #value could be 0 or 1
    return state

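def check_running_status(process_name):
    # Exclude the grep itself and this monitor, whose command line also contains the process name.
    command = "ps -ef | grep {process_name} | grep -v grep | grep -v check_status.py | wc -l".format(process_name=process_name)
    shell = Shell(command=command)
    shell.run()
    # wc prints the count followed by a newline, so convert before comparing.
    return int(shell.output.strip()) > 0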

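def start_process(start_command):
    # Pass start_command with a trailing '&' so the process starts in the background.
    # Redirect its output too (e.g. '> /dev/null 2>&1 &'); otherwise run() keeps
    # waiting on the pipes the background process inherited.
    shell = Shell(command=start_command)
    shell.run()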

def stop_process(process_name):
    # Braces around the awk program are doubled so str.format() leaves them alone.
    # The grep itself and this monitor are excluded so they are not killed.
    command = "ps -ef | grep {process_name} | grep -v grep | grep -v check_status.py | awk '{{print $2}}'".format(process_name=process_name)
    shell = Shell(command=command, raise_on_error=False)
    shell.run()
    if not shell.output:
        return
    process_ids = shell.output.decode().strip().split()
    for process_id in process_ids:
        command = 'kill {process_id}'.format(process_id=process_id)
        shell = Shell(command=command, raise_on_error=False)
        shell.run()


def check_process(process_name, start_command):
    admins = ["abc@admin.com", "xyz@admin.com"]
    expected_state = get_state(process_name, 'expected')
    if expected_state is not None:
        expected_state = int(expected_state) #Redis hands the value back as a string.

    if expected_state == 0: #stop
        stop_process(process_name)
        set_state(process_name, 'actual', 0)

    elif expected_state == -1: #restart
        stop_process(process_name)
        set_state(process_name, 'actual', 0)
        start_process(start_command)
        set_state(process_name, 'actual', 1)
        set_state(process_name, 'expected', 1) #set expected back to 1 so we don't keep on restarting.

    elif expected_state == 1:
        running = check_running_status(process_name)
        if not running:
            set_state(process_name, 'actual', 0)
            send_mail(recipients=admins, message_subject="Alert", message_body="Your process is down. Trying to restart.")
            start_process(start_command)
            running = check_running_status(process_name)
            if running:
                send_mail(recipients=admins, message_subject="Alert", message_body="Your process was restarted.")
                set_state(process_name, 'actual', 1)
            else:
                send_mail(recipients=admins, message_subject="Alert", message_body="Your process could not be restarted.")


if __name__ == '__main__':
    args = sys.argv[1:]
    process_name = args[0]
    start_command = args[1]
    check_process(process_name, start_command)
