
AWS EC2 CloudWatch monitoring

First of all, I appreciate your patience in reading and thinking through the problem I describe here.

I had an unusual problem on one of my AWS EC2 instances (Ubuntu 14.04): the instance would become unreachable over both HTTP and ping, and it also locked me out of SSH access. Every time, I had to log in to the AWS console and reboot the instance manually. As a workaround, I configured CloudWatch monitoring to reboot the instance automatically and send me a notification email whenever the system check fails.

So far, so good.

Now, what I really want is the root cause of the instance becoming unreachable. I am assuming it is a memory issue. I have gone through the get-system-logs, which helped a bit. But is there any way to configure CloudWatch to send me the failure logs, or something similar, along with the alert email? Or is there any way to alert myself with sufficient log information (for example, memory usage at 80%, network not responding, etc.) when the instance becomes unreachable? I have heard of the swap tool, but I am looking for something more generic, not limited to memory monitoring.

Anything? Does anyone have any idea?

I would go old skool and use a script on the server to log to a file.

Presumably (you don't mention this detail above) there is a particular program running on the system that is giving you this problem.

System programs usually store their PID in a file. Let's assume the file is /var/run/nginx.pid; you can work out the right path for your particular system.

Write a script that reads the PID and records the memory use. For example, save this file as /usr/local/bin/mymemory and make it executable (chmod +x /usr/local/bin/mymemory), otherwise cron cannot run it:

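#!/bin/sh
# Read the PID of the process to watch (path assumed above; adjust for your system)
PID=$(cat /var/run/nginx.pid)
# The 3 fields are %MEM, VSZ and RSS
DATA=$(ps uhp "$PID" | awk '{print $4, $5, $6}')
NOW=$(date --rfc-3339=seconds)
# Append one timestamped line per run
echo "$NOW $DATA" >> /var/log/memory.log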

Add a line to root's crontab:

* * * * * /usr/local/bin/mymemory

This will produce an ever-growing file with one memory reading per minute. I suggest you log in once a day to check it, download it if it looks interesting, and delete it. (In a real production context, log rotation could be used.)
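If you would rather not delete the file by hand, a very small size-based rotation can be sketched in shell. This is only an illustration: rotate_if_large is a hypothetical helper name, the 1 MB threshold is an arbitrary assumption, and a real setup would more likely rely on the system's logrotate.

```shell
#!/bin/bash
# rotate_if_large FILE MAX_BYTES
# Hypothetical helper (not a standard tool): when FILE is larger than
# MAX_BYTES, move it to FILE.1, overwriting any previous FILE.1.
rotate_if_large() {
    local file=$1 max_bytes=$2 size
    # Nothing to do if the log does not exist yet
    [ -f "$file" ] || return 0
    size=$(wc -c < "$file")
    if [ "$size" -gt "$max_bytes" ]; then
        mv -f "$file" "$file.1"
    fi
}

# Example call (assumed threshold of 1 MB), e.g. at the top of the logging script:
# rotate_if_large /var/log/memory.log 1048576
```

Only one old copy is kept, so the data leading up to the most recent crash survives a rotation but older history does not.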

Every time there is a crash, the file should contain the memory use data leading up to it.
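Since the question asks for something not limited to one process's memory, the same idea can be extended to system-wide figures. The following is a minimal sketch, assuming a Linux system with /proc mounted; the script name sysstate, the log path /var/log/sysstate.log, and the line layout are all made up for illustration.

```shell
#!/bin/bash
# Hypothetical companion to /usr/local/bin/mymemory: records system-wide
# load and memory figures instead of a single process.
# LOG_FILE and the output layout are assumptions, not a standard format.
LOG_FILE="${LOG_FILE:-/var/log/sysstate.log}"

# Print one line: timestamp, 1/5/15-minute load averages, and total/free
# memory plus free swap in kB, all read from /proc.
sysstate_line() {
    local now load mem_total mem_free swap_free
    now=$(date --rfc-3339=seconds)
    load=$(cut -d ' ' -f 1-3 /proc/loadavg | tr ' ' ',')
    mem_total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    mem_free=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
    swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
    echo "$now load=$load mem_total_kb=$mem_total mem_free_kb=$mem_free swap_free_kb=$swap_free"
}

# Append to the log only when called with "log"; sourcing the file or
# running it without arguments just defines the function.
if [ "${1:-}" = "log" ]; then
    sysstate_line >> "$LOG_FILE"
fi
```

Saved as, say, /usr/local/bin/sysstate and made executable, it could be scheduled the same way as the script above, with a crontab entry ending in `/usr/local/bin/sysstate log`. After an automatic reboot, the last few lines before the gap in timestamps show whether free memory was shrinking or load was climbing just before the instance became unreachable.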
