AWS EC2 CloudWatch monitoring

Question

First of all, I appreciate your patience in reading and thinking through the problem described here.

I have a peculiar problem on one of my AWS EC2 instances (Ubuntu 14.04): the instance simply becomes unreachable through both HTTP and ping. It also locks me out of SSH access. I have to log in to the AWS console every time and reboot the instance manually. As a workaround, I have configured CloudWatch monitoring to reboot the instance automatically and send me a notification email whenever the system check fails.

So far, so good.

Now, what I really want is the root cause of the instance becoming unreachable. I am assuming it is a memory issue. I have gone through the get-system-logs output, which helped a bit. But is there any way to configure CloudWatch to send me the failure logs, or something similar, along with the alert email? Or is there any way I can alert myself with sufficient log information (for example: memory usage at 80%, network not responding, and so on) when the instance becomes unreachable? I have heard of the swap tool, but I am looking for something more generic, not limited to memory monitoring.

Anything? Does anyone have any idea?

Answer 1

I would go old school and use a script on the server to log to a file.

Presumably (you don't mention this detail above) there is a particular program running on the system that is causing this problem.

System programs usually store their PID in a file. Let's assume the file is /var/run/nginx.pid. You can work out the right path for your particular system.

Write a script that reads the PID and records the memory use. For example, save this file as "/usr/local/bin/mymemory" and make it executable:

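#!/bin/sh
# Read the PID of the process to watch (adjust the path for your system)
PID=$(cat /var/run/nginx.pid)
# the 3 fields are %mem, VSZ and RSS
DATA=$(ps uhp "$PID" | awk '{print $4, $5, $6}')
NOW=$(date --rfc-3339=seconds)
# Append one timestamped sample per run
echo "$NOW $DATA" >> /var/log/memory.log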

Add a line to root's crontab:

* * * * * /usr/local/bin/mymemory

This creates an ever-growing file with one memory sample per minute. I suggest you log in once a day to check it, download it if it looks interesting, and then delete it. (In a real production context, log rotation could be used.)
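If logging in daily to delete the file gets tedious, a small trimming helper can stand in for proper log rotation. This is only a sketch: the function name trim_log and the 1440-line limit (one day of per-minute samples) are my own choices for illustration, not part of any standard tool.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest max_lines lines of a log file.
trim_log() {
    log_file=$1
    max_lines=$2
    [ -f "$log_file" ] || return 0      # nothing to trim yet
    tmp_file=$(mktemp) || return 1
    # Copy the tail aside, then write it back into the same file so that
    # a cron job appending to the log keeps writing to the same inode.
    tail -n "$max_lines" "$log_file" > "$tmp_file" &&
        cat "$tmp_file" > "$log_file"
    rm -f "$tmp_file"
}

# Example call (commented out): keep one day of per-minute samples.
# trim_log /var/log/memory.log 1440
```

Calling it at the end of the sampling script, or from a separate daily cron entry, would keep the file bounded. A sample written during the short copy-back window could be lost, which seems acceptable for this kind of debugging log.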

Every time there is a crash, the file should contain the memory use data leading up to it.
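For the more generic, system-wide view the question asks for, the same cron approach could read from /proc instead of following a single PID. The following is a minimal sketch, assuming a Linux /proc filesystem and GNU date. The function names, the output format, and the definition of "used" memory (total minus free, buffers and page cache) are assumptions made for illustration.

```shell
#!/bin/sh
# Print used memory as an integer percentage, read from a meminfo-style
# file (defaults to /proc/meminfo). "Used" excludes free memory,
# buffers and page cache.
mem_used_percent() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END {
            if (total > 0)
                printf "%d\n", (total - free - buffers - cached) * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}

# Append one line with a timestamp, the used-memory percentage and the
# 1-minute load average to the given file.
log_system_snapshot() {
    out_file=$1
    now=$(date --rfc-3339=seconds)
    mem=$(mem_used_percent)
    read -r load1 rest < /proc/loadavg
    echo "$now mem_used=${mem}% load1=$load1" >> "$out_file"
}

# Example call (commented out), suitable for a per-minute cron script:
# log_system_snapshot /var/log/system-snapshot.log
```

After an unplannned reboot, the last few lines of that file show whether memory or load was climbing just before the instance stopped responding. This does not send anything with the CloudWatch alert email; it only leaves evidence on disk to inspect afterwards.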
