

Rebol 2 process on Linux halts on SIGTERM after a "day of heavy load"

I am using Cheyenne for a relatively high-load web application. It works great and fast. But I have a problem that started appearing after upgrading to Ubuntu 14.04, or perhaps I only started noticing it then because the load had increased.

After a few days of running, when a Rebol worker process should exit, it instead starts consuming 100% CPU and "does nothing". I looked at the process with strace, and while it is at 100% CPU it makes no system calls at all. I also checked the Cheyenne worker code for faults, and it executes correctly right up to the Rebol exit command. That command is what sends it into the endless loop. The same happens if I try to kill the process with SIGTERM.

I can then kill it with SIGKILL. The process only gets into this state after a few days of heavy load, and I haven't been able to reproduce it in a non-production environment or on a local machine.

My naive guess is that it loops forever while trying to free its memory, or perhaps its open files/sockets, before exiting. I compared the process before and after with lsof (and similar tools), but since the event isn't easily reproducible I haven't figured anything out yet.

My question is: has anyone seen Rebol 2 go into an endless 100% CPU loop on exit, and under which circumstances? Does anyone have any idea how to solve this?

I've seen this on our production Cheyenne servers: a process stuck at 100% CPU and not responding, probably after serving a very large file (a lot of data in the response). I never found the time to diagnose the issue further, and ended up writing a monitor in Go that kills processes that stay at 100% CPU for too long.

https://github.com/Softinnov/bearded-monitor

You can use it in a Docker container:

https://hub.docker.com/r/softinnov/bearded-monitor/

Hope it helps.
