Rebol 2 process on Linux halts on SIGTERM after a "day of heavy load"

Original post: 2017-01-26 19:16:10. Tags: linux, rebol, rebol2, cheyenne

I am using Cheyenne for a relatively high-load web application. It works great and is fast. But I have a problem that started appearing after upgrading to Ubuntu 14.04, or perhaps I only started noticing it then because the load increased.

After a few days of running, when a Rebol worker process should exit, the process starts consuming 100% CPU and "does nothing". I looked at the process with strace, and while it is at 100% CPU it makes no system calls at all. I checked the Cheyenne worker code (in case the fault is there), and execution proceeds normally up to the Rebol exit command. That command is what makes it loop forever. The same happens if I try to terminate the process with SIGTERM.
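
One way to confirm that the hang is a pure user-space spin (consistent with strace showing no system calls) is to compare the process's user and system CPU time over a few seconds. The following is a minimal Go sketch, not part of Cheyenne or Rebol; it assumes Linux with /proc mounted, and the program name spincheck and the 5-second sampling window are illustrative choices.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// procTimes holds the /proc/<pid>/stat fields needed to tell a
// user-space spin from a process stuck in the kernel.
type procTimes struct {
	State byte   // field 3: R running, S sleeping, D uninterruptible, ...
	Utime uint64 // field 14: time spent in user mode, in clock ticks
	Stime uint64 // field 15: time spent in kernel mode, in clock ticks
}

// parseStat extracts state, utime and stime from one /proc/<pid>/stat line.
// Field 2 (the command name) is parenthesised and may contain spaces or ')',
// so fields are counted from the last ')'.
func parseStat(line string) (procTimes, error) {
	end := strings.LastIndexByte(line, ')')
	if end < 0 {
		return procTimes{}, fmt.Errorf("malformed stat line")
	}
	f := strings.Fields(line[end+1:]) // f[0] is field 3, so field n is f[n-3]
	if len(f) < 13 {
		return procTimes{}, fmt.Errorf("stat line has too few fields")
	}
	ut, err := strconv.ParseUint(f[11], 10, 64)
	if err != nil {
		return procTimes{}, err
	}
	st, err := strconv.ParseUint(f[12], 10, 64)
	if err != nil {
		return procTimes{}, err
	}
	return procTimes{State: f[0][0], Utime: ut, Stime: st}, nil
}

// readStat reads /proc/<pid>/stat; pid may also be "self".
func readStat(pid string) (procTimes, error) {
	b, err := os.ReadFile("/proc/" + pid + "/stat")
	if err != nil {
		return procTimes{}, err
	}
	return parseStat(string(b))
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: spincheck <pid>")
		os.Exit(2)
	}
	a, err := readStat(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	time.Sleep(5 * time.Second)
	b, err := readStat(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// A large user delta with a system delta near zero points to a loop
	// entirely in user space, which matches an strace that shows nothing.
	fmt.Printf("state=%c user=+%d ticks system=+%d ticks\n",
		b.State, b.Utime-a.Utime, b.Stime-a.Stime)
}
```

If the user-tick delta is close to five seconds' worth of ticks (see getconf CLK_TCK) while the system delta stays near zero, the worker is spinning inside the interpreter rather than blocked on the OS.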

I can then kill it with SIGKILL. The process only gets into this state after a few days of heavy load, and I haven't been able to reproduce it in a non-production environment or on my local computer.

My naive guess is that it loops forever while trying to free its memory, or perhaps close its open files/sockets, before exiting. I compared the processes before and after with lsof (and similar tools), but since the event isn't easily reproducible, I haven't figured anything out yet.
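
When comparing a healthy worker with a hung one, a per-kind count of open descriptors can be easier to diff than raw lsof output. A minimal Go sketch, assuming Linux, where each entry in /proc/<pid>/fd is a symlink whose target looks like "socket:[inode]", "pipe:[inode]" or a file path; the program name fdcount is hypothetical, and reading another user's process requires the same UID or root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// fdSummary counts a process's open descriptors by kind.
type fdSummary struct {
	Total, Sockets, Pipes, Files, Other int
}

// summarize classifies /proc/<pid>/fd link targets by their prefix.
func summarize(targets []string) fdSummary {
	var s fdSummary
	for _, t := range targets {
		s.Total++
		switch {
		case strings.HasPrefix(t, "socket:"):
			s.Sockets++
		case strings.HasPrefix(t, "pipe:"):
			s.Pipes++
		case strings.HasPrefix(t, "/"):
			s.Files++
		default:
			s.Other++ // e.g. anon_inode:[eventpoll]
		}
	}
	return s
}

// readFDTargets lists the link target of every descriptor pid holds;
// pid may also be "self".
func readFDTargets(pid string) ([]string, error) {
	dir := filepath.Join("/proc", pid, "fd")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, e := range entries {
		t, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			continue // descriptor was closed between ReadDir and Readlink
		}
		out = append(out, t)
	}
	return out, nil
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	targets, err := readFDTargets(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", summarize(targets))
}
```

Running it periodically against a worker and logging the result would show whether sockets or files accumulate before the hang, without needing to catch the event live.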

My question is: has anyone seen Rebol 2 go into an endless 100% CPU loop on exit, and under which circumstances? Does anyone have an idea how to solve this?

1 answer

I've seen this on our production Cheyenne servers: a process at 100% CPU and not responding, probably after serving a very large file (a lot of data in the response). I never found time to diagnose the issue further, and ended up writing a monitor in Go that kills processes that stay at 100% CPU for too long.

https://github.com/Softinnov/bearded-monitor

You can use it in a Docker container:

https://hub.docker.com/r/softinnov/bearded-monitor/

Hope it helps.