What is the best way to prevent out of memory (OOM) freezes on Linux?

Is there a way to make the OOM killer work and prevent Linux from freezing? I've been running Java and C# applications, where any memory allocated is usually used, and (if I'm understanding them right) overcommits are causing the machine to freeze. Right now, as a temporary solution, I added

vm.overcommit_memory = 2
vm.overcommit_ratio = 10

to /etc/sysctl.conf.
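
For context on what those two settings do: with vm.overcommit_memory = 2 the kernel refuses new allocations once the committed address space would exceed roughly swap plus overcommit_ratio percent of RAM (reported as CommitLimit in /proc/meminfo). Below is a rough Python sketch of that arithmetic. The function names are made up for illustration, and the sketch ignores vm.overcommit_kbytes and other kernel details, so treat the result as an estimate.

```python
import os

RATIO = 10  # the vm.overcommit_ratio value from the question


def parse_meminfo(text):
    """Return numeric /proc/meminfo fields as a dict (sizes are in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate CommitLimit under vm.overcommit_memory = 2."""
    ram_part = (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100
    return ram_part + swap_total_kb


# Only touches /proc when run directly on a Linux machine.
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    limit = commit_limit_kb(info.get("MemTotal", 0), info.get("SwapTotal", 0), RATIO)
    print("Estimated CommitLimit at ratio %d: %d kB" % (RATIO, limit))
    print("Committed_AS right now: %d kB" % info.get("Committed_AS", 0))
```

As an example, a machine with 8 GiB of RAM and 2 GiB of swap ends up with about 2.8 GiB of committable memory at ratio 10, so allocations can start failing long before RAM is actually full.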

Kudos to anyone who can explain why the existing OOM killer can't be guaranteed to function correctly, killing processes whenever the kernel runs out of "real" memory.

EDIT -- many responses are along the lines of Michael's "if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory". I don't think this is the correct solution. There will always be apps with bugs, and I'd like to adjust the kernel so my entire system doesn't freeze. Given my current technical understanding, this doesn't seem like it should be impossible.

Below is a really basic perl script I wrote. With a bit of tweaking it could be useful. You just need to change the paths I have to the paths of any processes that use Java or C#. You could also change the kill commands I've used to restart commands. Of course, to avoid typing in perl memusage.pl manually, you could put it into your crontab file to run automatically. You could also use perl memusage.pl > log.txt to save its output to a log file. Sorry if it doesn't really help, but I was bored while drinking a cup of coffee. :-D Cheers

#!/usr/bin/perl -w
# Checks available memory usage and calculates size in MB
# If free memory is below your minimum level specified, then
# the script will attempt to close the troublesome processes down
# that you specify. If it can't, it will issue a -9 KILL signal.
#
# Uses external commands (cat and pidof)
#
# Cheers, insertable

our $memmin = 50;
our @procs = qw(/usr/bin/firefox /usr/local/sbin/apache2);

sub killProcs
{
    use vars qw(@procs);
    my @pids = ();
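    foreach my $proc (@procs)
    {
        # Strip the directory part to get the bare process name
        my $filename = substr($proc, rindex($proc, "/") + 1);
        my $pid = `pidof $filename`;
        chomp($pid);
        # pidof prints nothing when no such process is running
        next if $pid eq "";
        my @pid = split(/ /, $pid);
        push @pids, $pid[0];
    }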
    foreach $pid (@pids)
    {
        # try to kill the process normally first
        system("kill -15 " . $pid);
        print "Killing " . $pid . "\n";
        sleep 1;
        if (-e "/proc/$pid")
        {
            print $pid . " is still alive! Issuing a -9 KILL...\n";
            # "." concatenates; "+" would add numerically and drop the "kill -9 " prefix
            system("kill -9 " . $pid);
            print "Done.\n";
        } else {
            print "Looks like " . $pid . " is dead\n";
        }
    }
    print "Successfully finished destroying memory-hogging processes!\n";
    exit(0);
}

sub checkMem
{
    use vars qw($memmin);
    my ($free) = $_[0];
    if ($free > $memmin)
    {
        print "Memory usage is OK\n";
        exit(0);
    } else {
        killProcs();
    }
}

sub main
{
    my $meminfo = `cat /proc/meminfo`;
    chop($meminfo);
    my @meminfo = split(/\n/,$meminfo);
    foreach my $line (@meminfo)
    {
        if ($line =~ /^MemFree:\s+(.+)\skB$/)
        {
            my $free = ($1 / 1024);
            &checkMem($free);
        }
    }
}

main();
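
One caveat about keying off MemFree: it does not count page cache the kernel can reclaim, so it tends to look alarmingly low on a healthy machine. Kernels from 3.14 onward also report MemAvailable, which is usually a better number for this kind of threshold. Here is a minimal sketch of the same check, written in Python rather than Perl; the function names and the 50 MB default simply mirror the script above and are otherwise arbitrary.

```python
def available_mb(meminfo_text):
    """Return MemAvailable in MB, falling back to MemFree on older kernels."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if name in ("MemAvailable", "MemFree") and parts:
            values[name] = int(parts[0]) / 1024  # /proc/meminfo reports kB
    if "MemAvailable" in values:
        return values["MemAvailable"]
    return values.get("MemFree")


def is_low(meminfo_text, minimum_mb=50):
    """True when available memory is at or below the threshold."""
    mb = available_mb(meminfo_text)
    return mb is not None and mb <= minimum_mb
```

To use it on a live system, pass in the contents of /proc/meminfo, for example is_low(open("/proc/meminfo").read()).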

If your process's oom_adj is set to -17 it won't be considered for killing, although I doubt that's the issue here.

cat /proc/<pid>/oom_adj

will tell you the value of your process's oom_adj.
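
Note that newer kernels expose /proc/<pid>/oom_score_adj (range -1000 to 1000, where -1000 exempts the process) and treat oom_adj as deprecated. A small Python sketch that checks whichever file is present; the helper names and the proc_root parameter are my own, added only to make the example easy to try.

```python
import os


def read_oom_adjustment(pid, proc_root="/proc"):
    """Return (filename, value) for the first OOM tunable found, or None."""
    for name in ("oom_score_adj", "oom_adj"):
        path = os.path.join(proc_root, str(pid), name)
        try:
            with open(path) as f:
                return name, int(f.read().strip())
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not a number
    return None


def is_exempt(name, value):
    """True if the value tells the OOM killer to skip the process."""
    if name == "oom_score_adj":
        return value == -1000
    return name == "oom_adj" and value == -17


if __name__ == "__main__":
    result = read_oom_adjustment(os.getpid())
    if result is None:
        print("No OOM tunable found for this process")
    else:
        print("%s = %d (exempt: %s)" % (result[0], result[1], is_exempt(*result)))
```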

I'd have to say the best way of preventing OOM freezes is to not run out of virtual memory. If you are regularly running out of virtual memory, or getting close, then you have bigger problems.

Most tasks don't handle failed memory allocations very well, so they tend to crash or lose data. Running out of virtual memory (with or without overcommit) will cause some allocations to fail. This is usually bad.

Moreover, before your OS runs out of virtual memory, it will start doing bad things like discarding pages from commonly used shared libraries. Those pages then have to be pulled back in often, which is likely to make performance suck and is very bad for throughput.

My suggestions:

  • Get more RAM
  • Run fewer processes
  • Make the processes you do run use less memory (this may include fixing memory leaks in them)

And possibly also

  • Set up more swap space

if that is helpful in your use case.

Most multi-process servers run a configurable (maximum) number of processes, so you can typically tune it downwards. Multithreaded servers typically allow you to configure how much memory to use for their internal buffers and so on.

First off, how can you be sure the freezes are OOM killer related? I've got a network of systems in the field and I see freezes fairly often, which don't seem to be OOM related (our app is pretty stable in memory usage). Could it be something else? Is there any interesting hardware involved? Any unstable drivers? High performance video?

Even if the OOM killer is involved, and worked, you'd still have problems, because stuff you thought was running is now dead, and who knows what sort of mess it's left behind.

Really, if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory.

I've found that fixing stability issues mostly relies on accurately identifying the root cause. Unfortunately, this requires being able to see what's happening when the issue happens, which is a really bad time to be trying to start various monitoring programs.

One thing I sometimes found helpful was to start a little monitoring script at boot time which would log various interesting numbers and snapshot the running processes. Then, in the event of a crash, I could look at the situation just before the crash. I sometimes found that my intuition about the root cause was quite wrong. Unfortunately, that script is long out of date, or I'd give a link.
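
Since that script isn't available, here is a rough idea of what such a monitor could look like. This is a hypothetical Python sketch, not the original script: the watched fields, the function names, and the interval are all arbitrary choices. It appends one line per interval with a few /proc/meminfo numbers and the largest processes by resident memory, and syncs the log to disk each time so the last entries are more likely to survive a freeze.

```python
import os
import time

WATCHED = ("MemFree", "MemAvailable", "SwapFree", "Committed_AS")


def parse_meminfo(text):
    """Return numeric /proc/meminfo fields as a dict (sizes are in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def top_processes_by_rss(limit=5, proc_root="/proc"):
    """Return [(rss_kb, pid, name)] for the largest resident processes."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                status = dict(line.split(":", 1) for line in f if ":" in line)
            rss_kb = int(status["VmRSS"].split()[0])
        except (OSError, KeyError, ValueError, IndexError):
            continue  # process exited, or a kernel thread with no VmRSS
        found.append((rss_kb, int(entry), status.get("Name", "?").strip()))
    return sorted(found, reverse=True)[:limit]


def format_snapshot(timestamp, meminfo_text, processes):
    """Build one log line from a meminfo dump and a process list."""
    mem = parse_meminfo(meminfo_text)
    mem_part = " ".join("%s=%dkB" % (k, mem[k]) for k in WATCHED if k in mem)
    proc_part = " ".join("%s[%d]=%dkB" % (n, p, r) for r, p, n in processes)
    return "%s %s | %s" % (timestamp, mem_part, proc_part)


def run(log_path, interval_seconds=10):
    """Append a snapshot to log_path forever; meant to be started at boot."""
    while True:
        with open("/proc/meminfo") as f:
            meminfo_text = f.read()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        line = format_snapshot(stamp, meminfo_text, top_processes_by_rss())
        with open(log_path, "a") as log:
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before sleeping
        time.sleep(interval_seconds)
```

Calling run() with a log path of your choosing from whatever starts services at boot would give you a trail to read after the next freeze.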
