
java.io.IOException: error=11

I am experiencing a weird problem with the Java ProcessBuilder. The code is shown below (in a slightly simplified form):

public class Whatever implements Runnable
{

    public void run() {
        // someIdentifier is a randomly generated string
        String in = someIdentifier + "input.txt";
        String out = someIdentifier + "output.txt";
        ProcessBuilder builder = new ProcessBuilder("./whatever.sh", in, out);
        try {
            Process process = builder.start();
            process.waitFor();
        } catch (IOException e) {
            log.error("Could not launch process. Command: " + builder.command(), e);
        } catch (InterruptedException ex) {
            log.error(ex);
        }
    }

}

whatever.sh reads:

R --slave --args $1 $2 <whatever1.R >> r.log    

Loads of instances of Whatever are submitted to an ExecutorService of fixed size (35). The rest of the application waits for all of them to finish, which is implemented with a CountDownLatch. Everything runs fine for several hours (Scientific Linux 5.0, java version "1.6.0_24") before throwing the following exception:

java.io.IOException: Cannot run program "./whatever.sh": java.io.IOException: error=11, Resource temporarily unavailable
    at java.lang.ProcessBuilder.start(Unknown Source)
... rest of stack trace omitted...

Does anyone have an idea what this means? Based on the Google/Bing search results for java.io.IOException: error=11, it is not the most common of exceptions and I am completely baffled.

My wild and not so educated guess is that I have too many threads trying to launch the same file at the same time. However, it takes hours of CPU time to reproduce the problem, so I have not tried with a smaller number.

Any suggestions are greatly appreciated.

The error=11 is almost certainly the EAGAIN error code:

$ grep EAGAIN asm-generic/errno-base.h 
#define EAGAIN      11  /* Try again */
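On the Java side the errno is only visible inside the exception message, as in the stack trace from the question. A minimal sketch of a helper that flags this case in logs follows; the class name `ErrnoCheck` and method `isEagain` are hypothetical, and the check assumes the `error=11` message format shown in the question, which may differ between JVM versions and platforms.

```java
import java.util.regex.Pattern;

public class ErrnoCheck {

    // Matches "error=11" but not "error=110" or "error=111".
    private static final Pattern EAGAIN = Pattern.compile("error=11(?!\\d)");

    /** Returns true if any exception in the cause chain mentions error=11. */
    public static boolean isEagain(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && EAGAIN.matcher(msg).find()) {
                return true;
            }
        }
        return false;
    }
}
```

Calling `ErrnoCheck.isEagain(e)` in the `catch (IOException e)` block makes it possible to log resource exhaustion separately from other launch failures such as a missing script.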

The clone(2) system call documents an EAGAIN error return:

   EAGAIN Too many processes are already running.

The fork(2) system call documents two EAGAIN error returns:

   EAGAIN fork() cannot allocate sufficient memory to copy the
          parent's page tables and allocate a task structure for
          the child.

   EAGAIN It was not possible to create a new process because
          the caller's RLIMIT_NPROC resource limit was
          encountered.  To exceed this limit, the process must
          have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE
          capability.

If you were really that low on memory, it would almost certainly show in the system logs. Check dmesg(1) output or /var/log/syslog for any potential messages about low system memory. (Other things would break. This doesn't seem too plausible.)

Much more likely is running into either the per-user limit on processes or the system-wide maximum number of processes. Perhaps one of your processes isn't properly reaping zombies? This would be very easy to spot by checking ps(1) output over time:

while true ; do ps auxw >> ~/processes ; sleep 10 ; done

(Maybe check every minute or ten minutes if it really does take hours before you're in trouble.)
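The same kind of sampling can be roughly approximated from inside the JVM. The sketch below assumes Java 9 or later (the `ProcessHandle` API does not exist on the 1.6 JVM from the question), and the class name `ChildProcessMonitor` is made up for illustration. Whether zombie entries appear in the count depends on the platform implementation, so treat this as a complement to the ps(1) loop rather than a replacement.

```java
import java.time.Instant;

public class ChildProcessMonitor {

    /** Counts the processes currently descended from this JVM. */
    public static long liveDescendants() {
        return ProcessHandle.current().descendants().count();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print one sample every 10 seconds, mirroring the shell loop.
        while (true) {
            System.out.println(Instant.now() + " descendants=" + liveDescendants());
            Thread.sleep(10_000);
        }
    }
}
```

A count that keeps growing while the executor is capped at 35 workers would suggest that child processes are accumulating.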

If you're not reaping zombies, then read up on whatever you must do to ProcessBuilder to use waitpid(2) to reap your dead children.
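As a rough sketch of what careful cleanup could look like in Java: `Process.waitFor()` blocks until the child has exited and returns its exit status, so every started process should reach it, and the child's streams should be drained and closed so that neither side blocks on a full pipe or leaks file descriptors. The class `ReapingRunner` and method `runAndReap` are hypothetical names, and the try-with-resources syntax assumes Java 7 or later rather than the 1.6 JVM from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;

public class ReapingRunner {

    /** Runs a command, discards its output, and returns its exit code. */
    public static int runAndReap(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader drains both.
        builder.redirectErrorStream(true);
        Process process = builder.start();
        try {
            // Nothing is written to the child's stdin, so close it right away.
            process.getOutputStream().close();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                while (reader.readLine() != null) {
                    // Drain output so the child cannot block on a full pipe.
                }
            }
            // Wait for the child to exit and collect its exit status.
            return process.waitFor();
        } finally {
            // Releases remaining resources; kills the child if it is still running.
            process.destroy();
        }
    }
}
```

The `finally` block is there so that an exception or interrupt in the worker thread does not leave a child process behind.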

If you're legitimately running more processes than your rlimits allow, you'll need to use ulimit in your bash(1) scripts (if running as root) or set higher limits in /etc/security/limits.conf for the nproc property.

If you are instead running into the system-wide process limits, you might need to write a larger value into /proc/sys/kernel/pid_max. See proc(5) for some (short) details.
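To see which limits the JVM is actually running under, the values can be read back from /proc. This is a sketch under assumptions: it relies on /proc/self/limits, which newer Linux kernels provide but which may be absent on an older system such as the one in the question, and the class name `ProcessLimits` and method `softMaxProcesses` are invented for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ProcessLimits {

    private static final String ROW = "Max processes";

    /**
     * Extracts the soft limit from the "Max processes" row of a
     * /proc/<pid>/limits listing. Returns "unknown" if the row is missing.
     */
    public static String softMaxProcesses(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith(ROW)) {
                // Remaining columns: soft limit, hard limit, units.
                String[] fields = line.substring(ROW.length()).trim().split("\\s+");
                return fields[0];
            }
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        List<String> limits = Files.readAllLines(
                Paths.get("/proc/self/limits"), StandardCharsets.UTF_8);
        System.out.println("RLIMIT_NPROC (soft): " + softMaxProcesses(limits));

        byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/kernel/pid_max"));
        System.out.println("pid_max: " + new String(raw, StandardCharsets.UTF_8).trim());
    }
}
```

Logging these two values at application startup makes it easier to tell later whether a per-user limit or the system-wide one was the likelier culprit.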

errno 11 means "Resource temporarily unavailable". This is usually a memory problem and can prevent a thread or socket from being created.

errno 12 means "Can't allocate memory". This is a failure to obtain memory in a direct request for memory (rather than for a resource which in turn needs memory).

I would try increasing the swap space of your system, which should avoid this issue.
