java.io.IOException: error=11
I am experiencing a weird problem with the Java ProcessBuilder. The code is shown below (in a slightly simplified form):
public class Whatever implements Runnable
{
    public void run() {
        // someIdentifier is a randomly generated string
        String in = someIdentifier + "input.txt";
        String out = someIdentifier + "output.txt";
        ProcessBuilder builder = new ProcessBuilder("./whatever.sh", in, out);
        try {
            Process process = builder.start();
            process.waitFor();
        } catch (IOException e) {
            log.error("Could not launch process. Command: " + builder.command(), e);
        } catch (InterruptedException ex) {
            log.error(ex);
        }
    }
}
whatever.sh reads:
R --slave --args $1 $2 <whatever1.R >> r.log
Loads of instances of Whatever are submitted to an ExecutorService of fixed size (35). The rest of the application waits for all of them to finish, which is implemented with a CountDownLatch. Everything runs fine for several hours (Scientific Linux 5.0, java version "1.6.0_24") before throwing the following exception:
java.io.IOException: Cannot run program "./whatever.sh": java.io.IOException: error=11, Resource temporarily unavailable
at java.lang.ProcessBuilder.start(Unknown Source)
... rest of stack trace omitted...
Does anyone have an idea what this means? Based on the google/bing search results for java.io.IOException: error=11, it is not the most common of exceptions and I am completely baffled.
My wild and not so educated guess is that I have too many threads trying to launch the same file at the same time. However, it takes hours of CPU time to reproduce the problem, so I have not tried with a smaller number.
Any suggestions are greatly appreciated.
The error=11 is almost certainly the EAGAIN error code:
$ grep EAGAIN asm-generic/errno-base.h
#define EAGAIN 11 /* Try again */
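If the application wants to tell this failure apart from other launch errors, one option is to pull the number out of the exception message. The following is only a sketch: the class and method names are made up for illustration, and it assumes the JVM keeps the "error=<number>" wording seen in the stack trace in the question, which is not a documented contract.

```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ErrnoFromMessage {
    // Matches the "error=<number>" fragment seen in the exception message in the question.
    private static final Pattern ERRNO = Pattern.compile("error=(\\d+)");

    /** Returns the errno embedded in the message, or -1 if none is present. */
    public static int errnoOf(IOException e) {
        String message = e.getMessage();
        if (message == null) {
            return -1;
        }
        Matcher m = ERRNO.matcher(message);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        IOException sample = new IOException(
            "Cannot run program \"./whatever.sh\": java.io.IOException: "
            + "error=11, Resource temporarily unavailable");
        System.out.println(errnoOf(sample)); // prints 11
    }
}
```

A value of 11 would then be logged or handled separately from, say, a missing script.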
The clone(2) system call documents an EAGAIN error return:
EAGAIN Too many processes are already running.
The fork(2) system call documents two EAGAIN error returns:
EAGAIN fork() cannot allocate sufficient memory to copy the
parent's page tables and allocate a task structure for
the child.
EAGAIN It was not possible to create a new process because
the caller's RLIMIT_NPROC resource limit was
encountered. To exceed this limit, the process must
have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE
capability.
If you were really that low on memory, it would almost certainly show in the system logs. Check dmesg(1) output or /var/log/syslog for any potential messages about low system memory. (Other things would break. This doesn't seem too plausible.)
Much more likely is that you are running into either the per-user limit on processes or the system-wide maximum number of processes. Perhaps one of your processes isn't properly reaping zombies? This would be very easy to spot by checking ps(1) output over time:
while true ; do ps auxw >> ~/processes ; sleep 10 ; done
(Maybe check every minute or ten minutes if it really does take hours before you're in trouble.)
If you're not reaping zombies, then read up on whatever you must do to ProcessBuilder to use waitpid(2) to reap your dead children.
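On the Java side, the visible part of reaping is Process.waitFor(), which blocks until the child has exited. A minimal sketch of a launch that always waits for the child and also releases its pipes follows; the class name is hypothetical, the command in main assumes a Unix sh on the PATH, and draining and closing the streams is an extra precaution of this sketch rather than something the answer above prescribes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of any library.
final class RunAndReap {
    /** Runs the command, discards its output, and returns its exit code. */
    public static int run(String... command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so one loop drains both
        Process process = builder.start();
        try {
            process.getOutputStream().close(); // the child gets no input from us
            InputStream childOutput = process.getInputStream();
            byte[] buffer = new byte[4096];
            while (childOutput.read(buffer) != -1) {
                // discard; draining keeps the child from blocking on a full pipe
            }
            return process.waitFor(); // blocks until the child has exited
        } finally {
            closeQuietly(process.getInputStream());
            closeQuietly(process.getErrorStream());
            process.destroy(); // harmless if the child has already exited
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing useful can be done if closing a pipe fails
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("sh", "-c", "exit 3")); // prints 3
    }
}
```

The finally block runs even when the waiting thread is interrupted, so an interrupted task does not leave a running child or open pipes behind.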
If you're legitimately running more processes than your rlimits allow, you'll need to use ulimit in your bash(1) scripts (if running as root) or set higher limits in /etc/security/limits.conf for the nproc property.
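To see which process limit the JVM itself is running under, one possibility on Linux is to read the "Max processes" row of /proc/self/limits. This is a sketch under assumptions: the class name is hypothetical, the file is Linux-specific and is not present on older kernels, and the column layout (soft limit, hard limit, units) is taken from how the file looks on current systems.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper, not part of any library.
final class NprocLimit {
    private static final String LABEL = "Max processes";

    /**
     * Returns the soft limit from a "Max processes" line, -1 for "unlimited",
     * or null if the line describes a different limit.
     */
    public static Long softLimit(String line) {
        if (!line.startsWith(LABEL)) {
            return null;
        }
        // Remaining columns: soft limit, hard limit, units.
        String[] fields = line.substring(LABEL.length()).trim().split("\\s+");
        return "unlimited".equals(fields[0]) ? Long.valueOf(-1L) : Long.valueOf(fields[0]);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a Linux system that exposes /proc/self/limits.
        BufferedReader reader = new BufferedReader(new FileReader("/proc/self/limits"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                Long limit = softLimit(line);
                if (limit != null) {
                    System.out.println("soft nproc limit: " + limit);
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

Comparing this value with the number of processes in the ps(1) snapshots shows how close the application gets to the limit.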
If you are instead running into the system-wide process limits, you might need to write a larger value into /proc/sys/kernel/pid_max. See proc(5) for some (short) details.
errno 11 means "Resource temporarily unavailable". This is usually a memory problem and can prevent a thread or socket from being created.
errno 12 means "Can't allocate memory". This is a failure of a direct request for memory (rather than a request for a resource which in turn needs memory).
I would try increasing the swap space of your system, which should avoid this issue.