
Java process is frozen(?) on Linux

This is my first question on SO.
I have a very odd problem, described below.

I wrote a very simple method that writes some text to a file.
Of course it works well on my machine (XP, 4 CPUs, jdk1.5.0_17 [Sun]),
but it sometimes freezes on the production server
(Linux Accounting240 2.4.20-8smp, 4 CPUs, jdk1.5.0_22 [Sun]).

kill -3 doesn't work.
Ctrl+\ doesn't work either.

So I can't show you a thread dump.

It really freezes. When I add a Thread.sleep(XX) call to this method, the problem seems to go away(?)...
Update: the sleep(XX) workaround broke. It happened again today, even with Thread.sleep(XX).

Do you know this problem? Do you have a solution for it? Thanks. :-)

PS
Linux distribution: Red Hat Linux 3.2.2-5
Command: java -cp . T

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.text.SimpleDateFormat;
import java.util.Date;

public class T {
    private BufferedWriter writer = null;

    private void log(String log) {
        try {
            if (writer == null) {
                File logFile = new File("test.log");
                writer = new BufferedWriter(new OutputStreamWriter(
                        new FileOutputStream(logFile, true)));
            }
            writer.write(new SimpleDateFormat("[yyyy-MM-dd HH:mm:ss] ")
                    .format(new Date()));
            writer.write("[" + log + "]" + "\n");
            writer.flush();

            /*
             * this is ad hoc solution ???
             */
            //Thread.sleep(10);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
        }
    }

    public void test() {
        long startTime = System.currentTimeMillis();

        while (true) {
            log(String.valueOf(System.currentTimeMillis()));
            System.out.println(System.currentTimeMillis());
            try {
                //Thread.sleep((int) (Math.random() * 100));
            } catch (Exception e) {
                break;
            }

            if (System.currentTimeMillis() - startTime > 1000 * 5) {
                break;
            }
        }

        if (writer != null) {
            try {
                writer.close();
            } catch (Exception e) {
            }
        }
        System.out.println("OK");
    }

    public static void main(String[] args) {
        new T().test();
    }
}

If the JVM does not respond to kill -3, then it is the JVM that is failing, not your program. That is bad and would warrant a bug report to Sun.

I noticed you are running a 2.4.20-8smp kernel. This is not a typical kernel for a current open source Linux distribution, so I suggest you have a look at http://java.sun.com/j2se/1.5.0/system-configurations.html to see whether you are deploying to a supported configuration. If not, you should let the people responsible know!

The first step is to get a thread dump showing where the program is when it "freezes". If this were on Java 6, you could connect JVisualVM or JConsole to it by default and get the stack traces of all the threads from there. Since it's Java 5, you should be able to use the jstack command to get a thread dump (or you could enable JMX with a command-line option to attach the aforementioned tools, but I don't think it's worth it in this case). In all cases, pressing Ctrl-Break (Ctrl-\ on Linux) in the console that launched the application may also produce a thread dump, depending on the environment.
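If none of the external tools respond, one fallback is to have the program dump its own stacks. The following is only a sketch under assumptions: the class name StackDumper and its method are made up for illustration, and it can only help while the JVM is still executing Java code, which may not be the case if the JVM itself is hung. It uses Thread.getAllStackTraces(), which is available from Java 5.

```java
import java.util.Map;

// Hypothetical helper (not part of the original program): formats the stack of
// every live thread, similar in spirit to what kill -3 would print.
class StackDumper {
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print one snapshot; call this periodically to compare snapshots.
        System.err.print(dumpAllThreads());
    }
}
```

Calling dumpAllThreads() from a small daemon thread every few seconds would give you a series of snapshots to compare, again assuming the JVM is still alive enough to run that thread.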

Do this several times, a few seconds apart, and then compare the thread dumps. If they're always identical, then it looks like your application is deadlocked, and the top line of each thread's stack trace will show exactly where the threads are blocking (which, when you look at that line of the code, will give a very good clue as to which resources they're blocked on).
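As a complement to comparing dumps by eye, the JVM can be asked directly whether it sees a monitor deadlock. This is a minimal sketch, not a definitive diagnostic: the class name DeadlockCheck is invented here, it only detects deadlocks on Java object monitors (not hangs inside native code or the kernel), and like any in-process check it needs a JVM that is still running Java code. ThreadMXBean.findMonitorDeadlockedThreads() is available from Java 5.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: reports threads that are deadlocked on object monitors.
class DeadlockCheck {
    /** Returns a description of deadlocked threads, or "" if none are found. */
    static String describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // thread died between the two calls
            }
            sb.append(info.getThreadName()).append(" is blocked on ")
              .append(info.getLockName()).append(" held by ")
              .append(info.getLockOwnerName()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.err.println(report.length() == 0 ? "no monitor deadlock found" : report);
    }
}
```

An empty result does not prove the program is healthy; it only rules out one kind of deadlock.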

On the other hand, if the thread dumps change from time to time, the program is not strictly deadlocked but looks like it's running in an infinite loop. Perhaps one of your loop conditions is not declared properly, so the threads never exit, or something of that sort. Again, look at the set of thread dumps to see which area of code each thread is looping around in; that will point you to the loop condition that never evaluates to an exit condition.

If the issue isn't obvious from this analysis, post the dumps back here, as they will help people debug the code above.

I think this is a race condition. The while(true) will force the VM on Linux to write and flush continuously, and the Linux kernel VM will try to intercept those calls and buffer the writing. This will make the process spin while waiting for the syscall to complete; at the same time, it will be picked up by the scheduler and assigned to another CPU (I might be wrong here, though). The new CPU will try to acquire a lock on the resource, and everything will end in a deadlock.

This might be a sign of other issues to come. I suggest:

  • First of all, for clarity's sake: move the file creation out of the log() method. That's what constructors are for.

  • Secondly, why are you trying to write to a file like that? Are you sure your program logic makes sense in the first place? Would you not rather write your log messages to a container (say, an ArrayList) and, every XX seconds, dump that to disk in a separate thread? Right now you're limiting your logging ability to your disk speed, which is something you might want to avoid.
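Both suggestions can be sketched together roughly as follows. This is only an illustration under assumptions: the class QueuedLogger and its method names are invented here, and it swaps the ArrayList-plus-timer idea for a BlockingQueue that a single writer thread drains in batches, since that avoids hand-written locking. The file is opened once in the constructor, and log() never touches the disk.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical logger: callers enqueue messages, one background thread writes them.
class QueuedLogger {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
    private final BufferedWriter writer;
    private final Thread worker;
    private volatile boolean running = true;

    QueuedLogger(String path) throws IOException {
        // File is opened once here (append mode), not on every log() call.
        writer = new BufferedWriter(new FileWriter(path, true));
        worker = new Thread(new Runnable() {
            public void run() {
                drainLoop();
            }
        }, "log-writer");
        worker.start();
    }

    /** Cheap for the caller: only enqueues, no disk I/O. */
    void log(String message) {
        queue.add(message);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<String>();
        try {
            // Keep going until close() was called and the queue is empty.
            while (running || !queue.isEmpty()) {
                String first = queue.poll(200, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing arrived, re-check the loop condition
                }
                batch.add(first);
                queue.drainTo(batch); // grab everything else that is waiting
                for (String line : batch) {
                    writer.write(line);
                    writer.newLine();
                }
                writer.flush(); // one flush per batch instead of per message
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /** Stops the writer thread after it has written pending messages. */
    void close() throws IOException, InterruptedException {
        running = false;
        worker.join();
        writer.close();
    }
}
```

In the test program from the question, you would create one QueuedLogger before the loop, call log(...) inside it, and call close() after the loop. Messages logged after close() are not written in this sketch.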
