Java heap space and message loss

I'm running a Java program on many computers that interact with each other. After several hours (2-5 hours), computers start failing: threads start getting into deadlocks and messages start getting lost. This is peculiar, because during the first hour or so everything runs fine.

I suspect it's because I'm using too much memory. I'm running on Linux, and this is the relevant output of top:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    30376 username  18   0  976m 132m 6804 S    0  4.0   0:05.60 java

  1. Does this seem high?
  2. Any other ideas as to why these bugs would happen are welcome.

Another thing that may be happening is that you're running out of connections. This happened to a colleague of mine just yesterday.

ulimit -n tells you how many file handles a process may open; netstat -at lists the TCP sockets that are currently open. When the second number approaches the first, attempts to open new connections will start to fail.
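The same two numbers can also be watched from inside the program and logged alongside everything else. Below is a minimal sketch, assuming a HotSpot/OpenJDK JVM on Linux or another Unix, where the platform OperatingSystemMXBean also implements com.sun.management.UnixOperatingSystemMXBean. The class name FdMonitor and the 80% warning threshold are made up for illustration. The limit the JVM reports is the one in effect for the JVM process, which is not necessarily what ulimit -n prints in your shell.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Sketch: read open/max file descriptor counts from inside the JVM.
// Assumes a HotSpot/OpenJDK JVM on a Unix-like OS; elsewhere it returns null.
// FdMonitor and the 80% threshold are hypothetical, for illustration only.
final class FdMonitor {
    private FdMonitor() {}

    /** Returns {open, max} file descriptor counts, or null if unsupported. */
    static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(),
                               unix.getMaxFileDescriptorCount()};
        }
        return null; // not available: fall back to ulimit -n / netstat from the shell
    }

    public static void main(String[] args) {
        long[] usage = fileDescriptorUsage();
        if (usage == null) {
            System.out.println("File descriptor counts not available on this JVM/OS");
            return;
        }
        long open = usage[0];
        long max = usage[1];
        System.out.printf("open fds=%d max=%d (%.1f%%)%n", open, max, 100.0 * open / max);
        // Hypothetical threshold: warn when 80% of the limit is in use.
        if (open > max * 0.8) {
            System.err.println("WARNING: close to the file descriptor limit");
        }
    }
}
```

Calling fileDescriptorUsage() periodically and logging the result makes a slow descriptor leak visible long before connections start failing.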

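In this particular case, the connections were still in CLOSE_WAIT after they had been used, and a forced garbage collection (Runtime.gc()) helped, presumably because collecting the unreachable socket objects also released their file descriptors.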
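Forcing a collection only helps when sockets that are no longer used were never closed. A more deterministic approach, and a different technique from the Runtime.gc() workaround above, is to close every socket explicitly with try-with-resources. Below is a minimal sketch assuming a simple line-based request/reply protocol; the class ClosingSockets and its loopback echo server, which stands in for one of the peer machines, are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch: close sockets deterministically instead of waiting for the GC.
// The line-based protocol and the local echo server are assumptions for the demo.
final class ClosingSockets {
    private ClosingSockets() {}

    /** Sends one line to host:port and returns the single-line reply. */
    static String request(InetAddress host, int port, String message) throws IOException {
        // All three resources (and the socket's file descriptor) are closed when
        // this block exits, whether normally or through an exception.
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(message); // autoflush on println
            return in.readLine();
        }
    }

    /** Starts a one-shot local echo server and sends it one message. */
    static String roundTrip(String message) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread echo = new Thread(() -> {
                // The server side also closes each accepted connection.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println("echo: " + in.readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            echo.start();
            String reply = request(loopback, server.getLocalPort(), message);
            echo.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // prints "echo: hello"
    }
}
```

With every socket closed in a finally-equivalent path, connections no longer linger until some later collection happens to reclaim them.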

You can get insight into the trend of memory usage by sampling the JVM heap size and logging it regularly. From these logs you can plot a graph and look for anomalies. (By the way, a sawtooth pattern is normal garbage collection behaviour.)

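    // Memory status: log heap usage regularly so the trend can be plotted.
    // `log` is assumed to be a logger with isDebugEnabled()/debug(), such as
    // a Commons Logging Log or a Log4j Logger.
    Runtime runtime = Runtime.getRuntime();
    final long totalMem = runtime.totalMemory(); // heap currently reserved by the JVM
    final long freeMem = runtime.freeMemory();   // unused part of totalMem
    final long maxMem = runtime.maxMemory();     // most the heap may grow to (roughly -Xmx)
    if (log.isDebugEnabled()) {
        log.debug("Memory free=" + freeMem
                + " used=" + (totalMem - freeMem)
                + " total=" + totalMem
                + " max=" + maxMem);
    }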
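The snippet above takes a single sample; to get a series you can plot, it has to run on a schedule. Below is a minimal sketch using a ScheduledExecutorService. The class name HeapLogger, the CSV column layout and the sampling interval are illustrative choices, and System.out stands in for whatever logger the program already uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Sketch: sample the heap on a fixed schedule and emit one CSV row per sample.
// HeapLogger, the CSV format and the interval are hypothetical choices.
final class HeapLogger {
    private HeapLogger() {}

    /** One CSV row: epoch millis, used, total and max heap in bytes. */
    static String sample() {
        Runtime runtime = Runtime.getRuntime();
        long total = runtime.totalMemory();
        long free = runtime.freeMemory();
        return System.currentTimeMillis() + "," + (total - free) + "," + total + ","
                + runtime.maxMemory();
    }

    /** Prints a header, then one sample every `period` units, on a daemon thread. */
    static ScheduledExecutorService start(long period, TimeUnit unit) {
        ThreadFactory daemon = r -> {
            Thread t = new Thread(r, "heap-logger");
            t.setDaemon(true); // don't keep the JVM alive just for logging
            return t;
        };
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(daemon);
        System.out.println("timestamp_ms,used,total,max");
        scheduler.scheduleAtFixedRate(() -> System.out.println(sample()), 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = start(1, TimeUnit.SECONDS);
        Thread.sleep(3_500); // let a few samples print, then stop
        scheduler.shutdown();
    }
}
```

In the real program you would call HeapLogger.start once at startup with a longer period (say a minute) and send the rows to a file. A sawtooth whose low points keep rising over several hours is the pattern that points to a leak.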

Possible issues:

  1. Resources (sockets, database connections, etc.) not being properly closed
  2. Memory leaks (references being held in a Collection, unclosed resources)
  3. Subtle concurrency bugs that show up very rarely (which would explain why they only appear after hours)
  4. Message loss because the socket buffer is overrun before you have a chance to read it, or because a message is bigger than the buffer. This is usually fixed by having a dedicated thread read from the socket as soon as data arrives and put each message on a work queue that the main processing thread consumes.
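Item 4 can be sketched as follows: a dedicated reader thread drains incoming data into a java.util.concurrent.BlockingQueue, and the processing thread takes messages from it at its own pace. This assumes newline-delimited text messages. ReaderQueue, its end-of-stream marker and the uppercase "processing" step are hypothetical, and a StringReader stands in for the socket's input stream so the example runs without a network.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: reader thread + work queue. Newline-delimited messages are assumed.
final class ReaderQueue {
    private ReaderQueue() {}

    // Hypothetical end-of-stream marker, compared by identity so that no real
    // message (even one with the same text) can be mistaken for it.
    private static final String POISON = new String("end-of-stream");

    /** Starts a thread that copies lines from the reader into the queue. */
    static Thread startReader(BufferedReader in, BlockingQueue<String> queue) {
        Thread reader = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    queue.put(line); // hand off immediately, keep draining the input
                }
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queue.offer(POISON); // tell the consumer nothing more will arrive
            }
        }, "socket-reader");
        reader.start();
        return reader;
    }

    /** The main processing side: takes messages until the marker arrives. */
    static List<String> processAll(BlockingQueue<String> queue) throws InterruptedException {
        List<String> handled = new ArrayList<>();
        while (true) {
            String msg = queue.take();
            if (msg == POISON) {
                return handled;
            }
            handled.add(msg.toUpperCase(Locale.ROOT)); // placeholder for real handling
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A StringReader stands in for socket.getInputStream() so this runs offline.
        BufferedReader in = new BufferedReader(new StringReader("ping\nstatus\nbye\n"));
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        startReader(in, queue);
        System.out.println(processAll(queue)); // prints [PING, STATUS, BYE]
    }
}
```

One design trade-off: with an unbounded LinkedBlockingQueue, a consumer that falls behind shows up as growing heap usage rather than lost messages. Passing a capacity to the LinkedBlockingQueue constructor would instead make the reader block when the queue is full, which over TCP lets flow control slow the sender down.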
