
How to detect a long gc from within a JVM?

How can I detect a GC pause (edit: or any other stall) that exceeds a configured application timeout, so that I can log a warning (or dynamically extend the timeout)?

Edit: I am not asking for alternatives or workarounds such as external monitoring. I am writing a library and cannot control the environment or its settings. I will clearly document that users of the library must set an appropriate timeout, but I still expect people to overlook that, or to change the JVM heap settings years later and forget to increase the timeouts. Support will be simpler if the library can log a warning about a possible pause longer than the configured timeouts. Detection doesn't have to be perfect; "good enough" would reduce the time wasted on library users who have not set a sensible timeout.

Edit: to be clear, the library works fine even if there is a big GC. There is still a good reason to have a well-chosen timeout, which is to detect a crash so that the library tries to connect to an alternate peer.

You can use management bean notifications and subscribe to GARBAGE_COLLECTION_NOTIFICATION events which in turn provide GcInfo objects with the stats you want.

The javax.management package javadocs have a high-level overview of how to use those services.

Based on the pointers given by @the8472 above, I put together a somewhat more complete sample for logging GC events from inside the JVM (and thus detecting them). I hope this will save somebody some time :)

package fi.pelam.gclogutil;
import java.lang.management.*;
import java.util.Map;
import javax.management.openmbean.CompositeData;
import javax.management.*;

import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;

public class GcLogUtil {
    static public void startLoggingGc() {
        // http://www.programcreek.com/java-api-examples/index.php?class=javax.management.MBeanServerConnection&method=addNotificationListener
        // https://docs.oracle.com/javase/8/docs/jre/api/management/extension/com/sun/management/GarbageCollectionNotificationInfo.html#GARBAGE_COLLECTION_NOTIFICATION
        for (GarbageCollectorMXBean gcMbean : ManagementFactory.getGarbageCollectorMXBeans()) {
            try {
                ManagementFactory.getPlatformMBeanServer().
                        addNotificationListener(gcMbean.getObjectName(), listener, null,null);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    static private NotificationListener listener = new NotificationListener() {
        @Override
        public void handleNotification(Notification notification, Object handback) {
            if (notification.getType().equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION)) {
                // https://docs.oracle.com/javase/8/docs/jre/api/management/extension/com/sun/management/GarbageCollectionNotificationInfo.html
                CompositeData cd = (CompositeData) notification.getUserData();
                GarbageCollectionNotificationInfo gcNotificationInfo = GarbageCollectionNotificationInfo.from(cd);
                GcInfo gcInfo = gcNotificationInfo.getGcInfo();
                System.out.println("GarbageCollection: "+
                        gcNotificationInfo.getGcAction() + " " +
                        gcNotificationInfo.getGcName() +
                        " duration: " + gcInfo.getDuration() + "ms" +
                        " used: " + sumUsedMb(gcInfo.getMemoryUsageBeforeGc()) + "MB" +
                        " -> " + sumUsedMb(gcInfo.getMemoryUsageAfterGc()) + "MB");
            }
        }
    };

    static private long sumUsedMb(Map<String, MemoryUsage> memUsages) {
        long sum = 0;
        for (MemoryUsage memoryUsage : memUsages.values()) {
            sum += memoryUsage.getUsed();
        }
        return sum / (1024 * 1024);
    }
}
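To tie this back to the original question, the same notification can be compared against the configured timeout instead of being printed unconditionally. The following is only a minimal sketch: the class name LongGcWarner, the helper exceeds and the parameter timeoutMillis are hypothetical names, and treating "duration >= timeout" as the warning condition is an assumption. Also note that, depending on the collector, the duration reported by GcInfo may include concurrent work rather than only stop-the-world pause time, so it is a rough signal at best.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: warns when a reported GC lasts at least as long as the timeout.
final class LongGcWarner {

    // Assumed policy: a GC is "too long" once its duration reaches the timeout.
    static boolean exceeds(long gcDurationMillis, long timeoutMillis) {
        return gcDurationMillis >= timeoutMillis;
    }

    // Registers one listener on every collector bean that can emit notifications.
    // Returns the number of beans the listener was attached to.
    static int install(long timeoutMillis) {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            long duration = info.getGcInfo().getDuration();
            if (exceeds(duration, timeoutMillis)) {
                System.err.println("WARNING: " + info.getGcName() + " took " + duration
                        + "ms, configured timeout is " + timeoutMillis + "ms");
            }
        };
        int registered = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (bean instanceof NotificationEmitter) {
                ((NotificationEmitter) bean).addNotificationListener(listener, null, null);
                registered++;
            }
        }
        return registered;
    }
}
```

A library would call LongGcWarner.install(configuredTimeoutMillis) once at startup and replace System.err with its own logger. The com.sun.management classes are not part of the standard API, so this assumes a HotSpot-style JVM.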

First of all, what I am about to say does not apply to real-time systems, so let's get this immediately out of the way: If you want to build a real-time system with stringent constraints, then Java might not be the way to go.

Now, if you are not building a real-time system, then I would advise against being overly concerned about the possibility that GC might slow down your program, delay your program, freeze your program, etc.

Garbage collection in modern garbage-collected languages like Java is highly streamlined: it runs on separate threads, it does as much of its work as possible in chunks that are as small as possible, and the chances of you witnessing a freeze-up due to garbage collection are very slim.

On the other hand, in any modern non-real-time system there are so many different things that may slow down or temporarily freeze your program (most importantly, paging) that the contribution of GC will be negligible and most likely lost in the noise.

Amendment

After your amendment to your question, it appears that what you need is to detect whether your runtime environment is experiencing large irregularities in the allocation of computing resources (CPU). This is a much more general problem than detecting delays due to GC. (GC is just one possible source of such delays, and not even among the first suspects.) So, to solve the general problem, consider the following approach:

Create a separate thread which does the following in a loop:

1. record the current time.
2. sleep for a specific number of milliseconds. (Say, 50.)
3. record the current time again.

In a smoothly running system, the difference between the first and the second recorded time should be very close to the requested sleep duration. If your system is experiencing irregularities, this difference will vary wildly. Wild variations persisting over a considerable period of time would mean that your system is not running smoothly.
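The loop described above could be sketched roughly as follows. This is an illustration, not a tested production component: the class name PauseDetector, the callback onPause and the parameters sleepMillis and thresholdMillis are hypothetical, and a single overshoot is reported immediately rather than averaged over a period. System.nanoTime() is used instead of wall-clock time so that clock adjustments are not mistaken for stalls.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical watchdog: sleeps in a loop and reports when a sleep overruns.
final class PauseDetector implements Runnable {
    private final long sleepMillis;
    private final long thresholdMillis;
    private final LongConsumer onPause; // receives the overshoot in milliseconds

    PauseDetector(long sleepMillis, long thresholdMillis, LongConsumer onPause) {
        this.sleepMillis = sleepMillis;
        this.thresholdMillis = thresholdMillis;
        this.onPause = onPause;
    }

    // How much longer than requested the sleep took, in milliseconds.
    static long overshootMillis(long startNanos, long endNanos, long sleepMillis) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos) - sleepMillis;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();              // step 1: record the time
            try {
                Thread.sleep(sleepMillis);               // step 2: sleep
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();      // stop when interrupted
                return;
            }
            long end = System.nanoTime();                // step 3: record the time again
            long overshoot = overshootMillis(start, end, sleepMillis);
            if (overshoot > thresholdMillis) {
                onPause.accept(overshoot);
            }
        }
    }

    // Starts the detector on a daemon thread so it never keeps the JVM alive.
    static Thread start(long sleepMillis, long thresholdMillis, LongConsumer onPause) {
        Thread t = new Thread(new PauseDetector(sleepMillis, thresholdMillis, onPause),
                "pause-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

For example, PauseDetector.start(50, configuredTimeoutMillis, ms -> log.warn("stall of about " + ms + "ms")) would warn whenever the watchdog was held up longer than the configured timeout, whatever the cause (GC, paging, CPU starvation). Here log stands for whatever logger the library already uses.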

If you are really hell-bent on catching the GC freezing your program, you can make sure to perform some memory allocation between steps 2 and 3 above. Presumably, if the GC has frozen your Java VM, it will take some time before this memory allocation is honored. Trust me, it won't happen, but if that gives you peace of mind, then go ahead and test for it.

You can also further elaborate on this technique by synchronizing it with the main logic of your program so as to make sure that the main logic is alive and running.

Regarding handling any timeout: you can run your task inside a Future (http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html). You then use a different thread to monitor the Future; it checks whether the task is done, and if it is not done by the timeout you specified, you log a warning or something similar.

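// r is the Runnable to monitor; timeoutMillis and log are placeholders
// for your configured timeout and your logger.
ExecutorService svc = Executors.newFixedThreadPool(1);
Future<?> submit = svc.submit(r);

// Sleep for the timeout.
Thread.sleep(timeoutMillis);

if (!submit.isDone()) {
    log.warn("action is not done");
}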

You can retrieve the task's result with submit.get(), either with or without a timeout.
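As a rough, self-contained variant of the fragment above, the timed form of Future.get can replace the explicit sleep. The class name FutureTimeoutSketch and the method finishedWithin are hypothetical names used only for illustration, and treating a task that threw an exception as "finished" is an assumption. Keep in mind that this only tells you the task overran; it does not tell you whether GC was the cause.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a task and reports whether it finished in time.
final class FutureTimeoutSketch {

    // Returns true if the task completed within timeoutMillis, false otherwise.
    static boolean finishedWithin(Runnable task, long timeoutMillis) {
        ExecutorService svc = Executors.newSingleThreadExecutor();
        try {
            // get(timeout, unit) blocks until the task is done or the timeout expires.
            svc.submit(task).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            System.err.println("action is not done after " + timeoutMillis + "ms");
            return false;
        } catch (ExecutionException e) {
            // Assumption: a task that threw has still finished within the timeout.
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            svc.shutdownNow(); // interrupts the task if it is still running
        }
    }
}
```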

I've seen a working trick which basically calls Thread.sleep(1000) on a watcher thread and measures the actual time spent in the sleep. If the sleep overruns by more than a threshold, let's say 500ms, it likely means there was a long stop-the-world GC pause.
