
Monitoring a heavily loaded Java application

I'd like to know the standard practices for obtaining performance metrics from an application, preferably in Java. Currently a periodically scheduled task collects system metrics. This task often does not run on time, so no metrics are recorded for that interval and the monitoring dashboards break (line graphs show gaps).

Metrics matter most when the application is performing poorly, yet we've observed that those are exactly the times when we cannot collect any, because the application is too busy.

  1. You can use a tool called topthreads, found here: https://bitbucket.org/pjtr/topthreads

    It reports the usage details you are asking about (RAM, CPU, etc.) for the classes and threads loaded by a target JVM.

    Usage instructions are available on the page above.

  2. You can load an agent into the target VM using Sun's Attach API (package com.sun.tools.attach), which lives in the tools.jar file in your JDK's lib directory on Java 8 and earlier; from Java 9 on it is part of the jdk.attach module.

Loading an agent looks like this:

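import com.sun.tools.attach.AgentInitializationException;
import com.sun.tools.attach.AgentLoadException;
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;
import java.io.IOException;

// The two methods below assume these fields on the enclosing class
// (they are not shown in the original snippet):
//   private long pid;            // PID of the target JVM process
//   private VirtualMachine vm;   // handle to the target VM, set by hook()

/**
 * Hooks/attaches to the VM of the process with the given PID.
 *
 * @return true if the hook/attach was successful
 */
public boolean hook() {
    try {
        vm = VirtualMachine.attach(Long.toString(pid));
        return vm != null;
    } catch (AttachNotSupportedException | IOException e) {
        e.printStackTrace();
        return false;
    }
}

/**
 * Loads a thread agent which can debug the running threads and classes of this process.
 *
 * @param agent   the location of the agent jar file
 * @param options the options/arguments to pass into the agent's agentmain method
 * @return true if the agent was loaded and initialized successfully
 */
public boolean loadAgent(String agent, String options) {
    try {
        vm.loadAgent(agent, options);
        return true;
    } catch (AgentLoadException | AgentInitializationException | IOException e) {
        e.printStackTrace();
        return false;
    }
}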

Note: when creating the agent jar, you must specify Agent-Class in the manifest file, set to the fully qualified name of the class containing the agentmain method, so that the agent can be loaded.

An agent main class looks like this:

import java.lang.instrument.Instrumentation;

public class Agent {
    /** An object lock for thread synchronization, if necessary. */
    public static final Object LOCK = new Object();

    /**
     * Starts the agent. The JVM calls this method when the agent is loaded
     * into an already running VM.
     *
     * @param agentArgs the agent arguments passed in via loadAgent
     * @param inst      the Instrumentation instance provided by the JVM
     */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        // Do whatever you want to the target VM here. It is hacky, so use it
        // at your own risk, but the mechanism ships with Java itself.
    }
}

An example agent manifest file:

Manifest-Version: 1.0
Agent-Class: packageNameThatWillBeDifferent.Agent
Created-By: 1.8.0_101 (Oracle Corporation)
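As one illustration of what an agentmain body could do, here is a minimal sketch that starts a daemon thread inside the target VM and periodically prints per-thread CPU time using the standard ThreadMXBean. The class name ThreadCpuAgent, the thread name, the output format, and the convention that the agent argument is a sampling interval in milliseconds are all assumptions made for this example; none of them come from the answer above.

```java
import java.lang.instrument.Instrumentation;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical agent class; name and output format are illustrative only.
class ThreadCpuAgent {

    /** Called by the JVM after VirtualMachine.loadAgent(jar, options). */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        // Assumption: the options string, if present, is the interval in ms.
        // A non-numeric value makes parseLong throw NumberFormatException.
        final long intervalMs = (agentArgs == null || agentArgs.trim().isEmpty())
                ? 5000L
                : Long.parseLong(agentArgs.trim());

        Thread sampler = new Thread(() -> {
            try {
                while (true) {
                    sample().forEach(System.err::println);
                    Thread.sleep(intervalMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling quietly
            }
        }, "thread-cpu-sampler");
        sampler.setDaemon(true); // must never keep the target JVM alive
        sampler.start();
    }

    /** Returns one line per live thread: its name and accumulated CPU time. */
    static List<String> sample() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // CPU time measurement is optional; report nothing if it is unavailable.
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return lines;
        }
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);  // null if the thread died
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread died
            if (info == null || cpuNanos < 0) {
                continue;
            }
            lines.add(info.getThreadName() + " cpuMs=" + (cpuNanos / 1_000_000L));
        }
        return lines;
    }
}
```

With the helper methods shown earlier, such an agent would be loaded with something like loadAgent("/path/to/agent.jar", "1000"), where the jar path is a placeholder and the manifest's Agent-Class points at this class. Note that the sampler thread still competes for CPU with the application, so under heavy load its samples can be delayed as well.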

Whether this is an "enterprise" application or a small service running in a JVM, you need insight into the basic health details of this garbage-collected runtime: most prominently the memory pools (heap, off-heap) and GC statistics.

Collecting only process metrics from the host's point of view (CPU usage, resident set size (RSS), IO) won't really help you understand what the JVM is doing, nor where the hot spots in your code are.
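The JVM-level figures mentioned here (memory pools and GC statistics) can be read from inside the process with the standard java.lang.management API, without any third-party library. The following is only a sketch of that idea: the class name JvmHealthSnapshot and the metric key names are made up for the example, and it reports the JVM's heap and non-heap usage, which does not cover every kind of off-heap allocation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the key names are illustrative, not a standard.
class JvmHealthSnapshot {

    /** Reads current memory and GC figures from the platform MXBeans. */
    static Map<String, Long> collect() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        metrics.put("heap.used", heap.getUsed());
        metrics.put("heap.committed", heap.getCommitted());
        metrics.put("nonheap.used", nonHeap.getUsed());

        // One entry per memory pool (pool names depend on the collector in use).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is not valid
            if (usage != null) {
                metrics.put("pool." + pool.getName() + ".used", usage.getUsed());
            }
        }

        // Cumulative collection count and time per collector (-1 if undefined).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            metrics.put("gc." + gc.getName() + ".count", gc.getCollectionCount());
            metrics.put("gc." + gc.getName() + ".timeMs", gc.getCollectionTime());
        }
        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The GC values are cumulative since JVM start, so a dashboard would normally plot the difference between two consecutive snapshots rather than the raw numbers.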

If you don't have access to the code, an agent might be your only chance to get JVM insights. Otherwise you should really instrument the application with one of the established Java metrics libraries.

In my view, the most widely used metrics library for Java is Dropwizard Metrics.

Although I am a bit biased, I'd recommend having a look at the Micrometer project. You can configure a basic set of JVM runtime metrics in a few lines of code to get a basic understanding of your JVM's runtime behaviour. Once that is done, you can start instrumenting the hot spots in your code by timing them. Micrometer provides a plethora of metric exporters for various established monitoring systems (Prometheus, InfluxDB, Graphite, ...).
