简体   繁体   中英

How to speed up runtime Java code instrumentation?

I made a Java agent which is attached to a JVM during runtime and instruments all the loaded project classes and inserts some logging statements. There are 11k classes in total. I measured the total time taken by the transform method of my ClassFileTransformer and it was 3 seconds. But the duration of the whole instrumentation process takes about 30 seconds. This is how I retransform my classes:

 instrumentation.retransformClasses(myClassesArray);

I assume most time is taken up by the JVM to reload changed classes. Is that right? How can I speed up the instrumentation process?

Update :
When my agent is attached,

instrumentation.addTransformer(new MyTransfomer(), true);
instrumentation.retransformClasses(retransformClassArray);

is called only once .

Then MyTransfomer class instruments the classes and measures the total duration time of instrumentation:


public class MyTransfomer implements ClassFileTransformer {
private long total = 0;
private long min = ..., max = ...;

public final byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classFileBuffer) {
   long s = System.currentTimeMillis();
   if(s < min) min = s;
   if(s > max) max = s;
   byte[] transformed = this.transformInner(loader, className, classFileBuffer);

   this.total += System.currentTimeMillis() - s;
   
   return transformed;
  }
}

After all the classes are instrumented(from the initial array) (a global cache keeps track of the instrumented classes) total is printed and it will be ~3 seconds. But max-min is ~30 seconds.

Update 2:

After looking at the stack trace this is what happens: I call

instrumentation.retransformClasses(retransformClassArray);

which calls the native method retransformClasses0() . After some time(!) the JVM calls the transform() method of the sun.instrument.InstrumentationImpl class(but this method takes only one class at a time, so the JVM calls this method multiple times consecutively), which calls transform() on the sun.instrument.TransformerManager object which has a list with the all the ClassTransformers registered and calls each of these transformers to transform the class( I have only one transformer registered!! ).

So to my opinion, most of the time is spent in the JVM (after retransformClasses0() is called and before each call to sun.instrument.InstrumentationImpl.transform() ). Is there a way to reduce the time needed by the JVM to carry out this task?

Correction:

Because the retransformClasses(classArr) will not retransform all the elements in the classArr at once, instead it will retransform each of them as needed(eg. while linking).(refer to the jdk [ VM_RedefineClasses ][1] and [ jvmtiEnv ][2]) , it does retransform all of them at once.

What retransformClasses() does:

  1. Transfer control to native layer, and give it a class list which we want to transform
  2. For every class to be transformed, the native code tries to get a new version by calling our java transformer, this leads to a transfer of control between the java code and native.
  3. The native code replace the appropriate parts of internal representation by the given new class version one another.

In step 1:

java.lang.instrument.Instrumentation#retransformClasses calls sun.instrument.InstrumentationImpl#retransformClasses0 which is a JNI method, the control will be transferred to native layer.

// src/hotspot/share/prims/jvmtiEnv.cpp
jvmtiError
JvmtiEnv::RetransformClasses(jint class_count, const jclass* classes) {
  ...
  VM_RedefineClasses op(class_count, class_definitions, jvmti_class_load_kind_retransform);
  VMThread::execute(&op);
  ...
} /* end RetransformClasses */

In step 2:

This step is implemented by KlassFactory::create_from_stream , this procedure will post a ClassFileLoadHook event whose callback can acquire the transformed bytecode by invoking the java transformer method. In this step, the control will switch back and forth between native code and java code.

// src/hotspot/share/classfile/klassFactory.cpp
// check and post a ClassFileLoadHook event before loading a class
// Skip this processing for VM hidden or anonymous classes
if (!cl_info.is_hidden() && (cl_info.unsafe_anonymous_host() == NULL)) {
  stream = check_class_file_load_hook(stream,
                                      name,
                                      loader_data,
                                      cl_info.protection_domain(),
                                      &cached_class_file,
                                      CHECK_NULL);
}
//src/java.instrument/share/native/libinstrument/JPLISAgent.c :
//call java code sun.instrument.InstrumentationImpl#transform
transformedBufferObject = (*jnienv)->CallObjectMethod(
   jnienv,
   agent->mInstrumentationImpl, //sun.instrument.InstrumentationImpl
   agent->mTransform, //transform
   moduleObject,
   loaderObject,
   classNameStringObject,
   classBeingRedefined,
   protectionDomain,
   classFileBufferObject,
   is_retransformer);

In step 3:

VM_RedefineClasses::redefine_single_class(jclass the_jclass, InstanceKlass* scratch_class, TRAPS) method replaces parts (such as constant pool, methods, etc.) in target class with parts from transformed class.

// src/hotspot/share/prims/jvmtiRedefineClasses.cpp
for (int i = 0; i < _class_count; i++) {
  redefine_single_class(_class_defs[i].klass, _scratch_classes[i], thread);
}

So how to speed up runtime Java code instrumentation?

In my project, the total time and max-min time are almost the same if the app is in a paused state while transforming. can you provide some demo code?

It's impossible to change the way jvm works, so multithreading may not be a bad idea. It got several times faster after using multithreading in my demo project.

From your description it seems like the complete transformation is running in a single thread.

You could create multiple threads, each one is transforming one class at the time. As the transformation of a class should be independent of any other class. This should give you an improvement in the overall transformation time by a factor of the number of used Core available on the executing system.

You can count the cores with:

int cores = Runtime.getRuntime().availableProcessors();

Chunk the list of classes to be transformed into the number of cores and create that may threads to process the chunks in parallel.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM