Java: Given Classloader and Class, get Class Bytecode

I have the following scenario: I have a class loader and a class it loaded, and now I need the bytecode for that class. Here is what I have tried so far:

    Field f = ClassLoader.class.getDeclaredField("classes");
    f.setAccessible(true);

    ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
    Vector<Class> classes =  (Vector<Class>) f.get(classLoader);

    for(Class loadedClass : classes)
    {
        String className = loadedClass.getName();
        String classFileResourcePath = "/" + className.replace(".", "/") + ".class";
        InputStream inputStream = classLoader.getResourceAsStream(classFileResourcePath);
        System.out.println(">>>> " + className + " => " + classFileResourcePath + " => " + inputStream);
    }

This code prints null for each class file. But when I change it to classLoader.getClass().getResourceAsStream(classFileResourcePath) it works if run in a standalone Main class in an IDE, but when I get to the actual context where this is needed, this returns null as well, presumably because there are "special" things happening with the jars and the classes behind the scenes. Without being able to discuss those details, it suffices to say what I have is a class and the class loader that loaded it, and now I need the byte code. How do I do this? If this is not possible in the Java layer, I may be able to fetch the original Jar itself and read it as a zip file, but that would be last resort.

There are actually several issues with your code sample:

First, you access the "classes" field of the java.lang.ClassLoader class to determine which classes are already loaded. This is a private field and if you let your code run in an environment where specialized class loaders are used (subclasses of java.lang.ClassLoader), you have more or less no idea what is contained in that field.

When calling ClassLoader.getResourceAsStream, you prefix the path with a "/", which is not correct. ClassLoader.getResourceAsStream always treats the name as absolute, so the path must start with the first package segment, e.g. use ClassLoader.getResourceAsStream("java/lang/ClassLoader.class") instead of ClassLoader.getResourceAsStream("/java/lang/ClassLoader.class").

With Class.getResourceAsStream, you can either provide an absolute path starting with "/", or a path relative to the package of the class, not starting with "/". E.g. ClassLoader.class.getResourceAsStream("ClassLoader.class") or ClassLoader.class.getResourceAsStream("/java/lang/ClassLoader.class") will normally both give you access to the class's byte code.
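The difference between the two lookup styles can be tried out with a small program. This is only a sketch: the class name ResourcePathDemo and the helper found are made up for illustration, and it assumes a standard JDK where the system class loader can see the JDK's own class files as resources. On such a JDK the lookup with a leading slash through the ClassLoader is expected to come back empty, but other class loaders may behave differently.

```java
import java.io.IOException;
import java.io.InputStream;

final class ResourcePathDemo {
    /** Reports whether a lookup produced a stream, closing the stream if it did. */
    static boolean found(InputStream in) throws IOException {
        if (in == null) {
            return false;
        }
        in.close();
        return true;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader system = ClassLoader.getSystemClassLoader();

        // ClassLoader lookup: the name is always absolute and has no leading slash.
        System.out.println(found(system.getResourceAsStream("java/lang/ClassLoader.class")));

        // Same name with a leading slash, as in the question's code.
        System.out.println(found(system.getResourceAsStream("/java/lang/ClassLoader.class")));

        // Class lookup: a name without a slash is relative to the class's package...
        System.out.println(found(ClassLoader.class.getResourceAsStream("ClassLoader.class")));

        // ...and a name with a leading slash is absolute.
        System.out.println(found(ClassLoader.class.getResourceAsStream("/java/lang/ClassLoader.class")));
    }
}
```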

Both approaches do however require that the class files are available as resources on the class path using the standard naming conventions for Java runtime environments. There is no requirement that a Java runtime environment must operate this way. Java classes may be generated dynamically, causing them to be known by the class loader, but not backed by persistent byte code. Proprietary class loaders are also not required to use the same mapping between class names and resource paths as the standard class loaders.

Java class loaders also do not offer a public API to access a class's byte code. If you separate the VM into a "native code part" and a "Java code part", it is also quite obvious that the VM usually doesn't need a reference to the raw byte code from the "Java code part".

Relying on the conventions used by the standard class loaders, you can use your approach and it will mostly work in standalone applications. But as you've found out yourself, it may fail if you run the code in a different environment, e.g. when deployed to an application server or when using packaging frameworks like OSGi.

The preferred method is Class.getResource or Class.getResourceAsStream . This will automatically use the correct ClassLoader (or use ClassLoader.getSystemResource() if the ClassLoader is null ). It will also resolve the resource within the package of the class unless you prepend the resource name with a '/' .

So for a Class object that does not represent a nested class, you can request the associated resource using theClass.getResourceAsStream(theClass.getSimpleName()+".class").

If you need correct handling of nested classes, get the qualified name via Class.getName() and transform it using either '/'+name.replace('.', '/')+".class" (absolute form) or name.substring(name.lastIndexOf('.')+1)+".class" (relative to the class's package).
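Put together, the relative-name variant might look like the following sketch. The class ClassBytes and its two methods are hypothetical names chosen for this example, and InputStream.readAllBytes assumes Java 9 or later. The method returns null when the class loader does not expose the class file as a resource, which is also what happens for primitive and array types.

```java
import java.io.IOException;
import java.io.InputStream;

final class ClassBytes {
    /**
     * Builds the resource name relative to the class's own package,
     * e.g. "Map$Entry.class" for java.util.Map.Entry.
     */
    static String relativeResourceName(Class<?> c) {
        String name = c.getName(); // binary name, nested classes use '$'
        return name.substring(name.lastIndexOf('.') + 1) + ".class";
    }

    /** Reads the class file through Class.getResourceAsStream, or returns null if unavailable. */
    static byte[] read(Class<?> c) throws IOException {
        try (InputStream in = c.getResourceAsStream(relativeResourceName(c))) {
            if (in == null) {
                return null; // not exposed as a resource, or generated on the fly
            }
            return in.readAllBytes(); // requires Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = read(java.util.Map.Entry.class);
        System.out.println(bytes == null ? "not available" : bytes.length + " bytes");
    }
}
```

Using getName() rather than getSimpleName() is what keeps the '$' separator of nested classes intact.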

If this fails, the ClassLoader does not support getting the class bytecode or the class has been generated on-the-fly and added without recording the byte code in a way the ClassLoader could use.


If you want to be able to retrieve the byte code even for such classes, you need a JVM supporting Instrumentation. A ClassFileTransformer receives the byte code as input and hence may store it somewhere without actually transforming it, if that's the intent.
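A transformer that only records what it is given could be sketched as follows. RecordingTransformer and bytesOf are hypothetical names for this example, and keying the map by class name alone is a simplification that ignores the case of the same name being defined by several class loaders.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RecordingTransformer implements ClassFileTransformer {
    // Keys are internal names such as "java/util/List" (assumption: one loader per name).
    private final Map<String, byte[]> recorded = new ConcurrentHashMap<>();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // Defensive check: skip entries for which no name is supplied.
        if (className != null && classfileBuffer != null) {
            recorded.put(className, classfileBuffer.clone()); // keep a private copy
        }
        return null; // null tells the JVM that no transformation was performed
    }

    /** Returns a copy of the recorded bytes for an internal name, or null if none were seen. */
    byte[] bytesOf(String internalName) {
        byte[] bytes = recorded.get(internalName);
        return bytes == null ? null : bytes.clone();
    }
}
```

Such a transformer would typically be registered through Instrumentation.addTransformer from an agent's premain method, which requires starting the JVM with the -javaagent option.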

See also Instrumentation.getInitiatedClasses(java.lang.ClassLoader) for a reliable way to get the classes of a particular ClassLoader .

However, you should be aware that this is not necessarily the byte code as passed to defineClass as the JVM might strip information irrelevant for the execution and also store the data in an optimized form creating an equivalent but not exactly matching byte code when transforming it back for passing it to the transformer.

The other caveat is that if there are other transformers registered within the JVM, e.g. if you're using an instrumenting profiler at the same time, you don't have precise control over the order of the transformers. That is, the first transformer will see byte code equivalent to the code stored on disk, the last one in the chain will see code equivalent to what the JVM finally executes, and a transformer in between sees something which might match neither of them.

Note that even with getResourceAsStream the byte code doesn't need to match, e.g. if the underlying resource has been modified since defineClass was called. And in principle, ClassLoaders are not required to implement loadClass / findClass and getResourceAsStream in a consistent way.

As mentioned by @jarnbjo, this is not a generally working approach. I was looking for a generic approach. I've found two promising approaches, of which only one actually works:

a. Instrumentation API. This works. I have decided not to use it because of difficulties when trying to modify some classes. The instrumentation agent runs in the same JVM and when it tries to instrument the classes it depends on, some very weird exceptions may occur. (I've learned some new exception types. Ehm, java.lang.ClassCircularityError...)

But this is likely to be OK for you if you can accept adding an instrumentation agent (via JVM args) when the JVM starts. You seem to need only to read the bytecode, so you should never run into such trouble.

b. JDI, Java Debug Interface. This looked very promising. I started writing a script that reconstructs the bytecode from the JDI API. Almost everything I needed was there, except the exception table, so it is not very useful. If you have all the instructions but lack the ExceptionTable attribute, you can't do any flow analysis, decompile the source and so on. Some exception handlers will look like dead code without the ExceptionTable. You can just see the current position in the bytecode, without some important information.

You should have a look at the ASM library.

With the library you can access the bytecode like this:

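    import org.objectweb.asm.ClassReader;
    import org.objectweb.asm.tree.ClassNode;

    ClassReader cr = new ClassReader("java.lang.Runnable"); // throws IOException if the class file is not found
    ClassNode cn = new ClassNode();
    cr.accept(cn, 0); // fills the tree representation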

Then you can access the information in an object-based way through the public fields of the ClassNode. An event-based analysis using visitors is also possible.

Note that you can instantiate the ClassReader with an input stream or a byte array instead of the class name as well.
