Symbolic references in Java

Question

Lately I have been playing with Java reflection and the .class file format. I'm currently studying the ldc instruction.

In the JVM Specification I found a term I don't understand: symbolic reference. I have the following questions.

What does it mean?
Where is it used?
In which cases does the ldc instruction load a symbolic reference?
Is there any code in Java corresponding to that action?

Answer 1

It would be helpful if you would quote the exact piece of the documentation that's giving you trouble. Since you haven't, I'm going to take a guess at what you might have quoted, from the doc for ldc :

Otherwise, if the run-time constant pool entry is a symbolic reference to a class (§5.1), then the named class is resolved (§5.4.3.1) and a reference to the Class object representing that class, value, is pushed onto the operand stack.

Otherwise, the run-time constant pool entry must be a symbolic reference to a method type or a method handle (§5.1). ...

This quote has a link to another section of the JVM spec (5.1), which describes the run-time constant pool:

a run-time data structure that serves many of the purposes of the symbol table of a conventional programming language implementation

What this means is that the run-time constant pool contains information about the pieces of a class in symbolic form: as text values.

So, when ldc is given a "symbolic reference" to a class, it's given the index of a CONSTANT_Class_info structure within the constant pool. If you look at the definition of this structure, you'll see that it contains a reference to the name of the class, also held within the constant pool.
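To see those names for yourself, here is a minimal sketch that walks the constant_pool table of a class file and prints the text behind every CONSTANT_Class_info entry. The class name ConstantPoolClasses and the method referencedClasses are made up for this illustration, and it assumes the runtime lets you read a JDK class file through ClassLoader.getSystemResourceAsStream. It only follows name_index; it does not resolve anything.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class ConstantPoolClasses {

    /**
     * Returns the names stored behind every CONSTANT_Class_info entry of the
     * class file at the given resource path, e.g. "java/lang/String.class".
     */
    static List<String> referencedClasses(String resourcePath) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (raw == null) {
                throw new IllegalArgumentException("no such resource: " + resourcePath);
            }
            DataInputStream in = new DataInputStream(raw);
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("not a class file: " + resourcePath);
            }
            in.readUnsignedShort();              // minor_version
            in.readUnsignedShort();              // major_version
            int count = in.readUnsignedShort();  // constant_pool_count

            String[] utf8 = new String[count];   // CONSTANT_Utf8_info text, by pool index
            int[] nameIndex = new int[count];    // name_index of each CONSTANT_Class_info

            for (int i = 1; i < count; i++) {    // valid indices are 1..count-1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:  utf8[i] = in.readUTF(); break;                // Utf8
                    case 7:  nameIndex[i] = in.readUnsignedShort(); break; // Class
                    case 8: case 16: case 19: case 20:
                        skip(in, 2); break;      // String, MethodType, Module, Package
                    case 15:
                        skip(in, 3); break;      // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4); break;      // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                    case 5: case 6:
                        skip(in, 8); i++; break; // Long and Double occupy two pool slots
                    default:
                        throw new IOException("unknown constant pool tag " + tag);
                }
            }

            // Follow each name_index to the Utf8 entry that holds the class name.
            List<String> names = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                if (nameIndex[i] != 0) {
                    names.add(utf8[nameIndex[i]]);
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    public static void main(String[] args) {
        referencedClasses("java/lang/String.class").forEach(System.out::println);
    }
}
```

The names come out in internal form (java/lang/Object rather than java.lang.Object), which is how the class file stores them.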

TL;DR: "symbolic references" are strings that can be used to retrieve the actual object.

An example:

if (obj.getClass() == String.class) {
    // do something
}

Becomes the following bytecode:

aload_1
invokevirtual   #21; //Method java/lang/Object.getClass:()Ljava/lang/Class;
ldc     #25; //class java/lang/String
if_acmpne       20

In this case, the ldc operation refers to a class that is stored symbolically. When the JVM executes this opcode, it resolves the symbolic reference using the class loader of the current class and pushes a reference to the resulting Class object onto the operand stack.
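As for Java code that corresponds to this action: a class literal is the closest source-level construct. A rough analogy, not an exact equivalent, is looking a class up by its name at run time. The sketch below uses a made-up class ClassLiteralDemo. Note one difference: Class.forName(String) also initializes the class, whereas resolving a class constant for ldc does not.

```java
// Hypothetical demo class for illustration.
class ClassLiteralDemo {

    /** Looks a class up by its textual name, using the caller's class loader. */
    static Class<?> resolve(String binaryName) {
        try {
            return Class.forName(binaryName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("cannot resolve " + binaryName, e);
        }
    }

    /** The class literal here is what javac turns into an ldc of a class constant. */
    static boolean isString(Object obj) {
        return obj.getClass() == String.class;
    }

    public static void main(String[] args) {
        System.out.println(resolve("java.lang.String") == String.class); // prints true
        System.out.println(isString("hello"));                           // prints true
        System.out.println(isString(42));                                // prints false
    }
}
```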

Answer 2

To add to the other answer, for parts 1 and 2 of the question only:

Currently, the JVM uses run-time data areas that can be divided into six areas:

• The program counter (PC) register
• Java Virtual Machine (JVM) stacks
• Native method stacks
• Heap Area
• Method area
• Run-Time Constant Pool

Each thread has its own PC register, which holds the address of the JVM instruction that thread is currently executing. The execution engine uses it to keep track of where each thread is, because the CPU constantly switches between threads.

The stack frame has three parts: the Local Variable Array, the Operand Stack and the Frame Data.

The purpose of the Operand Stack is for any intermediate operations that may be required, such as addition or subtraction of numbers. The operand stack acts as runtime workspace to perform the operation.
The Local Variable Array contains all parameters and local variables of the method.
Frame Data: contains all symbolic references (constant pool resolution) and normal method returns related to that particular method.
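A small sketch of how the local variable array and the operand stack cooperate. The class OperandStackDemo is made up for illustration, and the bytecode in the comment is what javac typically emits for this method; check with javap -c on your own build.

```java
// Hypothetical demo class for illustration.
class OperandStackDemo {

    // javac typically compiles this static method to:
    //   iload_0   // push local variable 0 (a) onto the operand stack
    //   iload_1   // push local variable 1 (b)
    //   iadd      // pop both values, push a + b
    //   ireturn   // pop the result and hand it back to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```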

Native Stacks are used when an implementation of the Java Virtual Machine is using conventional stacks, colloquially called "C stacks," to support native methods (methods written in a language other than the Java programming language).

The heap is the run-time data area from which memory for all class instances and arrays is allocated.

The Java Virtual Machine has a method area that is shared among all Java Virtual Machine threads. The method area is analogous to the storage area for compiled code of a conventional language or analogous to the "text" segment in an operating system process. It stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors, including the special methods used in class and interface initialization and in instance initialization (§2.9).

A run-time constant pool is a per-class or per-interface run-time representation of the constant_pool table in a class file (§4.4)

So what does that mean in practical terms? My understanding is that it is an abstraction to reference classes without any side effects.

For example, what if you were using the Class object instead of the constant pool symbols for storing class structures at runtime and it didn't load properly? I've seen a code base with multiple jars of the same name, type, and version all loaded from different areas of the code, because multiple teams worked on separate sections of a large code base and didn't bother to clean up the mess. I also worked with tuples of strings as a means of constructing a unique location in a transportation program. It worked, but it seemed to be a clunky mechanism for handling the use case. So my point is, the symbolic representations for classes and interfaces fill that need and create a standard procedure to reference classes during runtime, without side effects, to my knowledge.

To quote Brian Goetz on this:

Activities such as bytecode generation have a frequent need to describe constants such as classes. However, a Class object is a poor description for an arbitrary class. Producing a Class instance has many environmental dependencies and failure modes; loading may fail in because the desired class does not exist or may not be accessible to the requestor, the result of loading varies with class loading context, loading classes has side-effects, and sometimes may not be possible at all (such as when the classes being described do not yet exist or are otherwise not loadable, as in during compilation of those same classes, or during jlink-time transformation.) So, while the String class is a fine description for a Constant_String_info, the Class type is not a very good description for a Constant_Class_info.

A number of activities share the need to deal with classes, methods, and other entities in a purely nominal form. Bytecode parsing and generation libraries must describe classes and method handles in symbolic form. Without an official mechanism, they must resort to ad-hoc mechanisms, whether descriptor types like ASM's Handle, or tuples of strings (method owner, method name, method descriptor), or ad-hoc (and error-prone) encodings of these into a single string. Bootstraps for invokedynamic that operate by spinning bytecode (such as LambdaMetafactory) would prefer to work in a symbolic domain rather than with live classes and method handles. Compilers and offline transformers (such as jlink plugins) need to describe classes and members for classes that cannot be loaded into the running VM. Compiler plugins (such as annotation processors) similarly need to describe program elements in symbolic terms. They would all benefit from having a single, official way to describe such constants.
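The nominal descriptors this quote argues for are available as the java.lang.constant package since Java 12. A minimal sketch, assuming a Java 12 or later runtime; the class ClassDescDemo and the name com.example.NotYetCompiled are invented for the example. It shows that a class can be described by name without loading it, and that turning the description into a live Class object is a separate step that can fail.

```java
import java.lang.constant.ClassDesc;
import java.lang.invoke.MethodHandles;

// Hypothetical demo class for illustration; requires Java 12 or later.
class ClassDescDemo {

    /** Describes a class purely by name; no class loading takes place. */
    static String descriptorOf(String binaryName) {
        return ClassDesc.of(binaryName).descriptorString();
    }

    /** Resolves a nominal descriptor to a live Class object; this step may fail. */
    static Object resolve(ClassDesc desc) {
        try {
            return desc.resolveConstantDesc(MethodHandles.lookup());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot resolve " + desc, e);
        }
    }

    public static void main(String[] args) {
        // The described class does not need to exist.
        System.out.println(descriptorOf("com.example.NotYetCompiled"));
        // prints Lcom/example/NotYetCompiled;

        // Resolution is where class loading actually happens.
        System.out.println(resolve(ClassDesc.of("java.lang.String")) == String.class);
        // prints true
    }
}
```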

The Java docs quoted and elaborated on were from Java SE 12.

There are diagrams to illustrate this: the JVM Architecture from Dzone.

Last of all is An Introduction to the Constant Pool in the JVM by Baeldung that illustrates a simple Hello World program with bytecode.

2 answers

solution1
17 ACCEPTED 2013-07-01 14:13:26

solution2
0 2022-08-29 19:51:46
