
Java: Do BOTH the compiler AND the JRE require access to all 3rd-party class files?

I have 15 years' C++ experience but am new to Java. I am trying to understand how the absence of header files is handled by Java. I have a few questions related to this issue.

Specifically, suppose that I write source code for a class 'A' that imports a 3rd-party class 'Z' (and uses Z). I understand that at compile-time, the Java compiler must have "access" to the information about Z in order to compile A.java, creating A.class. Therefore, either Z.java or Z.class (or a JAR containing one of these; say Z.jar) must be present on the local filesystem at compile time - correct?

Does the compiler use a class loader to load Z (to reiterate - at compile time)?

If I'm correct that a class loader is used at COMPILE time, what if a user-defined class loader (L) is desired - and is part of the project being compiled? Suppose, for example, that L is responsible for downloading Z.class AT RUNTIME across a network? In this scenario, how will the Java compiler obtain Z.class at compile time? Will it attempt to compile L first, and then use L at compile time to obtain Z?

I understand that using Maven to build the project, Z.jar can be located on a remote repository over the internet at compile time - either on ibiblio, or on a custom repository defined in the POM file. I hope I'm correct that it is MAVEN that is responsible for downloading the 3rd-party JAR file at compile time, rather than the compiler's JVM?

Note, however, that at RUNTIME, A.class again requires Z.class - how will JRE know where to download Z.class from (without Maven to help)? Or is it the developer's responsibility to ship Z.class along with A.class with the application (say in the JAR file)? (...assuming a user-defined class loader is not used.)

Now a related question, just for confirmation: I assume that once compiled, A.class contains only symbolic links to Z.class - the bytecodes of Z.class are not part of A.class; please correct me if I'm wrong. (In C++, static linking would copy the bytes from Z.class into A.class, whereas dynamic linking would not.)

Another related question regarding the compilation process: once the necessary files describing Z are located on the CLASSPATH at compile time, does the compiler require the bytecodes from Z.class in order to compile A.java (and will build Z.class, if necessary, from Z.java), or does Z.java suffice for the compiler?

My overall confusion can be summarized as follows. It seems that the full [byte]code for Z needs to be present TWICE - once during compilation, and a second time during runtime - and that this must be true for ALL classes referenced by a Java program. In other words, every single class must be downloaded/present TWICE. Not a single class can be represented during compile time as just a header file (as it can be in C++).

Does the compiler use a class loader to load Z (to reiterate - at compile time)?

Almost. It uses a JavaFileManager which acts like a class loader in many ways. It does not actually load classes though since it needs to create class signatures from .java files as well as .class files.
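
A rough sketch of that idea using the javax.tools API (the file names and the lib/Z.jar path are placeholders): the compiler resolves a reference such as Z through the file manager and the classpath option it is given, not through a runtime class loader.

import javax.tools.*;
import java.util.Arrays;

public class CompileA {
    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // The file manager locates .java and .class files for the compiler;
        // it plays a role similar to a class loader, but only signatures are read.
        StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null);
        Iterable<? extends JavaFileObject> units = fm.getJavaFileObjects("A.java");
        // Z is found via the -classpath option (placeholder jar path).
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, fm, null, Arrays.asList("-classpath", "lib/Z.jar"), null, units);
        System.out.println("compiled: " + task.call());
    }
}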

I hope I'm correct that it is MAVEN that is responsible for downloading the 3rd-party JAR file at compile time, rather than the compiler's JVM?

Yes, Maven pulls down JARs, although it is possible to implement a JavaFileManager that behaves like a URLClassLoader. Maven manages a local cache of JARs, and will fill that cache from the network as needed.

Another related question regarding the compilation process: once the necessary files describing Z are located on the CLASSPATH at compile time, does the compiler require the bytecodes from Z.class in order to compile A.java (and will build Z.class, if necessary, from Z.java), or does Z.java suffice for the compiler?

It does not require the bytecode, just class, method, and field signatures and metadata. If A depends on Z, that dependency can be satisfied by a Z.java found on the source path, by a Z.class found on the class path or system class path, or via some custom extension such as a Z.jsp.

My overall confusion can be summarized as follows. It seems that the full [byte]code for Z needs to be present TWICE - once during compilation, and a second time during runtime - and that this must be true for ALL classes referenced by a Java program. In other words, every single class must be downloaded/present TWICE. Not a single class can be represented during compile time as just a header file (as it can be in C++).

Maybe an example can help clear this up. The Java language specification requires the compiler to do certain optimizations: inlining of static final primitives and Strings.

If class A depends on B only for a constant:

class B {
  public static final String FOO = "foo";
}

class A {
  A() { System.out.println(B.FOO); }
}

then A can be compiled, loaded, and instantiated without B.class on the classpath. If you changed and shipped a B.class with a different value of FOO, A would still carry the value it was compiled against, because the constant was inlined at compile time.

So it is possible to have a compile-time dependency and not a link-time dependency.

It is, of course, possible to have a runtime dependency without a compile-time dependency via reflection.
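
For example (com.example.Z is a hypothetical class name), the following compiles with no Z anywhere on the compiler's classpath, yet it needs a Z at runtime:

public class ReflectiveUse {
    public static void main(String[] args) throws Exception {
        // No compile-time reference to Z: only a string naming it.
        Class<?> zClass = Class.forName("com.example.Z");
        Object z = zClass.getDeclaredConstructor().newInstance();
        System.out.println("Loaded " + z.getClass().getName());
    }
}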

To summarize, at compile time, the compiler makes sure that the methods and properties a class accesses are available.

At class load time (runtime) the byte-code verifier checks that the expected methods and properties are really there. So the byte-code verifier double checks the assumptions the compiler makes (except for inlining assumptions such as those above).

It is possible to blur these distinctions. E.g. JSP uses a custom classloader that invokes the Java compiler to compile and load classes from source as needed at runtime.

The best way to understand how Maven fits into the picture is to realize that it (mostly) doesn't.

Maven is NOT INVOLVED in the processes by which the compiler finds definitions, or the runtime system loads classes. The compiler does this by itself ... based on what the build-time classpath says. By the time that you run the application, Maven is no longer in the picture at all.

At build time, Maven's role is to examine the project dependencies declared in the POM files, check versions, download missing projects, put the JARs in a well known place and create a "classpath" for the compiler (and other tools) to use.

The compiler then "loads" the classes that it needs from those JAR files to extract type signature information in the compiled class files. It doesn't use a regular class loader to do this, but the basic algorithm for locating the classes is the same.

Once the compiler is done, Maven takes care of packaging into JAR, WAR, EAR files and so on, as specified by the POM file(s). In the case of a WAR or EAR file, all of the required dependent JARs are packaged into the file.

No Maven-directed JAR downloading takes place at runtime. However, it is possible that running the application could involve downloading JAR files; eg if the application is deployed using Java WebStart. (But the JARs won't be downloaded from a Maven repository in this case ...)

Some more things to note:

  • Maven does not need to be in the picture at all. You could use an IDE to do the building, the Ant build tool (maybe with Ivy), Make, or even "dumb" shell scripts. Depending on the build mechanism, you may need to handle external dependencies by hand; e.g. figuring out which external JARs to download, where to put them, and so on.

  • The Java runtime system typically has to load more than the compiler does. The compiler only needs to load those classes that are necessary to type-check the classes that are being compiled.

    For example, suppose class A has a method that uses class B as a parameter, and class B has a method that uses class C as a parameter. When compiling A, B needs to be loaded, but not C (unless A directly depends on C in some way). When executing A, both B and C need to be loaded. (A minimal sketch of this appears after this list.)

    As a second example, suppose that class A depends on interface I with implementations IC1 and IC2. Unless A explicitly depends on IC1 or IC2, the compiler does not need to load them to compile A.

  • It is also possible to dynamically load classes at runtime; e.g. by calling Class.forName(className), where className is a string-valued expression.
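
As a minimal sketch of the A/B/C example above (class names are illustrative): with B available as source or as B.class, javac can compile A without ever seeing C, because A never touches the member of B that mentions C.

// C.java
class C { }

// B.java: B's signature mentions C
class B {
    void process(C c) { }
}

// A.java: A only names B; compiling A needs B, but not C
class A {
    void accept(B b) { }
}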


You wrote:

For the example in your second bullet point - I'd think that the developer could choose to provide, at compile time, a stub file for B that does not include B's method that uses C, and A would compile just fine. This would confirm my assessment that, at compile time, what might be called "header" files with only the necessary functions declared (even as stubs) is perfectly allowed in Java - so it's just for convenience/convention that tools have evolved over time not to use a header/source file distinction. (Correct me if I'm wrong.)

It is not a convenience / evolutionary thing. Java has NEVER supported separate header files. James Gosling et al started from the position that header files and preprocessors were a bad idea.

Your hypothetical stub version of B would have to have all of the visible methods, constructors and fields of the real B, and the methods and constructors would have to have bodies. The stub B wouldn't compile otherwise. (I guess in theory, the bodies could be empty, return a dummy value or throw an unchecked exception.)
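
For illustration, a hypothetical compile-time-only stub of B might look like the sketch below (the members shown are invented; a real stub would have to mirror every visible member of the real B exactly):

// Stub B: enough for A to compile against, but never shipped in place of the real B.
public class B {
    public static final String FOO = "foo";   // constants must match exactly, since they get inlined
    public B(String name) { /* empty stub body */ }
    public int count() { throw new UnsupportedOperationException("stub only"); }
}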

The problem with this approach is that it would be horribly fragile. If you made the smallest mistake in keeping the stub and full versions of B in step, the result would be that the class loader (at runtime) would report a fatal error.

By the way, C and C++ are pretty much the exception in having separate header files. In most other languages that support separate compilation (of different files comprising an application), the compiler can extract the interface information (eg signatures) from the implementation source code.

One other piece of the puzzle that may help: interfaces and abstract classes are compiled to class files as well. So when compiling A, ideally you would be compiling against the API and not necessarily the concrete class. If A uses interface B (which is implemented by Z), then at compile time you would need class files for A and B, but at runtime you would need class files for A, B and Z. You are correct that all classes are dynamically linked (you can crack open the files, look at the bytecode and see the fully qualified names in there; jclasslib is an excellent utility for inspecting class files and reading bytecode if you're curious). You can replace classes at runtime, but problems at runtime often result in various forms of LinkageError.
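
A small sketch of that A/B/Z shape (the names follow the paragraph above): compiling A needs only the interface B on the classpath; some implementation such as Z must be present, and handed to A, at runtime.

// Needed at compile time and at runtime:
interface B {
    void run();
}

class A {
    void call(B b) { b.run(); }   // A is compiled against the interface only
}

// Needed at runtime only, by whoever constructs the B that A receives:
class Z implements B {
    public void run() { System.out.println("Z running"); }
}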

Whether a class should be shipped with your compiled JAR files often depends on your particular scenario. There are classes that are assumed to be available in every JRE implementation. But if I have my own API and implementation, I would have to somehow provide both to wherever they are run. There are some APIs, though, such as the Servlet API, where I would compile against the API but the container (e.g. WebSphere) is responsible for providing the API and implementation at runtime for me (therefore I shouldn't ship my own copies of these).

I have 15 years' C++ experience but am new to Java.

The biggest challenge you are likely to face is that many things which are treated as important in C++, such as the sizeof() of an object, unsigned integers, and destructors, are not easy to do in Java; they are not treated with the same importance and have other solutions or workarounds.

I am trying to understand how the absence of header files is handled by Java. I have a few questions related to this issue.

Java has interfaces which are similar in concept to header files in the sense that they contain only declarations (and constants) without definitions. Classes are often paired with an interface for that class, sometimes one to one.

Does the compiler use a class loader to load Z (to reiterate - at compile time)?

When a class loader loads (and initialises) a class, the static initialisation block is run, which can do just about anything; that is why the compiler does not use a class loader. All the compiler needs is to extract metadata from the class, not its byte code, and that is what it does.
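
As a sketch of why that matters, the hypothetical class below has a static initialiser with a side effect; a class loader initialising it at runtime would run that block, while javac compiling a class that references Z only reads Z's signatures and never executes anything:

class Z {
    static {
        // Runs when Z is initialised at runtime; never run by the compiler.
        System.out.println("Z initialised");
    }
    static int answer() { return 42; }
}

class A {
    int use() { return Z.answer(); }   // compiling A reads Z's metadata only
}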

it is MAVEN that is responsible for downloading the 3rd-party JAR file at compile time, rather than the compiler's JVM?

Maven must download the file to the local filesystem; the default location is ~/.m2/repository

how will JRE know where to download Z.class from (without Maven to help)?

It must either use Maven: some OSGi containers are able to load and unload different versions dynamically (for example, you can change the version of a library in a running system, or update a SNAPSHOT from a Maven build).

Or you have a standalone application: using a Maven plugin like appassembler, you can create a batch/shell script and a directory with a copy of all the libraries you need.

Or a web archive (WAR), which contains meta-information and many JARs inside it. (It's just a JAR containing JARs. ;)

Or is it the developer's responsibility to ship Z.class along with A.class with the application

For a standalone application yes.

Now a related question, just for confirmation: I assume that once compiled, A.class contains only symbolic links to Z.class

Technically, it only contains strings naming Z, not the contents of Z.class itself. You can change a lot of Z without compiling A again and it will still work; e.g. you might compile against one version of Z and replace it with another version later, and the application can still run. You can even replace it while the application is running. ;)

the bytecodes of Z.class are not part of A.class;

The compiler does next to no optimisation. The only significant one, IMHO, is that it inlines compile-time constants. This means that if you change a constant in Z after compiling A, it may not change in A. (If you make the constant's value not known at compile time, it won't be inlined.)
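
For instance (Z here is hypothetical), the first field below is a compile-time constant and gets inlined into callers, while the second is initialised from an expression the compiler cannot evaluate, so callers read it from Z at runtime and pick up changes without recompiling:

class Z {
    // Compile-time constant: callers copy "1.0" into their own class files.
    public static final String VERSION = "1.0";

    // Not a compile-time constant, so it is looked up in Z at runtime.
    public static final String BUILD = System.getProperty("z.build", "dev");
}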

No byte-code is inlined; native code generated from the byte code is inlined at runtime based on how the program actually runs. E.g. say you have a virtual method with N implementations. A C++ compiler wouldn't know which ones to inline, especially as they might not be available at compile time. However, the JVM can see which ones are used the most (it collects stats as the program runs) and can inline the two most commonly used implementations. (Food for thought as to what happens when you remove/update one of those classes at runtime. ;)

please correct me if I'm wrong. (In C++, static linking would copy the bytes from Z.class into A.class, whereas dynamic linking would not.)

Java has only dynamic linking, but this doesn't prevent inlining of code at runtime, which is as efficient as using a macro.

Another related question regarding the compilation process: once the necessary files describing Z are located on the CLASSPATH at compile time, does the compiler require the bytecodes from Z.class in order to compile A.java (and will build Z.class, if necessary, from Z.java), or does Z.java suffice for the compiler?

The compiler will compile all .java files as required. You need only provide the .java, but it must compile (i.e. its dependencies must be available). However, if you use a .class file, not all of its dependencies need to be available to compile A.

My overall confusion can be summarized as follows. It seems that the full [byte]code for Z needs to be present TWICE - once during compilation, and a second time during runtime -

Technically, a class contains byte-code and meta-data such as method signatures, fields and constants. None of the byte-code is used at compile time, only the meta-information. The byte-code present at compile time does not need to match what is used at runtime (the signatures/fields used do). It is just simpler to have one copy of each class, but you could use a stripped-down version at compile time if you needed to for some purpose.

and that this must be true for ALL classes referenced by a Java program. In other words, every single class must be downloaded/present TWICE. Not a single class can be represented during compile time as just a header file (as it can be in C++).

It only needs to be downloaded once, as it sits in a repository or somewhere on your disk. The interfaces, like headers, may be all you need at compile time, and these could be a separate library, but typically they are not, as it is just simpler to have a single archive in most cases (OSGi is the only example I know of where it is worth separating them).

Your summary is correct; however, I would like to add that if you compile to a JAR, then the JAR will contain Z (and, if Z is itself a JAR, only the files from the Z JAR that are needed).

However, the same Z can be used at both compile time and runtime.

Simply put, no. If you look at, say, JDBC code, it is compiled against an interface, which for this purpose acts like a header file, and uses reflection to pull in the right implementation at runtime. The drivers do not need to be present at all on the build machine, though these days a cleaner way to do this kind of thing is via a dependency injection framework.
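
A sketch of that JDBC pattern (the driver class name and connection URL are placeholders): only java.sql interfaces appear at compile time; the driver is named in a string and loaded at runtime.

import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcExample {
    public static void main(String[] args) throws Exception {
        // Placeholder driver class and URL: no driver classes are referenced at compile time.
        Class.forName("org.example.SomeDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:example://localhost/db", "user", "pass")) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}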

In any case, there's nothing stopping you from compiling against one 'header' class file and then running against the actual class file (Java is mostly dynamically linked) but this just seems to be making extra work for yourself.
