Arnt Gulbrandsen
2018-09-19

Live code for a small main()

This is the simplest possible Java main(): public static void main(String[] argv){}. How many classes, constructors and methods does it require?

The simple answer, of course, is the String constructor and the String[] constructor, so that's two functions.

But it also requires whatever those two functions require, which is where the fun begins. The String constructor requires the String class, which has some members that may throw exceptions, which brings the Throwable class into the VM, and the Throwable class has a static initialiser (a function that's called automatically when the class is first accessed).
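The "whatever those functions require" step is a transitive closure over a dependency graph. A minimal sketch of that closure follows. The edges in REQUIRES are invented for illustration and are not measured from a real JDK, and LivenessSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LivenessSketch {
    // Hypothetical "requires" edges, made up for this example only.
    static final Map<String, List<String>> REQUIRES = Map.of(
        "main", List.of("String.<init>", "String[].<init>"),
        "String.<init>", List.of("String", "CharsetDecoder"),
        "String", List.of("Throwable"),
        "Throwable", List.of("Throwable.<clinit>"),
        "Unused.helper", List.of("Throwable")
    );

    // Worklist algorithm: everything reachable from root is considered live.
    static Set<String> live(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(root);
        while (!work.isEmpty()) {
            String item = work.poll();
            if (!seen.add(item)) {
                continue; // already visited
            }
            work.addAll(REQUIRES.getOrDefault(item, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        // "Unused.helper" is in the table but never reached from main.
        System.out.println(live("main"));
    }
}
```

Even in this toy table, two direct requirements grow to seven live items, and the growth comes entirely from following edges nobody wrote in main().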

That static initialiser does quite a few things and requires many other classes. 19 classes directly, more indirectly. But is it really, really required? That main() won't actually throw any exceptions, will it? So Throwable as a class is necessary to verify that the live functions in String make sense, but as long as Throwable is only named, not used, its static initialiser could be ignored.
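The JVM itself draws a similar line between naming a class and using it: a class literal does not run the static initialiser, while the first static method call does. A small sketch, with StaticInitDemo and Lazy as hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class StaticInitDemo {
    // Records the order in which things happen.
    static final List<String> log = new ArrayList<>();

    static class Lazy {
        static int counter;
        static {
            // Runs once, when Lazy is first initialised.
            log.add("Lazy.<clinit>");
        }
        static int next() { return ++counter; }
    }

    static List<String> run() {
        log.clear();
        log.add("before");
        Class<?> named = Lazy.class;          // naming the class does not initialise it
        log.add("named " + named.getSimpleName());
        Lazy.next();                          // first real use triggers the initialiser
        log.add("used");
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        // On the first call this prints [before, named Lazy, Lazy.<clinit>, used]
        System.out.println(run());
    }
}
```

An ahead-of-time analyser has no such luxury unless it can prove the "use" never happens, which is the question about Throwable.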

Could be. I'll return to Throwable's liveness or lack of it below.

The String constructor also requires a bit of code to construct Java's Unicode strings from the encoding/charset supplied by the Unix/Linux kernel, and that's not debatable in the way Throwable is. The charset conversion involves the Properties class, several java.nio.charset.* classes, a Factory or two and calling a ClassLoader for a computed class name. One of those classes includes a data structure which needs to be usable in multithreaded code, so the VM now includes not only java.nio.charset.*, but also ConcurrentHashMap, ThreadGroup, AtomicInteger, sun.misc.Unsafe and more. The VM is up to around 100 classes so far, some of which might be ignored based on sufficiently clever liveness analysis.
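Seen from ordinary Java code, the conversion looks roughly like the sketch below: raw bytes plus a charset name that is only known at run time. This is not the VM's actual startup path, and ArgDecodeDemo and decode() are hypothetical names. Charset.forName() and the String(byte[], Charset) constructor are the standard API.

```java
import java.nio.charset.Charset;

public class ArgDecodeDemo {
    // Decode raw bytes (as a kernel would hand them over) into a Java string.
    static String decode(byte[] raw, String charsetName) {
        // The charset is looked up by a computed name, so an analyser cannot
        // easily tell which decoder classes end up being used.
        Charset cs = Charset.forName(charsetName);
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA5 };   // U+00E5 encoded as UTF-8
        System.out.println(decode(utf8, "UTF-8").length());
    }
}
```

Charset.forName() can also throw for an unknown or malformed name, which is one more path that leads back to Throwable.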

Somewhere along the line, a few of the live functions call toString() on Object arguments. (For those who don't know Java, toString() generates a string representation of an object, mostly for debugging. Typical implementations of it dump the important part of the object's internal state. Every class inherits from Object, so if Object.toString() is known to be called, then every class's toString() is potentially called.)

toString() often makes most of that class's fields and getter/setter pairs live, and the liveness analysis mentioned earlier loses much of its effect: if a class is instantiated, then most fields and methods in it are live because of toString(). This happens a lot: clever analysis reduces some sort of bloat, then some other code comes along and adds the bloat back.
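A sketch of why that happens, using made-up classes (ToStringDemo, Point and Account are hypothetical). The call site only mentions Object, yet it can land in any override, and each override drags in that class's fields and getters:

```java
public class ToStringDemo {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "Point(" + x + "," + y + ")"; }
    }

    static class Account {
        private final String owner;
        Account(String owner) { this.owner = owner; }
        String getOwner() { return owner; }
        // Makes getOwner() and the owner field live as soon as toString() is.
        @Override public String toString() { return "Account[" + getOwner() + "]"; }
    }

    // The static type is Object, so the call site names only Object.toString().
    // Which override actually runs is decided by the argument's runtime class.
    static String describe(Object o) {
        return "value: " + o.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2)));
        System.out.println(describe(new Account("arnt")));
    }
}
```

Without knowing which classes can reach describe(), an analyser has to treat every instantiated class's toString() as a possible target.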

I cannot remember details, but something actually uses a class/template name parser. This is a real dependency: the parser actually is used for this main(). That parser throws exceptions, even in the simplest possible case, so if Throwable seemed unnecessary earlier, now it is undeniably live. As are many of the 19 classes Throwable requires, and many of the classes required by those 19.

Most error handling uses exceptions, but some of the functions called by that main() report errors by writing to System.err (stderr in unix terms). Those errors may not happen, but they're tested and handled, and the handling requires System, FileDescriptor and quite a few other classes, as well as functions ranging from PrintStream.println() via Charset.fromUnicode() to FileOutputStream.writeBytes(), and many helpers.

We now need nearly 300 classes and, depending on how you count template instantiations, 1000-5000 functions. Roughly 5000 objects need to be instantiated in order to create the strings in that argv. Or more, or less, depending on how advanced the liveness analysis is.

To take a random example, it's surprisingly difficult to determine whether Class<String[]>.getConstructors() really is live in that one-line main(). Class<T>.getConstructors() is live (it's called while creating the correct whatever-to-unicode converter) but that doesn't mean that getConstructors() is live for every value of T.
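At run time the difference between values of T is easy to observe, even though the call site is the same. A small sketch (ConstructorsDemo and countPublicConstructors() are hypothetical names; Class.getConstructors() is the standard reflection call, and it returns an empty array for array classes):

```java
public class ConstructorsDemo {
    // One call site, reached with many different Class objects.
    static int countPublicConstructors(Class<?> c) {
        return c.getConstructors().length;
    }

    public static void main(String[] args) {
        System.out.println(countPublicConstructors(Object.class));
        System.out.println(countPublicConstructors(String.class));
        // Array classes have no constructors to report.
        System.out.println(countPublicConstructors(String[].class));
    }
}
```

The hard part for an analyser is the reverse direction: proving ahead of time which Class objects can ever arrive at that call site.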

A naïve liveness analyser might assume this doesn't matter, because getConstructors() is effectively the same function for every value of T. Unfortunately it's not that simple. There are also some methods that take Object arguments and then use if(foo instanceof …){…} to behave differently, activating whole cascades of classes. When my liveness analyser was that naïve, it told me that the main() above requires dozens of TLS- and X509-related classes, because of code like if(socket instanceof SSLSocket) {…}.
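The pattern looks roughly like this, with invented classes standing in for Socket, SSLSocket and the X509 machinery (InstanceofDemo, Channel, SecureChannel and CertificateChecker are all hypothetical):

```java
public class InstanceofDemo {
    // Set by CertificateChecker's static initialiser, to show whether it ever ran.
    static boolean checkerInitialised = false;

    static class Channel {
        String write(String s) { return "plain:" + s; }
    }

    static class SecureChannel extends Channel {
        @Override String write(String s) { return "secure:" + s; }
    }

    static class CertificateChecker {
        static { checkerInitialised = true; }
        static void verify(SecureChannel c) {
            // A real implementation would pull in many more classes here.
        }
    }

    static String send(Channel c, String msg) {
        // A naive analyser sees this branch and marks CertificateChecker live,
        // even if no SecureChannel is ever constructed anywhere in the program.
        if (c instanceof SecureChannel) {
            CertificateChecker.verify((SecureChannel) c);
        }
        return c.write(msg);
    }

    public static void main(String[] args) {
        System.out.println(send(new Channel(), "hi"));
        System.out.println(checkerInitialised);
    }
}
```

If only plain Channel objects are ever created, the branch is dead and CertificateChecker is never initialised, but proving that requires tracking which classes can be instantiated, not just which code mentions them.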

I don't have a real count of classes, constructors or methods. All I can say is that the number of functions that need to be available at runtime is astonishingly large, even when the number of functions actually called is very small.