Arnt Gulbrandsen

Live code for a small main()

This is the simplest possible java main(): int main(String[] argv){return 0;}. How many classes, constructors and methods does it require?

The simple answer, of course, is the String constructor and the String[] constructor, so that's two functions.

But it also requires whatever those two functions require, which is where the fun begins. The String constructor requires the String class, which has some members that may throw exceptions, which brings the Throwable class into the VM, and the Throwable class has a static initialiser (a function that's called automatically when the class is first accessed).

That static initialiser does quite a few things and requires many other classes. 19 classes directly, more indirectly. But is it really, really required? That main() won't actually throw any exceptions, will it? So Throwable as a class is necessary to verify that the live functions in String make sense, but as long as Throwable is only named, not used, its static initialiser could be ignored.

Could be. I'll return to Throwable's liveness or lack of it below.
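The difference between naming a class and using it can be shown with a small sketch. The class names here (StaticInitDemo, Noisy) are made up for illustration, with Noisy standing in for Throwable; this is not the analyser's code, only a demonstration of when a static initialiser runs.

```java
import java.util.ArrayList;
import java.util.List;

class StaticInitDemo {
    // Records the order of events so the effect is observable.
    static final List<String> events = new ArrayList<>();

    // Hypothetical stand-in for Throwable: a class with a static initialiser.
    static class Noisy {
        static {
            events.add("Noisy initialised");
        }

        static int answer() {
            return 42;
        }
    }

    // Noisy is only named here, as a parameter type. That does not initialise it.
    static boolean isNull(Noisy n) {
        return n == null;
    }

    public static void main(String[] args) {
        events.add("calling isNull");
        isNull(null);
        events.add("calling answer");
        Noisy.answer(); // first real use: the static initialiser runs now
        // prints [calling isNull, calling answer, Noisy initialised]
        System.out.println(events);
    }
}
```

The initialiser only runs at the first real use, so a class that is merely mentioned in signatures can, in principle, be loaded without running it.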

The String constructor also requires a bit of code to construct java's unicode strings from the encoding/charset supplied by the unix/linux kernel, and that's not debatable in the way Throwable is. The charset conversion involves the Properties class, several java.nio.charset.* classes, a Factory or two and calling a ClassLoader for a computed class name. One of those classes includes a data structure which needs to be usable in multithreaded code, so the VM now includes not only java.nio.charset.*, but also ConcurrentHashMap, ThreadGroup, AtomicInteger, sun.misc.Unsafe and more. The VM is up to around 100 classes so far, some of which might be ignored based on sufficiently clever liveness analysis.
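As a rough illustration of the conversion step, here is a minimal sketch that decodes the raw bytes of one argument into a java string. The class and method names (ArgvDecoding, decode) are hypothetical, and a real VM does this in its startup code rather than in user code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ArgvDecoding {
    // Hypothetical helper: turn the raw bytes of one argument into a String.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "café" as the kernel would hand it over in a UTF-8 locale: 5 bytes.
        byte[] raw = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};

        // Decoded with the right charset, the two trailing bytes become one character.
        System.out.println(decode(raw, StandardCharsets.UTF_8).length());      // prints 4
        // Decoded with the wrong one, every byte becomes its own character.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // prints 5
    }
}
```

Even this one-line helper pulls in Charset and its decoders, which is the kind of dependency the paragraph above describes.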

Somewhere along the line, a few of the live functions call toString() on Object arguments. (For those who don't know Java, toString() generates a string representation of an object, mostly for debugging. Typical implementations of it dump the important part of the object's internal state. Every class inherits Object, so if Object.toString() is known to be called, then every class' toString() is potentially called.)

toString() often makes most of that class' fields and getter/setter pairs live, and the liveness analysis in the previous paragraph loses much of its effect: If a class is instantiated, then most fields and methods in it are live because of toString(). This happens a lot: clever analysis reduces some sort of bloat, then some other code comes along and adds the bloat back.
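A minimal sketch of the effect, using a made-up Account class: nothing calls the getters directly, but describe() only sees an Object, so an analyser has to assume that the toString() of any instantiated class may run, and with it everything that toString() touches.

```java
class ToStringLiveness {
    // Hypothetical class: no code below calls getOwner() or getBalance() directly.
    static class Account {
        private final String owner;
        private final long balance;

        Account(String owner, long balance) {
            this.owner = owner;
            this.balance = balance;
        }

        String getOwner() {
            return owner;
        }

        long getBalance() {
            return balance;
        }

        // Dumps the internal state, which makes both fields and both getters live.
        @Override
        public String toString() {
            return "Account[owner=" + getOwner() + ", balance=" + getBalance() + "]";
        }
    }

    // Only Object is visible here, so any class's toString() is a possible callee.
    static String describe(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        // prints Account[owner=alice, balance=100]
        System.out.println(describe(new Account("alice", 100)));
    }
}
```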

I cannot remember details, but something actually uses a class/template name parser. This is a real dependency, the parser actually is used for this main(). That parser throws exceptions, even in the simplest possible case, so if Throwable seemed unnecessary earlier, now it is undeniably live. As are many of the 19 classes Throwable requires, and many of the classes required by those 19.

Most error handling uses exceptions, but some of the functions called by that main() report errors by writing to System.err (stderr in unix terms). Those errors may not happen, but they're tested and handled, and the handling requires System, FileDescriptor and quite a few other classes, as well as functions ranging from PrintStream.println() via Charset.fromUnicode() to FileOutputStream.writeBytes(), and many helpers.

We now need nearly 300 classes and, depending on how you count template instantiations, 1000-5000 functions. Roughly 5000 objects need to be instantiated in order to create the strings in that argv. Or more, or less, depending on how advanced the liveness analysis is.

To take a random example, it's surprisingly difficult to determine whether Class<String[]>.getConstructors() really is live in that one-line main(). Class<T>.getConstructors() is live (it's called while creating the correct whatever-to-unicode converter) but that doesn't mean that getConstructors() is live for every value of T.

A naïve liveness analyser might assume this doesn't matter, because getConstructors() is effectively the same function for every value of T. Unfortunately it's not that simple. There are also some methods that take Object arguments and then use if(foo instanceof …){…} to behave differently, activating whole cascades of classes. When my liveness analyser was that naïve, it told me that the main() above requires dozens of TLS- and X509-related classes, because of code like if(socket instanceof SSLSocket) {…}.
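The pattern looks roughly like the sketch below. The class and method names (InstanceofCascade, describe) are hypothetical; SSLSocket and its getSession().getProtocol() are the real javax.net.ssl API. The TLS branch never runs in this program, but an analyser that treats every class named in an instanceof test as live would pull in SSLSocket, SSLSession and whatever they require.

```java
import javax.net.ssl.SSLSocket;

class InstanceofCascade {
    // Hypothetical helper that takes an Object and branches on its runtime type.
    static String describe(Object foo) {
        if (foo instanceof SSLSocket) {
            // Reachable only if some code actually creates an SSLSocket.
            SSLSocket ssl = (SSLSocket) foo;
            return "TLS socket, protocol " + ssl.getSession().getProtocol();
        }
        return "not a TLS socket";
    }

    public static void main(String[] args) {
        // No SSLSocket is ever instantiated, so only the last return executes.
        System.out.println(describe("hello")); // prints not a TLS socket
    }
}
```

A less naive analyser can note that no SSLSocket is ever instantiated and treat the branch as dead.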

I don't have a real count of either classes, constructors or methods. The number of functions that need to be available at runtime is astonishingly large, that's all I can say, even when the number of functions actually called is very small.


Come back Microsoft, all is forgiven

A year ago I talked at some length and frequency about the evils of Microsoft's reference application for the Xbox. One of the points I mentioned most often is that the thing links in four different JSON libraries, all deficient in some serious manner.

Today I added a third JSON library to an application, despite knowing that it already used two different ones.


Integer variables in Modula-3

Modula 3 is perhaps my favourite language. It has (had — it's practically dead now) most of what I like about java, most of what I like about c++, most of what I like about modula 2, and some unique features.

One of its little greatnesses is in the integer type system.

In modula 3, an unsigned integer is 31 or 63 bits long (as I recall, there are two unsigned integer types, although a tutorial I found on the web now mentions only one). Signed integers are 32 or 64 bits, so if a is a signed integer, a=b always works, regardless of whether b is signed or unsigned.

a=b does not throw exceptions, a=b; b=a does not change the value of b, and the cost of that is merely that you have to start using 64-bit variables at 2147483648 instead of at 4294967296.

Update: I want to expand on that, and compare it to java. The language designers of both java and modula3 understood that confusion or sloppiness with regard to signed and unsigned integers was a significant source of bugs in c/c++. Java solved it by not having unsigned integers, modula3 solved it by reducing their bit width by 1.5-3%.

I have seen many java programs that either output long to formats or protocols where only unsigned numbers are legal, or that read into long when the format clearly says 64 bits unsigned, so I think the java solution isn't very good. They chopped off the problematic feature and instead people use a misfit type. Sometimes it works: In this example it likely works because the sender, too, uses a java long, so the so-called 64 bits unsigned ID is really 63-bit. I am not sure whether this kind of bug is preferable to the kind of signed/unsigned bugs in classic c.
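A minimal sketch of the misfit, assuming an ID field that a format defines as 64 bits unsigned. The helper names are hypothetical; Long.parseUnsignedLong() and Long.toUnsignedString() are real, and have been in the standard library since Java 8.

```java
class UnsignedLongDemo {
    // Hypothetical reader: parses a 64-bit unsigned decimal ID into a java long.
    static long readId(String wire) {
        return Long.parseUnsignedLong(wire);
    }

    // The natural way to write a long, which is wrong for unsigned formats.
    static String naiveWrite(long id) {
        return Long.toString(id);
    }

    // Writes the same 64 bits, interpreted as unsigned.
    static String unsignedWrite(long id) {
        return Long.toUnsignedString(id);
    }

    public static void main(String[] args) {
        // 2^63 is a legal 64-bit unsigned value but does not fit in a signed long.
        long id = readId("9223372036854775808");
        System.out.println(naiveWrite(id));    // prints -9223372036854775808
        System.out.println(unsignedWrite(id)); // prints 9223372036854775808
    }
}
```

As long as every ID stays below 2^63 the two writers agree, which is why this kind of code appears to work for so long.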

Modula3, on the other hand, made the less obvious choice of leaving one bit at zero. The CPU registers are 32 or 64 bits wide, modula3 restrains programs to using 31 or 63 bits. As a result, programmers can still express the unsigned nature of many numbers using the type system, without sign problems. Subtle, well-considered, 97% good.
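The arithmetic can be checked with a small sketch. Java stands in for modula3 here, so this only models the bit widths, not the language's own range checks: an unsigned value is held in a long, assigned to a signed 32-bit int and read back. The names are made up for illustration.

```java
class CardinalDemo {
    // Largest value of a 31-bit unsigned integer: 2^31 - 1 = 2147483647.
    static final long MAX_31_BIT = (1L << 31) - 1;
    // Largest value of a 32-bit unsigned integer: 2^32 - 1 = 4294967295.
    static final long MAX_32_BIT = (1L << 32) - 1;

    // Models "a = b; b = a" where a is a signed 32-bit integer
    // and b is an unsigned value held in a long.
    static long roundTrip(long b) {
        int a = (int) b; // a = b: narrowing keeps the low 32 bits
        return a;        // b = a: widening sign-extends
    }

    public static void main(String[] args) {
        // Every 31-bit unsigned value survives the round trip.
        System.out.println(roundTrip(MAX_31_BIT)); // prints 2147483647
        // A full 32-bit unsigned value does not.
        System.out.println(roundTrip(MAX_32_BIT)); // prints -1
    }
}
```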


Fault tolerant programs and programmers

Archiveopteryx git head crashes a bit. Not every day, but some people report that it crashes every week or month, at random times. Clearly there is a bug. Abhijit and I have discussed it and found a way to contain it, and I've written the code.

But I haven't found a way to push the fix to the master tree. I seem unable to commit and push that code. My soul wants to find the bug and fix it, not contain it.

Meanwhile, I had an appointment with the dentist this morning.

In the waiting room I read a fascinating blog post about a Chromium exploit. Sergey Glazunov, clearly an admirably clever hacker, stitched together fourteen bugs, quirks and missed hardening opportunities to form a critical exploit. The bugtracking information for one of the bugs shows that it was reported, discussed for a few days, then it was idle until Sergey leveraged it, and then it was fixed.

Chromium is a nice browser, and I appreciate the hardening and exploit resistance the team has added. I particularly appreciate the team's honesty: They run their pwnium contests and are frank about the results.

But now I am even less happy about making fault tolerant code. I feel that it may be mentally difficult to make a program tolerate faults and at the same time make a programmer not tolerate faults.


for() is evil

Consider the function Message::acceptableBoundary(). That function's reading order is exactly the same as its execution order. This is not unusual in C and C++ (and more or less in Java), but there is a significant exception, for(). (more…)


catch( Exception e ) { throw new Exception( e ); }

Some Java book I read long ago, I think it was Thinking in Java, explains that one of the benefits of Java exceptions is that you can shift error handling away from the normal path, leaving the implementation of the common case clearer and better.

Fine. There's just one catch: You have to catch the exceptions and handle the error somewhere. (more…)


Making Maven compile faster

jam -g is the best make system I've ever used. Best for the simple reason that when the build fails, it usually fails quickly. I start the build, and a second later I'm already looking at my mistake. That feature outweighs any and all drawbacks.

Sadly, I don't use jam very often at the moment. I mostly use maven 2, which starts the build by determining from first principles which source files to use and which libraries to download. In practice the set needed hasn't changed in the past minutes, (more…)



Javadoc is built in to Java, but I think they botched it. It's clear that they didn't care deeply: JLS3 grammar doesn't mention javadoc at all, and the JLS doesn't specify it, hardly even mentions it… the word stepchild sounds more appropriate than does builtin, in my oh-so-humble opinion.

There are many things I don't like about the result, and few things I do like.

There's too much typing for not enough benefit. (more…)