Compatibility break number two (of n?)

A while ago I spent a day and a half fretting over a missing checkcast in a Java/JDK file before I finally solved it. Before I finally mostly solved it.

It didn't take long until checkcast returned to hit me again from another angle.

The Java List<E> includes a method called toArray(), which returns the contents of the list as an array. toArray() is older than Java's generics, so it returns Object[] rather than an E[]. This isn't a problem on its own, because implementations of List<E> are free to return an E[].

The next part of the puzzle is ArrayList, which implements toArray() and returns an Object[]. It doesn't have to do that, but the source code uses Object[] for storage rather than E[]. The constructors could have called new E[], but they call new Object[].

ArrayList.toArray() calls Arrays.copyOf(), so the object it returns actually is an Object[]. The third, and critical, part of the puzzle is that ArrayList is used all over the JDK. Code equivalent to String[] a = new ArrayList<String>().toArray() occurs in many places, and of course it works with Sunracle's JVMs.
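A minimal sketch of the first two parts of the puzzle, using only java.util. The class and method names (ToArrayDemo, plainToArrayClass, typedToArrayClass) are made up for illustration. It shows that the no-argument toArray() on an ArrayList<String> hands back a real Object[], while the toArray(T[]) overload returns an array of the requested type.

```java
import java.util.ArrayList;
import java.util.List;

class ToArrayDemo {
    // Runtime class of the array produced by the no-argument toArray().
    static Class<?> plainToArrayClass() {
        List<String> list = new ArrayList<>();
        list.add("a");
        Object[] array = list.toArray();
        return array.getClass();
    }

    // Runtime class of the array produced by toArray(T[]).
    static Class<?> typedToArrayClass() {
        List<String> list = new ArrayList<>();
        list.add("a");
        Object[] array = list.toArray(new String[0]);
        return array.getClass();
    }

    public static void main(String[] args) {
        System.out.println(plainToArrayClass().getSimpleName()); // Object[]
        System.out.println(typedToArrayClass().getSimpleName()); // String[]
    }
}
```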

Sometimes the JDK code includes a checkcast (but no exception handling), sometimes there's no checkcast. Either way it works… except for me.

What to do? On one hand this detail is very, very difficult for me, because it's a minor side effect of extremely important invariants in my compiler. On the other, I cannot very well refuse to compile the JDK and its two thousand instances of ArrayList, and right now this issue is making every application fail during startup.

I decided (after writing this and pacing the corridor for a good long while) to make checkcast copy the array, if the array's contents are acceptable to the new type. That seems to be the least bad option I have, but of course it means that == behaves differently from real JVMs, because the cast actually returns a copy.
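The effect of a copying cast can be approximated in plain Java. This is only a sketch under assumptions: castByCopy is a hypothetical helper, it is fixed to String[] for brevity, and it says nothing about how the compiler actually implements checkcast. It checks that every element is acceptable, copies into an array of the new type, and shows the == difference.

```java
import java.util.Arrays;

class CopyingCast {
    // Hypothetical stand-in for a checkcast that copies: returns a new String[]
    // with the same elements, or throws if any element is not a String.
    static String[] castByCopy(Object[] source) {
        for (Object element : source) {
            if (element != null && !(element instanceof String)) {
                throw new ClassCastException(
                        "element is not a String: " + element.getClass().getName());
            }
        }
        // Arrays.copyOf with a target class allocates a fresh array of that type.
        return Arrays.copyOf(source, source.length, String[].class);
    }

    public static void main(String[] args) {
        Object[] original = {"a", "b"};
        String[] copy = castByCopy(original);
        System.out.println(Arrays.equals(copy, original)); // true: same contents
        System.out.println(copy == original);              // false: different object
    }
}
```

On a stock JVM a successful cast never changes identity, so the second line is exactly the kind of difference a program could observe.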

At present the Object[] and the copy share the same hashCode(). I'm not sure whether that's a good idea.

I break compatibility

What is Java? In a way, my compiler defines Java as the language used to write the twenty thousand classes in the JDK library, Jsoup or whatever else Gradle fetches to build a test. While I do read the specification, test-driven development is called test-driven for a reason.

A test drove me into a problem yesterday and my head hurts. I've encountered that problem before but escaped for various reasons; this time I have to confront it. The problem involves a one-line function that just accepts an argument, casts it to a subclass and returns it. Javac has compiled that as aload_1; areturn, which means push the first argument onto the stack and then return it. Javac would ordinarily include a checkcast to make sure that that first argument actually has the function's return type, but didn't in this particular case.

Taken together, this tiny function and its callers convert an Object to a more specific class without type checking. The caller gets an arbitrary Object from a call that returns String, and this is legal Java. I'll skip the details of why this is legal; they just make me sad and angry.
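The post skips the details, so the following is only one well-known way to get the same shape, assuming type erasure is involved. It is not necessarily the code the test hit. UncheckedCast, cast and smuggle are hypothetical names, and the method is static here, so the argument would be loaded with aload_0 rather than aload_1. After erasure T is Object, so the cast inside the method needs no checkcast, and a caller that only ever treats the result as an Object never checks it either.

```java
class UncheckedCast {
    // After erasure T becomes Object, so this cast compiles to a plain
    // load-and-return with no checkcast inside the method.
    @SuppressWarnings("unchecked")
    static <T> T cast(Object value) {
        return (T) value;
    }

    // The type argument says String, but the result is only used as an
    // Object, so nothing ever verifies the claim.
    static Object smuggle() {
        return UncheckedCast.<String>cast(Integer.valueOf(42));
    }

    public static void main(String[] args) {
        // The "String" is really an Integer.
        System.out.println(smuggle().getClass().getSimpleName()); // Integer
    }
}
```

If the caller instead assigns the result to a String variable, javac adds a checkcast at the call site and the mismatch surfaces there as a ClassCastException.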

static final volatile int foo = 1;

It's likely that my compiler treats foo as though it were just static final, and if you do not understand how a final volatile variable differs from a plain final, then congratulations.

The problem is that although people generally think of static final variables as constants, they aren't quite constant: foo is assigned its value early in the life of the application and can never be reassigned, but if two or more classes reference each other, so that each class has to be initialised before the other, then code involved in that initialisation may see foo still being 0. This is usually an unpleasant surprise, found while debugging. volatile affects the visibility of the foo=1 assignment, and it probably forbids one of the things my compiler does to handle those loops.
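A minimal sketch of such an initialisation loop, with plain static final fields and no volatile. The names (InitCycle, A, B, SAW_B, SAW_A, observe) are invented for the example. One assumption matters: the initialisers use Integer.parseInt("1") rather than the literal 1, because javac treats a literal initialiser as a compile-time constant and inlines it, which would hide the effect.

```java
class InitCycle {
    static class A {
        // Reading B.VALUE forces B to initialise while A is only half done.
        static final int SAW_B = B.VALUE;
        // Not a compile-time constant, so it is assigned during initialisation.
        static final int VALUE = Integer.parseInt("1");
    }

    static class B {
        // A is already being initialised by this thread, so this read does not
        // wait; it sees A.VALUE's default value, 0.
        static final int SAW_A = A.VALUE;
        static final int VALUE = Integer.parseInt("1");
    }

    // Touches A first, then reports what each class saw of the other.
    static int[] observe() {
        int aSawB = A.SAW_B;
        int bSawA = B.SAW_A;
        return new int[] {aSawB, bSawA, A.VALUE};
    }

    public static void main(String[] args) {
        int[] seen = observe();
        System.out.println("A saw B.VALUE = " + seen[0]);   // 1
        System.out.println("B saw A.VALUE = " + seen[1]);   // 0
        System.out.println("A.VALUE is now " + seen[2]);    // 1
    }
}
```

Which class sees the 0 depends on which one is touched first; here observe() always starts with A.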

I can fix it and I probably will, although I will not be highly motivated: It's a minor problem, and quite frankly, people whose code depends on the semantics of static final volatile ought to go for a walk in the park and reconsider that aspect of their design.

I write a benchmark

Today I need to write a benchmark.

I have decided to reimplement HTML Tidy in a simple, approximate way. I chose this because:

I don't know anything about how it works internally
… but I haven't heard that it's especially terrible
I have a suitable library, one called Jsoup
Jsoup is written in typical java style
Using Jsoup requires little code

Of course I realise that this won't be very good as a benchmark. I chose it because I believe that Jsoup is the kind of code I need to handle well. If my code doesn't do well on Jsoup, then I need to fix it. I don't have the same feeling about SPECint and other well-known benchmarks. They try to be good, precise benchmarks, and they don't explicitly try to be typical Java.

As a result this benchmark will be useful for me, and the cost of that is that its measured performance is imprecise for you.

15000% improvement

I've been working on something that, I think, ought to perform 30%-200% better than the current solutions, depending on the workload and how the measurement mixes values for average/typical/best-case/worst-case time and space. Some workloads might get more than 200% improvement, and I hope less than 30% will be rare.

The very first time I tried to measure the entire system, however, my measured result wasn't 30% improvement or 200%, it was a little over 15000%. Three zeroes. And I wasn't even trying to make a misleading benchmark — I just wanted to measure how well my code worked and I suppose I unconsciously concentrated on parts where the differences should show up clearly.

15000% certainly is a clear difference.

It's also totally meaningless. It measures something you'll never, ever do. However, it's a fantastically big number, I know that I was honest, and it has taught me that even if benchmark results are laughably unrealistic, it's not always because someone tried to brag or mislead. Maybe they are just myopic. Focused on the details of their work.

I've spoken harsh words about other people's benchmarks in the past. I don't think I will do that any more.

Update: 1500000‱.

Update: And the best way to represent it is with a logarithmic bar graph. Most people do not really understand a log scale, but the difference still looks very large and so the meaning is preserved. The label on the y axis makes it formally correct.