Arnt Gulbrandsen

An experiment with C++11

Bjarne Stroustrup writes: "C++11 feels like a new language: The pieces just fit together better than they used to and I find a higher-level style of programming more natural than before and as efficient as ever." I haven't written much C++ in the past three or four years, so this summer, recovering from an illness, I decided to try an educational hack.

I chose to reimplement a perl blogging engine in C++11, since I wanted something small, complete, useful and above all not C-like, and since I had a candidate: Loathsxome, a cleaned-up reimplementation of blosxom. Both are a few hundred lines, vaguely high-level, hackish in the way small programs can (legitimately) be, and they do a lot of string handling. The loathsxome installation that (still?) runs this blog is about 1100 lines of code including my plugins but excluding templates.

My goal was to write clean code, use some/many C++11 features and generally modern style, and get a comparable line count in C++ for the same features. I didn't aim for performance, but thought the result should perform well because I know big-O notation and the C++ committee has an efficiency fetish.

Code size: The line count is twice as high. I have about 2000 lines of C++ doing the job of 1000 lines of perl.

I find the C++ more maintainable, but that may be because it is better documented. Including whitespace and comments, the C++ is actually 3000 lines, as opposed to 1100 for the perl.

C++ features: I've seen C++11 examples that made me wonder whether I was looking at real C++ or an elaborate prank. But after this experiment I quite like the new features.

Move semantics and auto are great. Auto can be misused, there's no doubt about that, but it handles a common case well. (Often good, great potential for misuse: welcome to C++!)
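A small sketch of the two features, using nothing beyond the standard library: `auto` spares spelling out an iterator type, and `std::move` hands a vector's storage to a new owner instead of copying every string. `TagIndex`, `titlesFor` and `takeOwnership` are hypothetical names for illustration, not code from Plusxome.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index from tag name to posting titles.
using TagIndex = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: the titles filed under one tag, or none.
std::vector<std::string> titlesFor(const TagIndex& index, const std::string& tag) {
    // auto deduces TagIndex::const_iterator, a type nobody wants to spell out.
    auto it = index.find(tag);
    if (it == index.end())
        return {};
    return it->second;  // copies: the index keeps its own titles
}

// Hypothetical helper: take over a vector's storage instead of copying each string.
std::vector<std::string> takeOwnership(std::vector<std::string>& source) {
    std::vector<std::string> target = std::move(source);  // move constructor
    // source is left in a valid state; don't rely on its old contents.
    return target;
}
```

The copy in `titlesFor` is deliberate, since the index must keep its data; the move in `takeOwnership` is only appropriate because the caller explicitly gives up `source`.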

C++ still provides only weak abstractions. The internals of everything are plainly visible, for good and bad. The internals of some things are astonishingly complex.

It's nice to have several compilers, with different diagnostics. Clang is good (so much better than javac).

I really like some of the STL features, but I still find it difficult to accept that the containers lack x.contains(key). There are workarounds (x.find(key) != x.end(), x[key]), but neither really expresses what I want to do, and on a map x[key] even inserts a default-constructed value when the key is missing.
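The find-based workaround can at least be wrapped once. A minimal sketch: `contains` here is a hypothetical free function, not part of the C++11 standard library (C++20 later added a `contains` member to `std::map`, `std::set` and the unordered containers). Unlike `x[key]`, it never modifies the container.

```cpp
#include <map>
#include <string>

// Hypothetical C++11 stand-in for the missing member: true if key is present.
// Works for any container with find() and end(), e.g. std::map or std::set.
template <typename Container, typename Key>
bool contains(const Container& c, const Key& key) {
    return c.find(key) != c.end();
}

// Example use: has this path been requested before?
bool seenBefore(const std::map<std::string, int>& hits, const std::string& path) {
    return contains(hits, path);
}
```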

The STL and move semantics eliminated delete in this case. I had written a thousand lines before I even noticed that I hadn't needed to delete anything. The only memory leak was related to using a C library.
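One common C++11 way to close the gap a C library leaves is to give its resource an owner: a `std::unique_ptr` with a custom deleter. A minimal sketch under stated assumptions: `c_copy_string` is a hypothetical stand-in for any C function that returns memory the caller must `free()`; it is not Tidy's API, and `FreeDeleter`, `CString` and `copyViaC` are illustrative names.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical C-style function: returns a malloc'ed copy of s, or NULL on
// failure; the caller must free() it.
char* c_copy_string(const char* s) {
    char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    if (p != nullptr)
        std::strcpy(p, s);
    return p;
}

// Deleter that returns the memory the way the C library expects.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Owning handle: free() runs automatically when the handle is destroyed,
// including during stack unwinding.
using CString = std::unique_ptr<char, FreeDeleter>;

std::string copyViaC(const std::string& text) {
    CString owned(c_copy_string(text.c_str()));
    return owned ? std::string(owned.get()) : std::string();
}   // owned is released here; no delete or free() in the caller
```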

The code is noticeably short on if/while/try, so I suppose I too find a higher-level style of programming more natural.
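As an illustration of where the if/while tends to go, here is a sketch in which one algorithm call and a lambda replace a hand-written loop with a test inside it. The `Post` struct and `countPublished` are hypothetical, not Plusxome's types.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical posting record; the fields are illustrative only.
struct Post {
    std::string title;
    bool published;
};

// Counts published posts: no explicit loop, no if.
std::ptrdiff_t countPublished(const std::vector<Post>& posts) {
    return std::count_if(posts.begin(), posts.end(),
                         [](const Post& p) { return p.published; });
}
```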

The string handling disappeared; most of it turned into objects and classes. Loathsxome uses strings for everything, while Plusxome starts with a Path and looks up or makes a Rendering, usually based on a Document containing a tree of Nodes and ultimately made from Posts. The STL string may still suck, but I've reimplemented a thousand lines of string handling without feeling much pain, so C++11 isn't as bad as I thought it would be for such tasks.
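To show what "string handling turned into classes" can look like, here is a sketch. The class names Path and Node come from the text above, but their members and behaviour are assumptions for illustration: a Path splits the request string into components once, and a Node tree renders itself instead of being assembled by string concatenation throughout the program.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical Path: the request path split into components, so later code
// compares components instead of slicing raw strings.
class Path {
public:
    explicit Path(const std::string& raw) {
        std::string part;
        for (char c : raw) {
            if (c != '/') {
                part += c;
            } else if (!part.empty()) {
                parts_.push_back(part);
                part.clear();
            }
        }
        if (!part.empty())
            parts_.push_back(part);
    }
    const std::vector<std::string>& parts() const { return parts_; }

private:
    std::vector<std::string> parts_;
};

// Hypothetical Node: an element with a tag and children, or a text leaf.
struct Node {
    std::string tag;   // empty for a text node
    std::string text;  // used only when tag is empty
    std::vector<std::unique_ptr<Node>> children;

    std::string render() const {
        if (tag.empty())
            return text;
        std::string out = "<" + tag + ">";
        for (const auto& child : children)
            out += child->render();
        return out + "</" + tag + ">";
    }
};
```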

I had a look at Boost.Spirit for parsing the postings and templates, but recoiled. Spirit (2.x at least) does too much operator overloading; there's too much magic. It is not for me, so I used Tidy to parse posts and templates instead. However, I don't count that setback against C++11, since one of the Spirit authors is also unhappy with the complexity of v2.x.

Performance: Fine, especially under heavy load. It parses the GET request and acquires a shared lock, and under heavy load that's all it does. It also rereads files when inotify reports a change and eventually regenerates responses, but under heavy load that work doesn't even show up in the profiling graphs. (I can't benchmark it properly, for lack of a fast enough network and clients.)
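The read-mostly pattern described above can be sketched with a reader/writer lock. This is an assumption about the design, not Plusxome's code: `ResponseCache`, `lookup` and `store` are hypothetical names, and `std::shared_mutex` requires C++17 (C++11 itself has no reader/writer lock; C++14 added `std::shared_timed_mutex`).

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical response cache. Request threads only read it, so many can
// hold the shared lock at once; regeneration takes the exclusive lock briefly.
class ResponseCache {
public:
    bool lookup(const std::string& path, std::string& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);   // readers share
        auto it = responses_.find(path);
        if (it == responses_.end())
            return false;
        out = it->second;
        return true;
    }

    void store(const std::string& path, std::string body) {
        std::unique_lock<std::shared_mutex> lock(mutex_);   // writer excludes all
        responses_[path] = std::move(body);
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, std::string> responses_;
};
```

Under load, nearly every call is `lookup`, which never blocks other readers; only `store` (after a file change) makes them wait.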

The code I wrote is available from my github account. Have a look. Be tolerant. You can tell I was slightly ill, and interested more in trying things than in producing reliable software.

Some notes: I did not implement RSS, because Atom exists and does the same job better. I did not implement a comment plugin, because of my opinions about comment spam, moderation and wham-bam comments. I chose to serve HTTP over TCP directly instead of using CGI, because that's the modern way and, yes, because I wanted to play with inotify. Finally, I used the DOM for templating instead of cloning loathsxome's line-based design. For example, the home page template is an HTML file; Plusxome looks for <div id=foo> and inserts the most recent postings into that node.
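DOM-based templating boils down to "find the placeholder node, then append children to it". A minimal sketch, assuming a hypothetical `Element` type; Plusxome builds its real tree with Tidy, whose API is not shown here, and `findById` and `fillPlaceholder` are illustrative names.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal DOM element.
struct Element {
    std::string tag;
    std::map<std::string, std::string> attributes;
    std::string text;
    std::vector<std::unique_ptr<Element>> children;
};

// Depth-first search for the element whose id attribute equals id.
Element* findById(Element& root, const std::string& id) {
    auto it = root.attributes.find("id");
    if (it != root.attributes.end() && it->second == id)
        return &root;
    for (auto& child : root.children)
        if (Element* found = findById(*child, id))
            return found;
    return nullptr;
}

// Appends one <div> per posting, in the order given, to the placeholder.
// Returns false if the template has no element with that id.
bool fillPlaceholder(Element& root, const std::string& id,
                     const std::vector<std::string>& postings) {
    Element* slot = findById(root, id);
    if (slot == nullptr)
        return false;
    for (const auto& body : postings) {
        std::unique_ptr<Element> post(new Element());
        post->tag = "div";
        post->text = body;
        slot->children.push_back(std::move(post));
    }
    return true;
}
```

Because the template is just a tree, the designer can put the placeholder anywhere in the HTML without the engine knowing anything about lines or layout.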

Integer variables in Modula-3

Modula 3 is perhaps my favourite language. It has (had — it's practically dead now) most of what I like about java, most of what I like about c++, most of what I like about modula 2, and some unique features.

One of its little greatnesses is in the integer type system.

In modula 3, an unsigned integer is 31 or 63 bits long (as I recall, there are two unsigned integer types, although a tutorial I found on the web now mentions only one). Signed integers are 32 or 64 bits, so if a is a signed integer, the assignment a := b always works, regardless of whether b is signed or unsigned.

a := b cannot fail at run time, a := b; b := a does not change the value of b, and the only cost is that you have to switch to 64-bit variables at 2147483648 (2^31) instead of at 4294967296 (2^32).
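C++ has no such type, but the idea can be imitated. A minimal sketch, assuming a 32-bit word: the hypothetical class `Cardinal` (named after Modula-3's CARDINAL, defined as [0..LAST(INTEGER)]) accepts only values that fit in 31 bits, so converting to a signed 32-bit integer and back is always lossless. Where Modula-3 reports an out-of-range assignment as a checked runtime error, this sketch throws `std::out_of_range`.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical imitation of CARDINAL on a 32-bit word: 0 .. 2147483647.
class Cardinal {
public:
    explicit Cardinal(std::int64_t v) : value_(check(v)) {}

    // Always lossless: every Cardinal value is <= INT32_MAX.
    std::int32_t toSigned() const { return static_cast<std::int32_t>(value_); }

    // The other direction needs a check, because a signed value may be negative.
    static Cardinal fromSigned(std::int32_t v) { return Cardinal(v); }

    std::uint32_t value() const { return value_; }

private:
    static std::uint32_t check(std::int64_t v) {
        if (v < 0 || v > std::numeric_limits<std::int32_t>::max())
            throw std::out_of_range("Cardinal holds only 0 .. 2^31-1");
        return static_cast<std::uint32_t>(v);
    }

    std::uint32_t value_;
};
```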

Update: I want to expand on that and compare it to java. The language designers of both java and modula3 understood that confusion or sloppiness with regard to signed and unsigned integers was a significant source of bugs in c/c++. Java solved it by not having unsigned integers; modula3 solved it by making them one bit narrower, which costs about 3% of a 32-bit word or 1.6% of a 64-bit one.

I have seen many java programs that either write a long to formats or protocols where only unsigned numbers are legal, or read into a long when the format clearly says 64-bit unsigned, so I don't think the java solution is very good. The designers chopped off the problematic feature, and people use a misfit type instead. Sometimes that works: In this example it likely works because the sender, too, uses a java long, so the so-called 64-bit unsigned ID is really a 63-bit one. I am not sure whether this kind of bug is preferable to the signed/unsigned bugs of classic c.
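The failure mode is easy to state in C++ terms. A sketch, assuming a hypothetical wire format with an 8-byte big-endian field documented as 64-bit unsigned: any value with the top bit set is a perfectly legal ID but has no representation in a signed 64-bit integer such as java's long. `decodeU64` and `fitsInSigned64` are illustrative names.

```cpp
#include <cstdint>
#include <limits>

// Decodes an 8-byte big-endian field that the (hypothetical) format
// declares as "64-bit unsigned".
std::uint64_t decodeU64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}

// A signed 64-bit type can hold the value only when the top bit is clear,
// i.e. when the "64-bit unsigned" value is really a 63-bit one.
bool fitsInSigned64(std::uint64_t v) {
    return v <= static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
}
```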

Modula3, on the other hand, made the less obvious choice of leaving one bit at zero. The CPU registers are 32 or 64 bits wide, but modula3 restricts unsigned values to 31 or 63 bits. As a result, programmers can still express the unsigned nature of many numbers using the type system, without sign problems. Subtle, well-considered, 97% good.

Fault tolerant programs and programmers

Archiveopteryx git head crashes now and then. Not every day, but some people report that it crashes every week or month, at random times. Clearly there is a bug. Abhijit and I have discussed it and found a way to contain it, and I've written the code.

But I haven't found a way to push the fix to the master tree. I seem unable to commit and push that code. My soul wants to find the bug and fix it, not contain it.

Meanwhile, I had an appointment with the dentist this morning.

In the waiting room I read a fascinating blog post about a Chromium exploit. Sergey Glazunov, clearly an admirably clever hacker, stitched together fourteen bugs, quirks and missed hardening opportunities to form a critical exploit. The bug tracking information for one of the bugs shows that it was reported, discussed for a few days, left idle until Sergey leveraged it, and only then fixed.

Chromium is a nice browser, and I appreciate the hardening and exploit resistance the team has added. I particularly appreciate the team's honesty: They run their Pwnium contests and are frank about the results.

But now I am even less happy about making fault tolerant code. I feel that it may be mentally difficult to make a program tolerate faults and at the same time make a programmer not tolerate faults.