Rewriting history

The commits I ranted about yesterday aren't very large compared to what I wrote. More than 90% of the code I wrote disappeared while I merged commits for review.

The biggest block of code that disappeared was a false start. I had an idea for how to solve a problem, wrote much code, and eventually saw that the program had become lopsided. The percentage of code dedicated to that one problem was far too large. Once I understood that I quickly found a much better approach. Smaller. More robust.

That detour will not be visible in the version history my colleagues can see. That history shows a different Arnt, one that went more or less straight to the right design.

Other commits that disappeared (or almost) include some that show how I foolishly trusted the specification at first and learned better later. Instead of a two-line change with a commit message saying workaround for client x, git blame shows a 200-line message saying new class x and a bit more. There's also one whose commit message elaborated on an oversimplified paragraph in the spec.

There are good arguments for tidying the commits. It simplifies review (unless something goes wrong during merging). It shows the origin of the working code with a minimum of distraction. It shows the kind of team we want to be: One whose code goes from working to working better and doing more, not one that spends weeks on overengineered detours. It shows, ahem, the kind of team management wants us to be. And, at least in theory, it's possible to have team-visible version history that always passes all tests (I like to commit broken code sometimes, the knowledge that I'll have to rewrite lets me do that freely).

These arguments do not persuade me. Maybe almost, not quite. Some of them are bad. I spit on management if management judges me by the way I rewrite my commit history. I spit on the CI tool if the CI tool does not understand that the sixteen commits I pushed in the same second form one unit from from its point of view.

I have some arguments against, and they seem better.

Rewriting obscures the origin of some code. Not necessarily, but it happens that way in practice. Merging eight versions of a file into one is easy, leaving a few changes in the middle as units with their own commit messages is not as easy. In yesterday's case, I had a change in the middle of many others to the same file, where I taught a parser to accept XML input that someone had mangled with an illegal BOM. Should we do that or should we complain to the generator? This is a question that could well be discussed separately during code review, and the patch to add it ought to be separate. Unfortunately reordering commits to keep it separate broke the build and I gave up. The code now looks like just another part of the class' initial commit.

Rewriting loses information from commit messages. I sometimes have two-line commits with three-paragraph commit messages, e.g. discussing how this slight change makes the code compatible with both version A and version B of tool or protocol C. If I were to preserve that entirely in a merged commit, my commit messages could be many pages long.

Rewriting loses history about mistakes, and those who do not learn from history will repeat mistakes. (Who am I quoting? I don't care to look that up now, sorry.) In this case, the code has an obvious refactoring opportunity, but anyone who tries to refactor would do well to see what happened to various code I threw away.

And finally, and I think this is my real issue with rewriting, I consider rewriting history philosophically unsound.

I suck. Often and con vivace. My colleagues aren't perfect either. Rewriting history is a way to bury mistakes, and philosophically I believe that we humans are better off accepting our fallibility. Particularly we programmers, who have such rich means to make software help us. Consider my tab/space rant and the -w option to git blame.