Arnt Gulbrandsen

Rewriting history

The commits I ranted about yesterday aren't very large compared to what I wrote. More than 90% of the code I wrote disappeared while I merged commits for review.

The biggest block of code that disappeared was a false start. I had an idea for how to solve a problem, wrote much code, and eventually saw that the program had become lopsided. The percentage of code dedicated to that one problem was far too large. Once I understood that I quickly found a much better approach. Smaller. More robust.

That detour will not be visible in the version history my colleagues can see. That history shows a different Arnt, one that went more or less straight to the right design.

Other commits that disappeared (or almost) include some that show how I foolishly trusted the specification at first and learned better later. Instead of a two-line change with a commit message saying "workaround for client x", git blame shows a 200-line commit with a message saying "new class x" and a bit more. There's also one whose commit message elaborated on an oversimplified paragraph in the spec.

There are good arguments for tidying the commits. It simplifies review (unless something goes wrong during merging). It shows the origin of the working code with a minimum of distraction. It shows the kind of team we want to be: one whose code goes from working to working better and doing more, not one that spends weeks on overengineered detours. It shows, ahem, the kind of team management wants us to be. And, at least in theory, it's possible to have team-visible version history that always passes all tests (I like to commit broken code sometimes; the knowledge that I'll have to rewrite lets me do that freely).

These arguments do not persuade me. Maybe almost, not quite. Some of them are bad. I spit on management if management judges me by the way I rewrite my commit history. I spit on the CI tool if the CI tool does not understand that the sixteen commits I pushed in the same second form one unit from its point of view.

I have some arguments against, and they seem better.

Rewriting obscures the origin of some code. Not necessarily, but it happens that way in practice. Merging eight versions of a file into one is easy; leaving a few changes in the middle as units with their own commit messages is not. In yesterday's case, I had a change in the middle of many others to the same file, where I taught a parser to accept XML input that someone had mangled with an illegal BOM. Should we do that or should we complain to the generator? This is a question that could well be discussed separately during code review, and the patch to add it ought to be separate. Unfortunately, reordering commits to keep it separate broke the build and I gave up. The code now looks like just another part of the class's initial commit.

Rewriting loses information from commit messages. I sometimes have two-line commits with three-paragraph commit messages, e.g. discussing how this slight change makes the code compatible with both version A and version B of tool or protocol C. If I were to preserve that entirely in a merged commit, my commit messages could be many pages long.

Rewriting loses history about mistakes, and those who do not learn from history will repeat mistakes. (Who am I quoting? I don't care to look that up now, sorry.) In this case, the code has an obvious refactoring opportunity, but anyone who tries to refactor would do well to see what happened to various code I threw away.

And finally, and I think this is my real issue with rewriting, I consider rewriting history philosophically unsound.

I suck. Often and con vivace. My colleagues aren't perfect either. Rewriting history is a way to bury mistakes, and philosophically I believe that we humans are better off accepting our fallibility. Particularly we programmers, who have such rich means to make software help us. Consider my tab/space rant and the -w option to git blame.

Git, after all, not perforce

The earlier posts on git and perforce sound as if I'd rather use perforce than git. True in a way. Perforce rocks. It's practically bug-free, for a start — after more than a decade of use I can only remember one bug (using one variant of p4 obliterate made p4 diff report the wrong diff).

But rocking bug-freely isn't enough.

Features git… will never have?

What else I miss:

Change numbers: Perforce's mothership issues change numbers. Sequentially increasing, fairly small numbers. Six digits sometimes, but really fairly small, since only the last 3-4 digits tend to matter at any time.

I like the change numbers.

Features git will never have, part 2

When you sync in perforce, it updates the files you're not editing to the latest revision, and leaves the ones you're editing untouched. Git pull (a fetch followed by a merge), by contrast, updates everything that's changed.

Perforce's behaviour means that client state cannot be expressed as a simple change number. Some files may be synced to change 5522, others to 5525. A git tree is always at a particular change, plus a local diff.

The cost of that simplicity is that git pull ties p4 sync and p4 merge together. If you want to pull unrelated changes into your tree and there are conflicts with your current work, then you have to deal with those conflicts immediately. p4 sync is low-risk; git pull risks interrupting your train of thought with a merge.
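The difference in client state can be made concrete with a toy model. The sketch below is not how perforce stores anything; the `Workspace` class, its fields and the file names are all made up for illustration, and it only models the behaviour described above: a sync moves unopened files forward and leaves opened files where they were.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Workspace:
    """Toy model (hypothetical): every file remembers the change it was synced to."""
    synced: Dict[str, int] = field(default_factory=dict)  # path -> change number
    opened: Set[str] = field(default_factory=set)         # paths open for edit

    def sync(self, to_change: int) -> List[str]:
        """Move unopened files to to_change; return the opened files left untouched."""
        skipped = []
        for path in self.synced:
            if path in self.opened:
                skipped.append(path)           # any merge is deferred
            else:
                self.synced[path] = to_change  # safe, nothing local to clobber
        return sorted(skipped)

    def sync_points(self) -> Set[int]:
        """The distinct changes the tree is at; more than one means a mixed state."""
        return set(self.synced.values())


ws = Workspace(synced={"parser.c": 5522, "main.c": 5522}, opened={"parser.c"})
left_behind = ws.sync(5525)
print(left_behind, sorted(ws.sync_points()))
```

After the sync the workspace sits at two changes at once, which is exactly the state a single change number cannot express. A model of a git tree would replace the `synced` dictionary with one commit identifier plus a local diff.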

Features git will never have, part 1

When you open a file for writing in perforce, it checks whether anyone else is working on the same file, and tells you who, if someone is.

This is a fine feature. It warns you if there was a misunderstanding and someone else is already doing what you're about to start doing, and it warns you if someone else has forgotten to commit a change. The price for that is that the perforce server has to know who has the file open for writing, and that when you see the message "x is also editing y", you have to talk to x about whether your work conflicts.

Git has nothing like a perforce server, so no place to check for such approaching conflicts.
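What such a check needs is small: one shared place that remembers who has which file open. The following is a minimal sketch under that assumption, not perforce's actual implementation; `EditRegistry`, `open_for_edit` and `close` are hypothetical names invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set


class EditRegistry:
    """Hypothetical central record of who has which file open for editing."""

    def __init__(self) -> None:
        self._editors: Dict[str, Set[str]] = defaultdict(set)

    def open_for_edit(self, user: str, path: str) -> List[str]:
        """Register user as editing path; return the others already editing it."""
        others = sorted(self._editors[path] - {user})
        self._editors[path].add(user)
        return others

    def close(self, user: str, path: str) -> None:
        """Forget that user has path open, e.g. after a submit or a revert."""
        self._editors[path].discard(user)


registry = EditRegistry()
registry.open_for_edit("x", "y")
for other in registry.open_for_edit("arnt", "y"):
    print(f"{other} is also editing y")
```

The whole feature depends on every client talking to the same registry, which is the part a distributed tool has no natural home for.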