Arnt Gulbrandsen
About meAbout this blog

Building a balanced binary tree from sorted input in O(n)

Writing about Knuth's literate programming book reminded me about when I met him (at a conference) and asked why the following algorithm wasn't in TAOCP. He grasped the algorithm from a ten-second description, and said it wasn't there because he didn't know about it. Good reason.

My TA at university (when I invented it for an exercise) wasn't aware of it either, but unlike Knuth, my TA didn't understand it. And I hadn't commented the code at all. Sigh.

Here are some words in sorted order:

DecimalBinaryWord
100001name
200010next
300011non
400100of
500101on
600110one
700111other
801000perform
901001piece
1001010play
1101011reached
1201100row
1301101same
1401110scrambled
1501111so
1610000software
1710001still
1810010successful
1910011that
2010100the
2110101this
2210110title

You may observe that the boldfaced digits look like a binary tree. This property can be used to build a binary tree in O(n) given sorted input. Making an empty tree and repeatedly calling its insert operation is already quite fast, probably O(n log2(n)), so you may or may not care about the improvement. If the input happens to be in sorted order, lookups on a regular binary tree will be slow, but a tree built by this algorithm behaves well.

The algorithm:

  1. Make a zero-filled array a of node pointers, as big as the tree can be deep. log2(sizeof(void*)) is fine.
  2. For each item of input:
    1. Make a node with that input. Set its left and right children to null.
    2. Compute the node's proper level, l, by counting the number of zero bits below the least significant one bit.
    3. If l>0, and a[l-1] is filled, make the left pointer of your new node point to a[l-1] and clear a[l-1].
    4. If a[l+1] is filled, make its right pointer point to your new node.
    5. Set a[l] to the new node.
  3. Make a new variable, n, a pointer to a node. Set it to null.
  4. For each node in a, a[l], starting with a[0]:
    1. If a[l] is non-null, set a[l]'s right pointer to n, and set n to a[l].

At the end, n is the root of the tree.

A slight variation of the algorithm can be used to make sure the root has an almost equal number of left and right children.

Another variation can be used to rebalance an existing tree.