Building a balanced binary tree from sorted input in O(n)
Writing about Knuth's literate programming book reminded me about when I met him (at a conference) and asked why the following algorithm wasn't in TAOCP. He grasped the algorithm from a ten-second description, and said it wasn't there because he didn't know about it. Good reason.
My TA at university (when I invented it for an exercise) wasn't aware of it either, but unlike Knuth, my TA didn't understand it. And I hadn't commented the code at all. Sigh.
Here are some words in sorted order:
Decimal | Binary | Word |
---|---|---|
1 | 00001 | name |
2 | 00010 | next |
3 | 00011 | non |
4 | 00100 | of |
5 | 00101 | on |
6 | 00110 | one |
7 | 00111 | other |
8 | 01000 | perform |
9 | 01001 | piece |
10 | 01010 | play |
11 | 01011 | reached |
12 | 01100 | row |
13 | 01101 | same |
14 | 01110 | scrambled |
15 | 01111 | so |
16 | 10000 | software |
17 | 10001 | still |
18 | 10010 | successful |
19 | 10011 | that |
20 | 10100 | the |
21 | 10101 | this |
22 | 10110 | title |
You may observe that the boldfaced digits look like a binary tree. This property can be used to build a binary tree in O(n) given sorted input. Making an empty tree and repeatedly calling its insert operation is already quite fast, probably O(n log2(n)), so you may or may not care about the improvement. If the input happens to be in sorted order, lookups on a regular binary tree will be slow, but a tree built by this algorithm behaves well.
The algorithm:
- Make a zero-filled array a of node pointers, as big as the tree can be deep. log2(sizeof(void*)) is fine.
- For each item of input:
- Make a node with that input. Set its left and right children to null.
- Compute the node's proper level, l, by counting the number of zero bits below the least significant one bit.
- If l>0, and a[l-1] is filled, make the left pointer of your new node point to a[l-1] and clear a[l-1].
- If a[l+1] is filled, make its right pointer point to your new node.
- Set a[l] to the new node.
- Make a new variable, n, a pointer to a node. Set it to null.
- For each node in a, a[l], starting with a[0]:
- If a[l] is non-null, set a[l]'s right pointer to n, and set n to a[l].
At the end, n is the root of the tree.
A slight variation of the algorithm can be used to make sure the root has an almost equal number of left and right children.
Another variation can be used to rebalance an existing tree.