Heapsort
Heapsort is among the smarter algorithms, which are much faster than most of the ones we've covered. It's built on the Selection sort algorithm, so it's based on the extremes (in this case the greatest values), which are always moved to the end of the array. After integrating all of the greatest values to the end, we get a sorted array. The problem with Selection sort algorithm is that you have to manually search for the greatest/smallest values. In each nested loop, it has to go through the entire unsorted array part and check each item to see if it's the greatest value. There is no quicker way to find an array's greatest values.
However, what if there was a data structure in which we'd be able to represent items the same way as arrays, and also be able to find the greatest value in a constant amount of time without having to gothrough it all? With me being extremely specific and all, you probably guessed that this structure does indeed exist. The structure is called Heap (that's why the approach is called Heap Sort).
A heap is a binary tree with the following properties:
- All "levels" of the heap except for the last one are fully occupied by items (each inner node has exactly two children, so the tree is very balanced)
- The last level of the heap is filled from what's left (it can even be entirely filled)
- All items in a heap have a special property: both children are always smaller or equal to their parent.
Knowing this special property, we can deduce that the root will always contain the greatest value which is very handy for us.
A heap looks something like this:
Of course, we can define special properties of the heap the other way and have the smallest value appear in the root (this approach is used in some projects as well).
Since we usually sort an array and not a heap, we'll have to build the heap from the given array. As some of you may have guessed, we'll create the tree structure using pointers or references and add the items into it. We can represent the heap using an ordinary array (thanks to knowing it's balanced) using a simple trick. This way, we won't need an auxiliary structure/memory. We'll work directly in our array, which we'll see as a heap so that we can sort the items directly in place.
The heap in the picture above would look something like this (the item indexes are above, whereas, the items are below):
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
9 | 7 | 8 | 5 | 4 | 0 | 3 | 2 | 1 | 2 |
Notice that there're almost no similarities between the heapified array and the sorted one. Now let's try to work with this array as if it was a heap. You probably noticed that the items in the array are placed just like they would be in a heap - sorted from top to bottom and from left to right. If we wanted to access the root, we'd simply go to the 0th index. The last item of the heap is also on the index of the last item in the array. However, we'll need the access from any parent to it's children and vice-versa.
If a child has an index of i, we can calculate it's parent's index using (i - 1) / 2 (it's a whole-number division). If a parent has an index of i, its left child's index is the result of (2 * i + 1) and the right child's index is the result of (2 * i) + 2.
We can quickly check whether it worked as well. Let's try to find the child of item 7 which has an index of 1 (the array is indexed from 0 so it's the second item). (1 - 1) / 2 is 0. Its parent should be item 9, which proves to be correct. Its children would be (2 * 1 + 1) = 3 and the third index holds a five - which is indeed its left child. The right child has the same index incremented by one, which also proves to be correct.
Now, we have all the needed tools and we can go try the first step of the algorithm - heapifying the array.
Process
Heapifying the array (building the heap)
Consider an array of the following items:
3 | 2 | 5 | 1 | 9 | 4 |
The array represents the following "heap":
I used the quotes because this heap is broken - the special property of the heap doesn't apply here. Therefore, we've to fix it first and then heapify the array. To do so, we'll use the up function, which we'll continuously call on the items starting from the root. This function will find out whether the item breaks the special property of the heap (that the children aren't larger than their parent). If it finds one that is, it'll swap the item with its parent. However, doing so might just bring the problem up a level, which is why this function repeats until the special property of the heap applies for the item or until we reach the root. This function is called once for each item until it reaches the end. Then, we can be sure that the heap has been corrected.
Since I like to avoid the article to be too long, I'll only describe the heapifying progress in words - keep looking at the picture above.
We won't use the up function on the first item (the root), because it doesn't have a parent. Therefore, we'll start with item 2, which is smaller than its parent so we'll leave it in place. The next item is a 5 which is greater than its parent (3) so we'll swap them and end up at the root which means we're done. Item 1 is smaller than 2 which is correct. However, 9 is greater than 2 so we'd end up swapping them. However, 9 is still greater than 3 so we'd swap them again and 9 (the greatest value) would then be the root. 4 is greater than 3 so we'd swap them. 4 would then be smaller than 9, which is correct. The heapified array and the heap which it represents look like this:
9 | 5 | 4 | 1 | 2 | 3 |
Now, let's move on to actually getting it sorted.
Sorting
As I mentioned before, we'll literally perform a Selection sort to the heap. Meaning that we're going to swap the greatest value with the last item, and continue repeating this until we get to the end. The greatest value always lies at the 0th index, so we'd be able to reach it in a constant amount of time. Swapping also does not take any time at all.
However, we still have a bit of a problem - swapping the items will break the heap for sure. We don't care about the original greatest value (just like with Selection sort). However, the item which would then be the new root probably wouldn't be the greatest value. Therefore, we'll repair the heap using a similar function as was the up function - the down function. We'll run it starting from the root and if we find children that are greater than their parent, we'll swap them out. In case they're both greater, we'll swap the parent with its greater child. We could also run into the same problem while using the up function. Therefore, this way, we have to look again at the child (which is now the parent) and find out whether it's greater than its children. We repeat the process until the heap's special property is met or we reach the very end. We have to call this function each time we swap the root with the last item.
If we think about it, we'll figure out that neither tearing the greatest value nor swapping out values costs us any time. The only problem with this approach is the function down which has to be executed n times (for every maximum). On top of that, in each call, it has to be repeated at least log n times because the heap is a balanced binary tree and has a height of log n where the base of the logarithm is 2 because each parent has 2 children. The function checks each item in each level. In the worst scenario it would go through the entire level, which is log n. The up function has the same time complexity as the down function - O(n log n). Overall, we'd call the function up once and the function down n times, so the resulting complexity of Heapsort is O(n log n).
All in all, this algorithm showed a huge improvement in performance, especially, on bigger arrays. However, the algorithm isn't stable, so the more demanding amongst you might not be satisfied with it. In that case, you'll have to read the next article about Merge sort or Quicksort.
Algorithm properties
Time complexity | O (n log n) |
---|---|
Stability | No |
Speed | Very good |
Note. the time complexity is shown according to average cases and the speed is like this when compared to all of the other sorting algorithms.
Source code
// repair the heap up public static void up(int[] list, int i) { int child = i; // save the child int parent, temp; while (child != 0) { parent = (child - 1) / 2; // parent if (list[parent] < list[child]) { // error detection temp = list[parent]; // swap the child with the parent list[parent] = list[child]; list[child] = temp; child = parent; // new child } else return; // ok } } // repair the heap down public static void down(int[] list, int last) { int parent = 0; int child, temp; while (parent * 2 + 1 <= last) { child = parent * 2 + 1; // if the smaller child is chosen if ((child < last) && (list[child] < list[child + 1])) child++; // we pick the greater if (list[parent] < list[child]) { // error detection temp = list[parent]; // swap the child with the parent list[parent] = list[child]; list[child] = temp; parent = child; // new parent } else return; } } // build the heap from an array public static void heapify(int[] list) { for (int i = 1; i < list.length; i++) up(list, i); } // sorting public static void heapsort(int[] list) { heapify(list); int index = list.length - 1; // the last item int temp; while (index > 0) { temp = list[0]; // swap the last item with the maximum list[0] = list[index]; list[index] = temp; index -= 1; // set new last item down(list, index); } }
// repair the heap up public static void up(int[] list, int i) { int child = i; // save the child int parent, temp; while (child != 0) { parent = (child - 1) / 2; // the parent if (list[parent] < list[child]) { // error detection temp = list[parent]; // swap the child with the parent list[parent] = list[child]; list[child] = temp; child = parent; // new child } else return; // ok } } // repair heap down public static void down(int[] list, int last) { int parent = 0; int child, temp; while (parent * 2 + 1 <= last) { child = parent * 2 + 1; // if the smaller son is chosen if ((child < last) && (list[child] < list[child + 1])) child++; // pick the greater one if (list[parent] < list[child]) { // error detection temp = list[parent]; // swap the child with the parent list[parent] = list[child]; list[child] = temp; parent = child; // new parent } else return; } } // build the heap from the array public static void heapify(int[] list) { for (int i = 1; i < list.Length; i++) up(list, i); } // sorting public static void heapsort(int[] list) { heapify(list); int index = list.Length - 1; // the last item int temp; while (index > 0) { temp = list[0]; // swap the last item with the maximum list[0] = list[index]; list[index] = temp; index -= 1; // set new last item down(list, index); } }
// repair the heap up procedure up(var list: array of integer; i: integer); var parent, child, temp: integer; begin child:=i; // save the child while (child <> 0) do begin parent:=(child - 1) div 2; // the parent if (list[parent] < list[child]) then begin // error detection temp:=list[parent]; // swap the child with the parent list[parent]:=list[child]; list[child]:=temp; child:=parent; // new child end else exit; // OK end; end; // repair the heap down procedure down(var list: array of integer; last: integer); var parent, child, temp: integer; begin parent:=0; while (parent * 2 + 1 <= last) do begin child:=parent * 2 + 1; // if the smaller child was chosen if ((child < last) and (list[child] < list[child + 1])) then inc(child); // choose the greater if (list[parent] < list[child]) then begin // error detection temp:=list[parent]; // swap the child with the parent list[parent]:=list[child]; list[child]:=temp; parent:=child; // new parent end else exit; end; end; // build the heap from the array procedure heapify(var list: array of integer); var i: integer; begin for i:=1 to (length(list) - 1) do up(list, i); end; // sorting procedure heapsort(var list: array of integer); var temp, index: integer; begin heapify(list); index:=length(list) - 1; // the last item while (index > 0) do begin temp:=list[0]; // swap the last item with the maximum list[0]:=list[index]; list[index]:=temp; dec(index); // set new last item down(list, index); end; end;
# repair the heap up def up(list, i) child = i # save the child while (child != 0) parent = (child - 1) / 2 # the parent if (list[parent] < list[child]) # error detection temp = list[parent] # swap the child with the parent list[parent] = list[child] list[child] = temp child = parent # new child else return # OK end end end # repair the heap down def down(list, last) parent = 0 while (parent * 2 + 1 <= last) child = parent * 2 + 1 # if the smaller child was chosen if ((child < last) && (list[child] < list[child + 1])) child += 1 # choose the greater end if (list[parent] < list[child]) # error detection temp = list[parent] # swap the child with the parent list[parent] = list[child] list[child] = temp parent = child # new parent else return end end end # build the heap from the array def heapify(list) 1.upto(list.length - 1) { |i| up(list, i) } end # sorting def heapsort(list) heapify(list) index = list.length - 1 # the last item while (index > 0) temp = list[0] # swap the last item with the maximum list[0] = list[index] list[index] = temp index -= 1 # set new last item down(list, index) end end
Implementation of this algorithm is based on materials of Unicorn College from professor Ondřej Kučera.