A lower bound on the complexity of the sorting problem
Anyone who is at least slightly familiar with sorting algorithms has probably wondered whether there is an algorithm even faster than the ones they already know. By faster, I mean faster in terms of asymptotic time complexity. The complexity of a problem (as opposed to the complexity of an algorithm) is the complexity of the task itself, i.e. the complexity of the best possible algorithm that solves it. In some cases that algorithm may be unknown to us; all that matters is that it could, in principle, be implemented. Now, let's focus on the problem of sorting items.
Every algorithm described here gave us an upper bound on the complexity of the sorting problem. Bubble sort has a time complexity of O(n^2), so we know that sorting is at worst that hard. Heap sort then solved the same problem in O(n log n), so the problem's complexity is at most O(n log n). However, could there be an algorithm with an even better time complexity? Let's assume there couldn't be and try to prove that claim. In other words, we'll prove that the lower bound on the complexity of the sorting problem is Ω(n log n), meaning items cannot be sorted by comparisons in any better time.
Proof
When we sort by comparing items (deciding which of two items is smaller or greater), we can describe the algorithm as a decision tree. A decision tree is a binary tree whose inner nodes contain the pairs of items being compared, and whose leaves contain permutations of the items. Each leaf thus holds one possible output of the algorithm, i.e. the leaves together cover all the ways the items in the array could end up sorted.
A decision tree for an array of 2 items would be simple. It would contain a single root in which we'd ask: is a < b? If so, we'd output the permutation [a, b]. If not, we'd output [b, a], and the sorting would be done. However, as we add more items, the decision tree grows very quickly. A decision tree for 3 items would look like this:
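To make the decision-tree idea concrete, here is a small Python sketch of my own (not part of the original article) that sorts three items using explicit comparisons and records the answers along the way. Each distinct sequence of answers corresponds to one root-to-leaf path, and every leaf is one output permutation:

```python
from itertools import permutations

def sort3(a, b, c):
    """Sort three items with explicit comparisons, recording each answer.

    Every distinct sequence of recorded answers is one root-to-leaf
    path in the decision tree; the returned list is the leaf.
    """
    path = []

    def less(x, y):
        answer = x < y
        path.append(f"{x}<{y}? {'yes' if answer else 'no'}")
        return answer

    if less(a, b):
        if less(b, c):
            result = [a, b, c]
        elif less(a, c):
            result = [a, c, b]
        else:
            result = [c, a, b]
    else:
        if less(a, c):
            result = [b, a, c]
        elif less(b, c):
            result = [b, c, a]
        else:
            result = [c, b, a]
    return result, path

# All 3! = 6 input permutations end up in 6 different leaves.
for p in permutations([1, 2, 3]):
    result, path = sort3(*p)
    print(p, "->", result, "via", path)
```

Note that the deepest paths use 3 comparisons, which matches ⌈log2 3!⌉ = ⌈log2 6⌉ = 3, a fact we'll exploit below.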
For 4 items, the tree would have four times as many leaves (4! = 24 instead of 3! = 6).
If we sort n items, there are n! possible permutations, so the tree has to have at least n! leaves. Trees of less clever algorithms would have many more. As you can imagine, the best algorithm has a tree with exactly n! leaves. In the worst case (the algorithm taking the longest branch of the tree), it performs as many comparisons as the tree has levels, i.e. a number equal to the tree's height.
A binary tree of height h will have at most 2^h leaves.
As I mentioned above, the tree has to have at least n! leaves, otherwise it couldn't produce the sorted output for some input permutation. We'll start with the following inequality:

2^h >= n!
After applying the base-2 logarithm to both sides, we get:

h >= log n!
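As a quick sanity check (my own addition, not from the original text), we can compute this minimum height ⌈log2 n!⌉ for a few values of n:

```python
import math

# Any comparison sort needs at least ceil(log2(n!)) comparisons in the
# worst case: a binary tree of height h has at most 2^h leaves, and we
# need at least n! of them.
for n in (2, 3, 4, 8, 16):
    leaves = math.factorial(n)
    min_height = math.ceil(math.log2(leaves))
    print(f"n = {n:2}: n! = {leaves:>14}, minimum height = {min_height}")
```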
Now, let's focus only on the n! and see whether we can adjust it a bit to suit our needs. For the next steps, it would be very convenient to have the expression (n/2)^(n/2) instead of n!. We can arrange this using the following trick:

n! = n · (n-1) · ... · (n/2 + 1) · (n/2) · ... · 2 · 1
To be a bit more illustrative, I've written out the members for 8!. This way, the n! can be itemized into two halves:

8! = (8 · 7 · 6 · 5) · (4 · 3 · 2 · 1)

Every member of the left half is greater than n/2; the right half contains the rest.
Since our original inequality contains >= n!, we can make the n! smaller without breaking it. Let's keep the left half of the expression above and throw the right half away:

n! >= n · (n-1) · ... · (n/2 + 1)     (for 8!: 8 · 7 · 6 · 5)
We can even replace every remaining member with n/2, because each of them is definitely >= n/2, so the inequality still won't be broken:

n! >= (n/2) · (n/2) · ... · (n/2)     (for 8!: 4 · 4 · 4 · 4)
The right-hand side is n/2 multiplied by itself n/2 times, i.e. (n/2)^(n/2), which we can substitute back into our inequality:

h >= log (n/2)^(n/2)
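A quick numeric check of this estimate (again my own addition) for a few even values of n:

```python
import math

# Verify the estimate n! >= (n/2)^(n/2) used in the proof, for even n.
for n in (2, 4, 8, 16, 32):
    factorial = math.factorial(n)
    estimate = (n // 2) ** (n // 2)
    assert factorial >= estimate
    print(f"n = {n:2}: n! = {factorial}  >=  (n/2)^(n/2) = {estimate}")
```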
Transforming:

h >= (n/2) · log(n/2) = (n/2) · (log n - log 2) = (n/2) · (log n - 1) >= (n/4) · log n
(We moved the exponent in front of the logarithm, rewrote the logarithm of a quotient as the difference of logarithms, and used log 2 = 1, since all logarithms here are base 2. Finally, we dropped the -1 at the cost of doubling the denominator, which is valid for n >= 4.) The result is h >= (n/4) · log n.
This means that every comparison-based sorting algorithm, even the best possible one, must perform Ω(n log n) comparisons on some input (the constant 1/4 is hidden in the Ω symbol).
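To see that the bound is not just formally correct, here is one more small check (my own addition) comparing the exact minimum worst-case comparison count ⌈log2 n!⌉ with the derived (n/4) · log n:

```python
import math

# Compare the exact minimum worst-case comparison count, ceil(log2(n!)),
# with the bound (n/4) * log2(n) derived above; the bound never exceeds it.
for n in (4, 8, 16, 64, 1024):
    exact = math.ceil(math.log2(math.factorial(n)))
    bound = (n / 4) * math.log2(n)
    assert exact >= bound
    print(f"n = {n:4}: ceil(log2 n!) = {exact:>6}  >=  (n/4)·log n = {bound:8.1f}")
```

Since heap sort already achieves O(n log n), the bound is tight up to the hidden constant.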
The lower bound on the complexity of the sorting problem is therefore Ω(n log n), and no sorting algorithm based on mutual item comparisons can have a better time complexity. Notice that I emphasized based on mutual item comparisons. Can we actually sort faster some other way? This and more is covered in the following article - Counting sort