QuickSort July 06, 2008

QuickSort. It’s widely accepted as the fastest generic sorting algorithm around. It’s also very simple. A comment in the last article showed me this Haskell beauty:

    qsort []     = []
    qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

While my C implementations aren’t any where as beautiful, I’m quite satisfied with their efficiencies. I’ll keep the code to a minimum, and the let the graphs do the talking. Quicksort is kinda like the heapsort, only not as strong (thus not as reliable). It is generally always recursive, and basically consists of 2 steps (Divide and Conquer):

Conquer: Randomly choose an element \(x\) in the sorting set \(X\). Separate the set into two segments consisting of those lesser than and greater than \(x\), \(X_1\) and \(X_2\). For example, \(X = [4,7,2,6,89,3,5,7,12]\) with \(x = 7\) gives us \(X_1 = [4,2,6,3,5]\) and \(X_2 = [7,89,12]\).
Divide: Do the same for \(X_1\) and \(X_2\) This will terminate when the size of \(X_1\) and \(X_2\) become less than 2. A reasonable implementation of this can save on space in comparison with the mergesort \((O(n))\), with \(O(1)\) overhead.

Interestingly, this algorithm does not guarantee \(O(n \log(n))\) time; it has a worst running time of \(O(n^2)\), no better than insertion sort. It just happens that the average running time is \(O(nlog(n))\), with much better coefficients than any other sort (actually, this is only true for comparison sorts, i.e. Heap, Merge, Binary, etc.).

A note, all these implementations have been run while running a normal gnome desktop, with just XChat and mocp running in the background. Let the graphs begin.

Naive Vanilla Implementation (using a scratch buffer):

// Key Lines
void sort (int len, int *list) 
{ 
  // ... 
  pivot = rand()%len; 
  // ... 
  if (list[i] < list[pivot])  
    buf[idx1++] = list[i]; 
  else 
    buf[idx2--] = list[i]; 
  //...
}

Interestingly, this is not half as jumpy as merge sort, considering that merge sort was tightly asympototically bound to \(n \log(n)\) (i.e. it’ll stay to that value as you go higher and higher), and this implementation uses random numbers to generate the pivot. The performance is much worse than MergeSort (which had a worst performance of about 0.9 ms)

Single Array Implementation

// Key Lines
void sort (int len, int *list) 
{ 
  // ... 
  SWAP (list[pivot], list[len-1]); 
  // ... 
  if (list[i] < list[len -1]) { 
    SWAP (list[idx], list[i]); 
    idx++; 
  } 
  // ... 
  SWAP (list[idx], list[len-1]);
  // ...
}

Much better, though still a bit off the 0.9 of MergeSort. The method swaps the pivot element to the end of the list, and keeps track of the position of the last number smaller than the pivot (or actually, the position after it). And at the end of the algorithm, swaps the pivot into that location.

Median Pivoting Implemenation

pivot = (len >> 1);

Aha, just better than MergeSort. And all that just by replacing the rather heavy rand() function with a simple median (just take the middle element. len>>1 is equivalent to len/2. I don’t know why I used it here, compiler optimazations and all, perhaps just used to it after the nearest_pow excercise). I would never put something like that in proper code). Interesting to note the optimization difference. (Thanks to Ivic for pointing out an error: Earlier I had written len>>2)

Grouping Equal Elements Implemenation

Whoa! Huge improvements. This was with grouping the terms that were equal to the pivot, and excluding them from the Divide step. Notably, this makes much less difference on a random sort.

Insertion Sort Implementation

A minor incremental improvement. It just grazes 0.8 ms. It seems that QuickSort is more amenable to the insertion sort optimization than MergeSort.

Another set of implementations can be found at Jeffery Stedfast’s place. One thing he tried out was to ‘unrecurse’ it by using a stack containing the extents of any stop (essentially, where the sub array you are sorting in one of the Divide steps begins and ends). I don’t see how this is different from what C is anyway doing, i.e. pushing the variables of a function call to a register, and then popping them later. There would be only the function call step (which is a jmp command, something that takes about 1-2 processor operations. On my machine (3.04Ghz) that would be about 3e-10 seconds). On trying to implement it, I found my implementation was probably very sucky, and was much slower than the un-modified program. However, for what it’s worth the previous implementations (using a median) are all quite a bit faster that Stedfasts implementation (~1.04s on an array of size 5e6). According to his post, his function is a bit faster than the glib qsort() function, and that also means mine is too :-).

Well, that’s all I have.