is a sorting technique that mimics the old card sorters that worked column by column sending punched cards into
different bins. I first heard of the idea from the late Tom Meikle who was a novice programmer when he independently
discovered it back in 1968 when we implemented it for the venerable LGP-30. He called it "DigitSort".
Only many years later after teaching it to scores of people did I learn it was already well known by the name RadixSort.
 |
recommend book⇒The Art of Computer Programming |
| | hardcover |
|---|
| ISBN13: | 978-0-201-48541-7 |
|---|
| ISBN10: | 0-201-48541-9 |
|---|
| publisher: | Addison-Wesley |
| published: | 1998-10-15 |
| by: | Donald Knuth |
| Knuth’s volumes 1, 2 and 3 are the reference works for standard algorithms. At his website he describes plans for volumes 4 and 5. |
|
The radix-distribution sort (which I call here simply RadixSort) is discussed at length in Knuth’s Art of Computer
Programming vol 3 #5.2.5, where he says that it was first published in 1929 in connection with
tabulating machines, although it was probably known to the operators before that, and first described in the form
included here in 1954. There is also a related radix-exchange sort.
Advantages
In Java, RadixSort is faster than either QuickSort or HeapSort in the benchmarks I ran. For small numbers of items,
almost any sorting algorithm will work just fine. There is rarely any reason to use anything but the built-in tournament
sort, unless the sort is too big to fit in RAM, where you need something like opt-sort.
RadixSort is stable, meaning it preserves existing order of equal keys. It works in linear time, unlike most other sorts.
In other words, it does not bog down when you have large numbers of items to sort. The time per item to sort is constant.
With other sorts, the time to sort per time increases with the number of items. Mathematicians would put it that most
sorts run in O(n log(n)) or O(n²) time, where RadixSort runs in O(n)
time.
Drawbacks
- RadixSort does not work well when you have very long keys, because the total sorting time is proportional to key length
and to the number of items to sort. If you insisted that all records have unique keys, necessarily the keys would have
to be at least log2(n) bits long, which would tend to defeat the advantage of
RadixSort. RadixSort shines when you have large numbers of records to sort with short keys.
- Unfortunately, in this particular implementation, 16 bit characters count as two key slots, unless you can guarantee you
never user the high order parts. In other words Unicode takes twice as long to sort as byte arrays
- The biggest drawback with RadixSort is that you have to write an unconventional compare routine.
- The other drawback is RadixSort also uses somewhat more working storage than a traditional sort, because it copies
references to the elements to sort back and forth between two arrays for each pass. Many other sorts make do with one
array and some stack space.
Tuning
If you want a high speed version of RadixSort for a particular application, you will have to inline the comparison
delegate code since delegate.getKeyByteAt, is where it chews up all the CPU time.
Since all sorts can use the same Comparator interface, it is possible to experiment with
various sorts to figure out which one works best for your situation.
Variants
There are other ways to implement the same basic algorithm by chaining the objects to be sorted together, but that tends
to be inefficient in Java since you need separate glue objects to do the linking unless the items to be sorted
themselves implement some chaining interface.