Radix sort for float numbers

Radix sort is a linear sorting algorithm. However, it is commonly applied to integral values. This article shows, that - under certain circumstances - radix sort can be applied to floating point values as well.

How radix sort works on integral values

To understand the method of radix sort, we first assume, that all our numbers are nonnegative and have an equal number of digits. Eventually, the numbers are filled with leading zeroes.

Consider the following sequence of numbers and how it can be written, so that all numbers have equally many digits.

(1,6,15,213,36,78, 21) Rightarrow (001, 006,015,213,036,078,21).png

Now, we define a simple operation on a sequence of numbers: s_k.png denotes a sorting operation, that sorts the numbers stable according to the k.png-th digit. Note that we begin counting the digits at the last digit. That is, the last digit, has index 0, the pre-last has index 1, and so on. For example, digit 2 of the number 19873 is 8.

Stable means the following: If an item a_i.png has the same k.png-th digit as a_j.png and a_i.png comes before a_j.png in the argument of s_k.png, then in the result, a_i.png comes before a_j.png.

As said, s_k.png sorts the sequence stable according to digit k.png. As an example, we consider the following:

s_0(001, 006, 015, 213, 036, 078, 021) = (001, 021, 213, 015, 006, 036, 078).png

As you can see, i write s_k(a,b,c).png instead of s_k((a,b,c)).png. It is easier to read.

Now, consider the following: If we first sort the numbers according to digit 0, then according to digit 1, then according to digit 2, and so on, we get a sorted sequence:

Let me illustrate this by the above example:

s_2(s_1(s_0(001, 006, 015, 213, 036, 078, 021))) = s_2(s_1(001, 021, 213, 015, 006, 036, 078)).png

s_2(s_1(001, 021, 213, 015, 006, 036, 078)) = s_2(001, 006, 213, 014, 036, 078).png

s_2(001, 006, 213, 014, 036, 078) = (001, 006, 014, 036, 078, 213).png

As you see, the final sequence is sorted. Of course, this is no proof, but it makes it a bit clearer that it works. The proof itself is left to the reader - it is not too complicated.

It is essential, that s_k.png is a stable sort.

If we now code our integer values into binary, we can easily radix-sort them binary digit by binary digit. That is, for each digit, we first look, how many of the numbers have k.png-th digit 0, and how many have k.png-th digit 1. Then, we distribute the numbers according to their k.png-th digit into bin 0 or bin 1.

How can radix sort be applied on floating point values?

Again, we assume, that we have no negative numbers.

32-bit-floating point values are stored the foolowing way:

The first bit denotes the sign, the second eight bits denote the mantissa, and the remaining bits stand for the decimal places. For more information, visit Wikipedia or this article.

The point is, that the binary representation of nonnegative floats and integers have a nice property. If you take this representation, you can treat the number, as if it was an integer.

That means, if you treat the digits like an integer, comparisons yield the same result as if you would treat them like floats.

For that reason, we are able to treat the floating point values as integers and then doing a radix sort as explained above.

This leads to the following C++-Code:


// Radix sort that treats floats as ints and sorts them
for (int i=0; i < 32; ++i){
	unsigned int radix=(1 << i);
	
	float zwarray[count];
	
	// count how many 0-value-radixes we have
	int count0=0;
	int count1=0;

	for (int j=0; j < count; ++j){
		if (!((*(int*)&(array[j]))&radix)){
			++count1;
		}
	}
	
	for (int j=0; j < count; ++j){
		if ((*(int*)&(array[j]))&radix){
			zwarray[count1]=array[j];
			++count1;
		}
		else {
			zwarray[count0]=array[j];
			++count0;
		}
	}
	
	for (int j=0; j < count; ++j){
		array[j]=zwarray[j];
	}
	
}

array is the array to be sorted. count is the number of elements in array.

Consider the expression ((*(int*)&(array[j]))&radix). It checks, wether the bit at position i is set or not.

Conclusion

The above implementation has some weak points. It's runtime is O(n).png where n.png is the number of elements to be sorted. More precisely, it traverses the array 64 times, since we assume 32 bit and we need 2 passes in each loop.

One optimisation could be, that we do not consider only one bit, but we consider a whole byte as a digit. If we do so, we only need 8 traversals, since we have 4 byte and again 2 passes in each loop. A (problably) solvable problem when using whole bytes, is that we have to do a more sophisticated counting.

Comments

Mahesh Bansal

2010-07-01 10:00:02

Thx Buddy !!

blogg.no

2010-06-15 05:45:45

realy good information

Gregory

2010-03-10 11:50:01

Steven fixed it for sign bit there: http://hbfs.wordpress.com/2010/03/09/radix-sort-on-floating-point-numbers/

generic

2010-03-04 09:38:35

Here those on! First time I hear!

Submit your Comment

To prove that you're human, you need to type the bold text into the textbox above: JQT