Count Sort
Purpose
This sort keeps the values in a constant state of near readiness. It is valuable in cases where there is a long lead time and the values to be sorting are being provided with a long lead time.
This exists for fun. Please enjoy.
This is in some ways very similar to a Merge Sort the process of selecting what to merge has just been turned on its side. It is what it is. The complexity is order of N log N. By itself, well it's not especially useful by itself. Because of the overhead it is noticeable slower than just a regular sort. In my testing it was usually between 3 and 10 times slower than the regular sort.
Most of the sorting can be done at insertion, while the list is being created. Usually, our goal is to reduce the absolute amount of work to be done. In this case we decide to shift the work to when the numbers are being created or collected.
This concept is intended for cases where you have to be diligent, in a state of near always readiness.
This sort uses the behavior of a standard integer value to act as a tracker.
Each subsection just a sub-string of the vector that is sorted. When even two sub-strings of the same size are created, they will be merged together.
If I have sub-strings of the following lengths: 1, 4, 8, 32
And then add "1" to it I will get a new list with some combining
The new list will be: 2, 4, 8, 32.
If I add an item again, I will get sub-lists of 1,2,4,8,32.
When I add "1" again I will have to do some major work:1 + 1 will become 2. Then I will combine again 2 + 2 -> 4. Then I will combine again 4 + 4-> 8. Then I will combine again 8 + 8-> 16.
The result can be summarized as:
1 + 1, 2, 4, 8, 32 becomes 16, 32
While creating the library tracker for this algorithm it eventually became obvious that the binary arithmetic used for an "int" was perfect in capturing the desired behavior. As a result all of the tracking of what lists exist and also what lists need to be combined are handled by a single integer value "count".
So going back to my list examples 1, 4, 8, 32 is happily represented in binary as 0010 1101. When I add 0000 0001 to 0010 1101, we get 00101110.
00101110 translates to saying we should have lists 2,4,8,32 and this is as expected.
1 + 1, 2, 4, 8, 32 becoming 16, 32 is the same as: 0001 + 0010 1111 = 0011 0000. And tells us to go ahead and merge all the lower numbers into a new size 16 list.
I keep track of each list by its expected size in the corral array. The corral is just a bunch of pointers to keep track of the start of the sub-strings. This was a case where I could have made the corral a linked list and then only had the pointers that I needed, but instead just threw memory at the problem.
If I have a count of 1010 that will mean that corral[1] and corral[3] will have list starts in them.
Observed Results
This data was created by running may different runs on the computer. Note how the load time is where all the time is spent, But the load time is amortized as you are adding values.
Ratio of the final prep of the data as compared to a normal sort.
Note that their is a stair step effect.
Conclusion
So I hope this was helpful and maybe even a little fun for you to look over. It is short only 300 lines or so.
Please note that this is still at best "experimental"
Implementation
All the heavy lifting is actually done by the "inplace_merge" it just has a lot of overhead.
/*
This was written by Karl Kendall
This is an implementation of a experimental Icount sort
Copyright (C) 2021 Karl Kendall
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include <iostream>
#include <string>
#include <vector>
#include<algorithm>
using namespace std;
class Icount
{
private:
int corral[32];
std::vector<int> myVector;
int count;
public:
Icount()
{
count = 0;
}
void add(int guest)
{
int start, b, end;
b = myVector.size();
start = b;
count++;
// add the new record in.
myVector.push_back(guest);
end = myVector.size();
int i = 1;
int index = 0;
int result = count & i;
while((count & i) == 0)
{
start = corral[index];
inplace_merge (myVector.begin() + start ,myVector.begin() + b ,myVector.begin() + end);
b = start;
//corral[index] = -1;
i <<= 1;
index++;
}
//assert(corral[index] == -1); // we should be placing in a blank spot.
corral[index] = b;
}
void sort()
{
// clock_t load_start, load_end;
int i = 1;
int index = 0;
int end = myVector.size();// will be used to keep track of ends
int old = -1;
int c = count;
while(c)
{
if(i & c)
{
if (old == -1)
{
old = corral[index];
corral[index]= -1;
}
else
{
//load_start = clock();
inplace_merge (myVector.begin() + corral[index] ,myVector.begin() + old,myVector.begin() + end);
//load_end = clock();
//cout << "Time taken by in merge is : " << fixed
// << double(load_end - load_start) / double(CLOCKS_PER_SEC)
// << " sec " << endl;
old = corral[index];
corral[index] = -1;
}
c = c & ~i; //blanks out the spot in the count
}
i <<= 1;
index++;
}
if(count)
{
count = i >> 1;
corral[--index] = old;
}
}
vector<int>& give()
{
return myVector;
}
};
void i_count_sort(std::vector<int> & myVector)
{
std::vector<int>::iterator corral[32];
int count = 0;
std::vector<int>::iterator processed = myVector.begin();
std::vector<int>::iterator start;
std::vector<int>::iterator b;
std::vector<int>::iterator end = myVector.end();
int i = 1;
int index = 0;
// add the new record in.
while(processed != end)
{
//current "horse" is from b to processed
b = processed;
processed++;
count++;
i = 1;
index = 0;
while((count & i) == 0)
{
start = corral[index];
inplace_merge (start , b , processed);
b = start;
i <<= 1;
index++;
}
corral[index] = b;
}
i = 1;
index = 0;
std::vector<int>::iterator old = end;
int c = count;
while(c)
{
if(i & c)
{
if (old == end)
{
old = corral[index];
}
else
{
inplace_merge (corral[index] , old, end);
old = corral[index];
}
c = c & ~i; //blanks out the spot in the count
}
i <<= 1;
index++;
}
}
int main()
{
Icount icount;
const int SIZE = 3000000;
std::vector<int> myVector;
std::vector<int> normal_sort;
std::vector<int> count_sort;
std::vector<int> class_result;
cout << "starting:" << endl;
for(int i = 0; i < SIZE; i++)
{
int temp = i;
myVector.push_back(temp);
}
std::random_shuffle ( myVector.begin(), myVector.end() );
normal_sort = myVector;
count_sort = myVector;
clock_t load_start, load_end;
load_start = clock();
for(int i = 0; i < SIZE; i++)
{
icount.add( myVector[i]);
}
load_end = clock();
clock_t normal_start, normal_end;
clock_t class_start, class_end;
clock_t count_start, count_end;
count_start = clock();
i_count_sort ( count_sort);
count_end = clock();
normal_start = clock();
std::sort (normal_sort.begin(), normal_sort.end());
normal_end = clock();
class_start = clock();
icount.sort();
class_result = icount.give();
class_end = clock();
//cout << endl;
//for(int i = 0; i < count_sort.size(); i++)
//{
// cout << count_sort[i] << ' ' ;
//}
//cout << endl;
cout << "The vector size was: " << SIZE << endl;
cout << "Time taken by standalone count sort is : " << fixed
<< double(count_end - count_start) / double(CLOCKS_PER_SEC)
<< " sec " << endl;
cout << "Time taken by normal sort is : " << fixed
<< double(normal_end - normal_start) / double(CLOCKS_PER_SEC)
<< " sec " << endl;
cout << "Time taken by icount sort is : " << fixed
<< double(class_end - class_start) / double(CLOCKS_PER_SEC)
<< " sec " << endl;
cout << "Time taken by to load vectors is : " << fixed
<< double(load_end - load_start) / double(CLOCKS_PER_SEC)
<< " sec " << endl;
system("pause");
}
Last updated: 4/2/2021