Sorting series post 7 focuses on non-comparison sorting: when key range or digit length is bounded, complexity can drop to O(n+k), but space, stability, and feasibility must be balanced.
Target Readers
- Engineers sorting integer keys with known range/digits.
- Learners seeking lower-than-n-log-n sorting for large batches.
- People comparing standard library comparison sorts vs non-comparison.
Background and Motivation
- Comparison sorts have a lower bound of Omega(n log n); non-comparison sorts bypass that by using range/digit information.
- Cost: extra space and restricted applicability; implementation must handle stability and memory carefully.
A - Algorithm
Covered algorithms: counting sort, bucket sort, radix sort (LSD).
Basic examples
- Counting:
[4, 2, 2, 8, 3], range 0..9, count -> prefix sums -> stable fill. - Radix: group by ones/tens/hundreds digits, stable per digit.
C - Concepts
| Algorithm | Idea | Time | Space | Stable |
|---|---|---|---|---|
| Counting | frequency + prefix sums | O(n+k) | O(k+n) | Can be stable |
| Bucket | split by ranges, sort within buckets | expected O(n+k) | O(n+k) | depends on inner sort |
| Radix | stable per-digit sort in multiple passes | O(d*(n+b)) | O(n+b) | Yes if each pass is stable |
- k: range size; d: number of digits; b: base (number of buckets).
- Stability: counting can be stable; radix requires stable passes; bucket depends on inner sort.
E - Engineering
Scenario 1: Small-range integer sort (Python counting)
def counting_sort(a, max_val):
cnt = [0]*(max_val+1)
for x in a: cnt[x]+=1
# prefix sums
for i in range(1, len(cnt)): cnt[i]+=cnt[i-1]
out=[0]*len(a)
for x in reversed(a):
cnt[x]-=1
out[cnt[x]] = x
return out
print(counting_sort([4,2,2,8,3], 9))
Scenario 2: Bucket sort for known float distribution (JavaScript)
Background: floats uniformly distributed in [0,1).
function bucketSort(arr, buckets=10){
const B=Array.from({length:buckets},()=>[]);
for(const x of arr){
const idx = Math.min(buckets-1, Math.floor(x*buckets));
B[idx].push(x);
}
for(const b of B) b.sort((a,b)=>a-b);
return B.flat();
}
console.log(bucketSort([0.78,0.17,0.39,0.26,0.72,0.94,0.21,0.12,0.23,0.68]));
Scenario 3: Radix sort for large integers (Go, LSD)
package main
import "fmt"
func radixLSD(a []int) {
maxv := 0
for _,v := range a { if v>maxv { maxv=v } }
exp := 1
buf := make([]int, len(a))
for maxv/exp > 0 {
cnt := make([]int, 10)
for _,v := range a { digit := (v/exp)%10; cnt[digit]++ }
for i:=1;i<10;i++ { cnt[i]+=cnt[i-1] }
for i:=len(a)-1;i>=0;i-- {
d := (a[i]/exp)%10
cnt[d]--
buf[cnt[d]] = a[i]
}
copy(a, buf)
exp *= 10
}
}
func main(){ a:=[]int{170,45,75,90,802,24,2,66}; radixLSD(a); fmt.Println(a) }
Scenario 4: C++ counting sort (small range)
#include <bits/stdc++.h>
using namespace std;
vector<int> counting_sort(const vector<int>& a, int maxv){
vector<int> cnt(maxv+1), out(a.size());
for(int x:a) cnt[x]++;
for(int i=1;i<=maxv;i++) cnt[i]+=cnt[i-1];
for(int i=(int)a.size()-1;i>=0;i--){
int x=a[i]; cnt[x]--; out[cnt[x]]=x;
}
return out;
}
Scenario 5: Rust radix sort (LSD)
pub fn radix_lsd(a: &mut [u32]) {
let mut maxv = *a.iter().max().unwrap();
let mut exp = 1u32;
let n = a.len();
let mut buf = vec![0u32; n];
while maxv/exp > 0 {
let mut cnt = [0usize; 10];
for &v in a.iter() { cnt[((v/exp)%10) as usize] += 1; }
for i in 1..10 { cnt[i] += cnt[i-1]; }
for &v in a.iter().rev() {
let d = ((v/exp)%10) as usize;
cnt[d] -= 1;
buf[cnt[d]] = v;
}
a.copy_from_slice(&buf);
exp *= 10;
}
}
R - Reflection
- Complexity and prerequisites:
- Counting: O(n+k), k is range size; if k » n, not worth it.
- Bucket: expected O(n+k) depends on distribution; worst-case can degrade.
- Radix: O(d*(n+b)), d digits, base b; each pass must be stable.
- Trade-offs:
- Memory: counting/bucket needs O(k) or O(n+k); large ranges are infeasible.
- Stability: counting and radix can be stable; bucket depends on inner sort.
- Data type: best for integers or keys mappable to integers (dates, IPs, fixed-length strings).
- Why it works:
- With bounded range/digits, non-comparison sorts break the n log n lower bound.
- Great for logs, bucketed stats, and bulk integer sorting.
S - Summary
- Non-comparison sorts require known range/digits/distribution and can reach O(n+k).
- Counting is simple and stable for small ranges; radix works for multi-digit integers; bucket depends on distribution.
- Core risks: memory blow-up, distribution mismatch, missing stability.
- Selection: small range -> counting; moderate digits + stable -> radix; uniform floats -> bucket; otherwise compare-based.
Practice Guide / Steps
- Estimate range/digits: if k is close to or larger than n, be cautious.
- Decide stability: radix needs stable per-digit sort; bucket may need stable inner sort.
- Control memory: counting array length = max-min+1; radix buffers at least O(n).
- Test with random, all equal, huge range, skewed distribution.
Common Pitfalls and Notes
- Counting sort with negatives needs offset (or split positive/negative).
- Radix loses stability if any digit pass is unstable.
- Bucket sort degrades on skewed distributions; increase bucket count or sort large buckets with other methods.
- Large memory usage calls for fallback to comparison sort or chunking.
Runnable Example: Counting with negatives (Python)
def counting_sort_with_neg(a):
mn, mx = min(a), max(a)
offset = -mn
cnt = [0]*(mx - mn + 1)
for x in a: cnt[x+offset]+=1
for i in range(1,len(cnt)): cnt[i]+=cnt[i-1]
out=[0]*len(a)
for x in reversed(a):
cnt[x+offset]-=1
out[cnt[x+offset]] = x
return out
print(counting_sort_with_neg([3,-1,2,-1,0]))
References and Further Reading
- CLRS non-comparison sorting chapters
- Donald Knuth, “The Art of Computer Programming, Vol. 3” (Sorting and Searching)
- Lower bounds and word-RAM model discussions on integer sorting
Meta
- Reading time: approx. 15 min
- SEO keywords: counting sort, bucket sort, radix sort, non-comparison, O(n+k)
- Meta description: sorting series (7) explaining prerequisites, complexity, and engineering trade-offs for counting/bucket/radix with multilingual examples.
Call to Action (CTA)
- Estimate range/digits of your dataset and implement a counting or radix sort benchmark.
- If distribution is skewed, tune bucket counts or use a hybrid inside large buckets.