SDSU CS 662 Theory of Parallel Algorithms
Sample Sort

San Diego State University -- This page last updated February 20, 1996
----------

Sequential Radix Sort


Let A[1..n] be an array of items.
Each item has d digits, with digit 1 the least significant.
Simple Version
for k = 1 to d do
	Sort A on digit k using a stable sort


A sort is stable if items with equal keys keep their relative order from the input


Less Simple Version

Assume items have b bits
for k = 1 to b by r do
	Sort A on bits k, k + 1, ..., k + r -1 using a stable sort
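
The loop above can be sketched in Python. This is a minimal sketch, not the lecture's code: the name radix_sort is hypothetical, keys are assumed to be non-negative integers that fit in b bits, bits are numbered from 0 rather than 1, and Python's built-in sorted() (which is stable) stands in for the stable sort.

```python
def radix_sort(items, b, r):
    """Least-significant-digit radix sort, r bits per pass (sketch)."""
    mask = (1 << r) - 1
    for shift in range(0, b, r):
        # sorted() is stable, so the order produced by earlier passes
        # is preserved among items whose current r-bit field is equal.
        items = sorted(items, key=lambda x: (x >> shift) & mask)
    return items


print(radix_sort([170, 45, 75, 90, 2, 24, 802, 66], b=10, r=2))
```

Each pass only looks at one r-bit field; stability is what lets the later, more significant passes keep the ordering established by the earlier ones.
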
Stable Sort

Index[ 0 .. 2^r - 1 ] is an array of integers

Seq. Counting-Rank( r, A )
for k = 0 to 2^r - 1 do
	Index[ k ] = 0

for k = 1 to n do
	Index[ D(k) ] = Index[ D(k) ] + 1

for k = 1 to 2^r - 1 do
	Index[ k ] = Index[ k - 1 ] + Index[ k ]

for k = n to 1 by -1 do
	B[ Index[ D(k) ] ] = A[ k ]
	Index[ D(k) ] = Index[ D(k) ] - 1


Where D(k) = the value of the r bits of A[k] being sorted on in the current pass
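
A possible Python rendering of the sequential counting rank follows. It is a sketch under assumptions: the name counting_rank and the shift parameter (the starting bit of the current r-bit field) are hypothetical, keys are non-negative integers, and arrays are 0-based, so the counter is decremented before the store instead of after it.

```python
def counting_rank(a, shift, r):
    """Stable sort of a on the r-bit field that starts at bit `shift`."""
    buckets = 1 << r
    mask = buckets - 1
    index = [0] * buckets

    # Count how many items have each field value.
    for x in a:
        index[(x >> shift) & mask] += 1

    # Prefix sums: index[d] becomes the number of items with field <= d.
    for k in range(1, buckets):
        index[k] += index[k - 1]

    # Walk the input backwards so equal fields keep their input order.
    b = [None] * len(a)
    for x in reversed(a):
        d = (x >> shift) & mask
        index[d] -= 1
        b[index[d]] = x
    return b


print(counting_rank([5, 2, 7, 2, 4, 1], shift=0, r=2))
```
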

Time Complexity
2*2^r + 2*n
for Seq. Counting-Rank
b/r * [ 2*2^r + 2*n ] = O( n ) for Radix sort, when b and r are constants
Stable Sort - Parallelized

Each Processor gets n/p elements

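Each processor's elements are stored in a local array A[ 1 .. n/p ]

Each processor has a local array Index[ 0 .. 2^r - 1 ] of integers
Par. Counting-Rank( r, A )
Each processor does in parallel:
	for k = 0 to 2^r - 1 do
		Index[ k ] = 0

	for k = 1 to n/p do
		Index[ D(k) ] = Index[ D(k) ] + 1

	offset = 0

	for k = 0 to 2^r - 1 do
		count = Sum( Index[ k ] )
		Index[ k ] = Scan( Index[ k ] ) + offset
		offset = offset + count

	for k = n/p to 1 by -1 do
		B[ Index[ D(k) ] ] = A[ k ]
		Index[ D(k) ] = Index[ D(k) ] - 1

Where D(k) = the value of the r bits of the local A[k] being sorted on in the current pass
Sum adds Index[ k ] over all processors; Scan is the inclusive prefix sum of Index[ k ] across processors
B is the output array, shared by all processors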

Time Complexity
2^r + n/p + 2^r*lg(p) + n/p
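
The parallel version can be simulated sequentially to check the index arithmetic. The sketch below is an assumption-laden illustration, not the lecture's code: par_counting_rank is a hypothetical name, local[i] plays the role of processor i's array, Sum is taken to be a reduction over processors and Scan an inclusive prefix sum over processors, and plain loops stand in for those lg(p)-time primitives.

```python
def par_counting_rank(local, shift, r):
    """Simulate Par. Counting-Rank; local[i] is processor i's keys."""
    p = len(local)
    buckets = 1 << r
    mask = buckets - 1

    # Each processor builds a histogram of its own keys.
    index = [[0] * buckets for _ in range(p)]
    for i in range(p):
        for x in local[i]:
            index[i][(x >> shift) & mask] += 1

    offset = 0
    for k in range(buckets):
        count = sum(index[i][k] for i in range(p))  # Sum over processors
        running = 0
        for i in range(p):                          # inclusive Scan
            running += index[i][k]
            index[i][k] = running + offset
        offset += count

    # Every processor writes its keys into the shared output array.
    b = [None] * sum(len(keys) for keys in local)
    for i in range(p):
        for x in reversed(local[i]):
            d = (x >> shift) & mask
            index[i][d] -= 1          # 0-based: decrement, then store
            b[index[i][d]] = x
    return b


print(par_counting_rank([[5, 2], [7, 2], [4, 1]], shift=0, r=2))
```

The result matches a stable sequential sort of the concatenated local arrays, because for each field value the scan reserves a contiguous range of output slots per processor, in processor order.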

Parallel Radix Sort
Less Simple Version

for k = 1 to b by r do
	Sort A on bits k, k + 1, ..., k + r -1 using Par. Counting-Rank


Time Complexity:

b/r * [ 2^r + 2n/p + 2^r*lg(p) ]


If items fit in one word, then b and r are constants, so we get
C*n/p + D*lg(p), where C and D are constants




Sample Sort

n keys to sort

P processors

Each processor starts with n/P keys

Algorithm assumes keys are all distinct

If keys are not distinct, tag each key with its address

So
	1	2	1	3	1	4

becomes
	(1, 1)	(2, 2)	(1, 3)	(3, 4)	(1, 5)	(4, 6)

Now (a, b) < (c, d) if a < c, or if a = c and b < d
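
As an illustration, the tagging can be written in Python, where tuples already compare lexicographically in exactly this way. The name tag_keys is hypothetical, and addresses are assumed to start at 1 as in the example above.

```python
def tag_keys(keys):
    """Pair each key with its 1-based address so all tagged keys differ."""
    return [(key, addr) for addr, key in enumerate(keys, start=1)]


tagged = tag_keys([1, 2, 1, 3, 1, 4])
print(tagged)
# Tuple comparison is lexicographic: key first, then address.
print(sorted(tagged))
```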


Basic Idea

1) Pick P - 1 splitter keys that partition the keys into P buckets

2) Send each key to its proper bucket; each processor acts as a bucket

3) Keys are sorted in each bucket
Step 1 Splitters

Each processor randomly selects s ( = 32 or 64) tagged keys

All tagged keys are sorted via Radix Sort

Select tagged keys with rank s, 2s, 3s, ... , (P - 1)s to be splitters


Time Complexity:
s for selecting s tagged keys
O( n/P + lg(P) ) for sort
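
Splitter selection can be sketched as follows. The function name choose_splitters and the rng parameter are hypothetical, local[i] stands for the tagged keys held by processor i, each processor is assumed to hold at least s keys, and Python's list sort stands in for the parallel radix sort of the sample.

```python
import random


def choose_splitters(local, s, rng):
    """Pick P - 1 splitters from s random tagged keys per processor."""
    p = len(local)
    sample = []
    for keys in local:
        sample.extend(rng.sample(keys, s))  # s keys, without replacement
    sample.sort()  # stand-in for the parallel radix sort
    # Ranks s, 2s, ..., (P - 1)s are 1-based, hence the "- 1".
    return [sample[j * s - 1] for j in range(1, p)]


rng = random.Random(0)
local = [[(k, 8 * i + j) for j, k in enumerate(rng.sample(range(100), 8))]
         for i in range(4)]
print(choose_splitters(local, s=2, rng=rng))
```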



Note: the splitters will not partition the elements evenly

Some buckets will get more elements than others

Let
L = size of the biggest bucket
alpha > 1

We have:
Pr[ L > alpha * n/P ] <= n * e^( -(1 - 1/alpha)^2 * alpha * s / 2 )


What does this mean?
n                    alpha   s     bound on Pr[ L > alpha * n/P ]
10,000               3       16    2.33E-01
100,000              3       16    2.33E+00
1,000,000            3       16    2.33E+01
10,000               3       32    5.43E-06
100,000              3       32    5.43E-05
1,000,000            3       32    5.43E-04
10,000               3       64    2.95E-15
100,000              3       64    2.95E-14
1,000,000            3       64    2.95E-13
10,000               3       128   8.71E-34
100,000              3       128   8.71E-33
1,000,000            3       128   8.71E-32
1,000,000,000,000    3       128   8.71E-27
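
The tabulated values can be checked numerically. The sketch below assumes the bound has the form n * e^( -(1 - 1/alpha)^2 * alpha * s / 2 ), where alpha > 1 is the allowed ratio of the biggest bucket to the average bucket size n/P; that form reproduces the rows for n = 10,000. The name bucket_bound is hypothetical.

```python
import math


def bucket_bound(n, alpha, s):
    """Upper bound on the probability that some bucket exceeds alpha*n/P."""
    return n * math.exp(-((1 - 1 / alpha) ** 2) * alpha * s / 2)


for s in (16, 32, 64, 128):
    print(s, format(bucket_bound(10_000, 3, s), ".2E"))
```

Doubling s squares the exponential factor, which is why a sample of 32 or 64 keys per processor already makes a badly oversized bucket very unlikely.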

Step 2 Send to Buckets

Node one reads each splitter

Node one broadcasts all splitters to all nodes

Each processor does a binary search on the splitters to determine the proper bucket for each key

Send each key to its bucket


Time Complexity:
P for node one to read all the splitters
lg( P ) for broadcasting
n/P * lg ( P ) for binary search for all keys
n/ P to send keys to bucket
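
The bucket lookup is a standard binary search. A minimal sketch using Python's bisect module follows; the name bucket_of is hypothetical, buckets are numbered from 0, and a key equal to a splitter is assumed to go to the bucket that the splitter closes.

```python
from bisect import bisect_left


def bucket_of(key, splitters):
    """Index of the bucket for key, given the sorted list of splitters."""
    # bisect_left returns the number of splitters strictly less than key.
    return bisect_left(splitters, key)


splitters = [10, 20, 30]
print([bucket_of(k, splitters) for k in (5, 10, 15, 30, 31)])
```

With P - 1 splitters each lookup costs lg(P) comparisons, which gives the n/P * lg(P) term above.
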
Step 3 Sort buckets

Use radix sort to sort buckets


Time Complexity:
O( n/P )

Sample Sort Time Complexity
Term                     Source
O( n/P + lg(P) )         (step 1)
+ P + n/P * lg(P)        (step 2)
+ O( n/P )               (step 3)

So we get O( n/P * lg( P ) + P )
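
Putting the three steps together, the whole algorithm can be simulated on one machine. This is a sketch under assumptions, not the lecture's implementation: sample_sort is a hypothetical name, local[i] is the list of keys that processor i starts with (at least s each), and Python's built-in sort replaces radix sort in steps 1 and 3.

```python
import random
from bisect import bisect_left


def sample_sort(local, s, seed=0):
    """Simulate sample sort; returns one sorted bucket per processor."""
    rng = random.Random(seed)
    p = len(local)

    # Tag every key with a unique address so tagged keys are distinct.
    tagged, addr = [], 0
    for keys in local:
        row = []
        for key in keys:
            row.append((key, addr))
            addr += 1
        tagged.append(row)

    # Step 1: sample s tagged keys per processor, sort, pick splitters.
    sample = sorted(t for row in tagged for t in rng.sample(row, s))
    splitters = [sample[j * s - 1] for j in range(1, p)]

    # Step 2: binary search sends each tagged key to its bucket.
    buckets = [[] for _ in range(p)]
    for row in tagged:
        for t in row:
            buckets[bisect_left(splitters, t)].append(t)

    # Step 3: sort each bucket and drop the tags.
    return [[key for key, _ in sorted(bucket)] for bucket in buckets]


data = [[9, 3, 7, 3], [1, 8, 2, 9], [5, 5, 0, 6]]
print(sample_sort(data, s=2))
```

Concatenating the buckets in processor order yields the sorted sequence; the bucket sizes vary from run to run, as noted in step 1.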
----------