CS 662: Sample Sort

CS 662 Theory of Parallel Algorithms
Sample Sort

San Diego State University -- This page last updated February 20, 1996

Contents of Sample Sort Lecture

Sequential Radix Sort

Let A[1..n] be an array of items
Each item has d digits
Simple Version

for k = 1 to d do
	Sort A on digit k using a stable sort

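A sort is stable if items with equal keys retain their relative positions

Digit 1 is the least significant digit, so each pass preserves the order established by the earlier passes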

Less Simple Version
Assume items have b bits

for k = 1 to b by r do
	Sort A on bits k, k + 1, ..., k + r -1 using a stable sort
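
A minimal Python sketch of the less simple version, assuming the items are non-negative integers of at most b bits. The name `radix_sort` is illustrative only, Python's built-in `sorted` (which is guaranteed stable) stands in for the stable sort, and bits are numbered from 0 here rather than from 1.

```python
def radix_sort(items, b, r):
    """Sort non-negative integers of at most b bits, r bits per pass."""
    mask = (1 << r) - 1
    for shift in range(0, b, r):  # least significant group of bits first
        # sorted() is stable, so the order from earlier passes is preserved
        items = sorted(items, key=lambda x: (x >> shift) & mask)
    return items


print(radix_sort([13, 2, 7, 8, 2, 15, 0], b=4, r=2))
# prints [0, 2, 2, 7, 8, 13, 15]
```

With b = 4 and r = 2 the loop makes b/r = 2 passes, one on the low two bits and one on the high two bits.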

Stable Sort
Index[ 0 .. 2^r - 1 ] is an array of integers

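Seq. Counting-Rank( r, A )

for k = 0 to 2^r - 1 do
	Index[ k ] = 0

for k = 1 to n do
	Index[ digit(A[k]) ] = Index[ digit(A[k]) ] + 1

for k = 1 to 2^r - 1 do
	Index[ k ] = Index[ k - 1 ] + Index[ k ]

for k = n to 1 by -1 do
	B[ Index[ digit(A[k]) ] ] = A[ k ]
	Index[ digit(A[k]) ] = Index[ digit(A[k]) ] - 1


Where digit(A[k]) = the r bits of A[k] being sorted on in the current pass of the radix sort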
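
The counting pass can be sketched in Python as below. This is an illustration under assumptions rather than the lecture's own code: the function name and the `shift` parameter (the 0-based position of the lowest bit of the current digit) are hypothetical, and because Python lists are 0-indexed the counter is decremented before the store instead of after it.

```python
def counting_sort_pass(a, shift, r):
    """Stable sort of the list a on the r bits starting at bit `shift`."""
    buckets = 1 << r
    mask = buckets - 1
    index = [0] * buckets
    for x in a:                      # count how often each digit occurs
        index[(x >> shift) & mask] += 1
    for d in range(1, buckets):      # inclusive prefix sums of the counts
        index[d] += index[d - 1]
    b = [None] * len(a)
    for x in reversed(a):            # walk backwards to keep the sort stable
        d = (x >> shift) & mask
        index[d] -= 1                # 0-based output: decrement, then store
        b[index[d]] = x
    return b


print(counting_sort_pass([5, 1, 4, 1, 6], shift=0, r=1))
# prints [4, 6, 5, 1, 1]
```

In the example the even keys (low bit 0) come first and the odd keys follow, each group in its original order.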

Time Complexity

2*2^r + 2*n: for Seq. Counting-Rank

b/r * [ 2*2^r + 2*n ] = O( n ) for Radix sort, treating b and r as constants

Stable Sort - Parallelized
Each processor gets n/p elements

Each processor's elements are stored in a local array

Each processor has a local array Index[ 0 .. 2^r - 1 ] of integers
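Par. Counting-Rank( r, A )

Each processor does in parallel:
	for k = 0 to 2^r - 1 do
		Index[ k ] = 0

	for k = 1 to n/p do
		Index[ digit(A[k]) ] = Index[ digit(A[k]) ] + 1

	offset = 0

	for k = 0 to 2^r - 1 do
		count = Sum( Index[ k ] )
		Index[ k ] = Scan( Index[ k ] ) + offset
		offset = offset + count

	for k = n/p to 1 by -1 do
		B[ Index[ digit(A[k]) ] ] = A[ k ]
		Index[ digit(A[k]) ] = Index[ digit(A[k]) ] - 1

Where digit(A[k]) = the r bits of A[k] being sorted on in the current pass, and Sum and Scan combine Index[ k ] across all p processors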
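
The parallel version can be simulated sequentially in Python to check the index arithmetic. This is a sketch under several assumptions: `local[i]` is a hypothetical name for processor i's keys, Scan is taken to be an inclusive prefix sum across processors (which is what the final loop's store-then-decrement order needs), and the Sum and Scan steps are plain loops here rather than lg(p) collective operations.

```python
def par_counting_rank(local, shift, r):
    """Simulate the parallel counting pass; local[i] holds processor i's keys."""
    p = len(local)
    buckets = 1 << r
    mask = buckets - 1
    n = sum(len(keys) for keys in local)

    # Each processor counts the digits of its own keys.
    index = [[0] * buckets for _ in range(p)]
    for i in range(p):
        for x in local[i]:
            index[i][(x >> shift) & mask] += 1

    # For each digit value: Sum and inclusive Scan across the processors.
    offset = 0
    for d in range(buckets):
        count = sum(index[i][d] for i in range(p))  # Sum
        running = 0
        for i in range(p):                          # Scan
            running += index[i][d]
            index[i][d] = running + offset
        offset += count

    # Each processor writes its keys, last to first, into the global array.
    b = [None] * n
    for i in range(p):
        for x in reversed(local[i]):
            d = (x >> shift) & mask
            index[i][d] -= 1          # 0-based output: decrement, then store
            b[index[i][d]] = x
    return b


print(par_counting_rank([[5, 1], [4, 1], [6]], shift=0, r=1))
# prints [4, 6, 5, 1, 1]
```

The result matches a sequential stable pass over the concatenated keys, because lower-numbered processors receive lower positions within each digit's range.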

Time Complexity

2^r + n/p + 2^r*lg(p) + n/p

Parallel Radix Sort Less Simple Version

for k = 1 to b by r do
	Sort A on bits k, k + 1, ..., k + r -1 using Par. Counting-Rank


Time Complexity:

b/r * [ 2^r + 2n/p + 2^r*lg(p) ]

If items fit in one word then b and r are constants, so we get

C*n/p + D*lg(p), where C and D are constants

Sample Sort
n keys to sort

P processors

Each processor starts with n/P keys

Algorithm assumes keys are all distinct

If keys are not distinct, tag each key with its address

So the keys

	1	2	1	3	1	4

become the (key, address) pairs

	(1, 1)	(2, 2)	(1, 3)	(3, 4)	(1, 5)	(4, 6)

Now (a, b) < (c, d) if a < c, or if a = c and b < d
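
In Python the tagging and the comparison rule come almost for free, since tuples compare lexicographically. The short sketch below uses 1-based addresses to match the example; the variable names are illustrative.

```python
keys = [1, 2, 1, 3, 1, 4]

# Pair each key with its address (1-based, as in the example above).
tagged = [(key, address) for address, key in enumerate(keys, start=1)]
print(tagged)
# prints [(1, 1), (2, 2), (1, 3), (3, 4), (1, 5), (4, 6)]

# Tuple comparison is (a, b) < (c, d) iff a < c, or a == c and b < d.
print(sorted(tagged))
# prints [(1, 1), (1, 3), (1, 5), (2, 2), (3, 4), (4, 6)]
```

All tagged keys are distinct even though the key 1 appears three times.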

Basic Idea
1) Pick P - 1 splitter keys that partition keys into P buckets

2) Send each key to proper bucket, each processor acts as a bucket

3) Keys are sorted in each bucket
Step 1 Splitters
Each processor randomly selects s ( = 32 or 64) tagged keys

All tagged keys are sorted via Radix Sort

Select tagged keys with rank s, 2s, 3s, ... , (P - 1)s to be splitters

Time Complexity:

s for selecting s tagged keys

O( n/P + lg(P) ) for sort
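
Step 1 can be sketched as a sequential simulation. The assumptions here are that `local[i]` (a hypothetical name) holds the keys of processor i, that every processor holds at least s keys, and that Python's `list.sort` stands in for the parallel radix sort of the sample.

```python
import random


def choose_splitters(local, s, seed=0):
    """Pick P - 1 splitters from a sample of s keys per processor."""
    rng = random.Random(seed)
    sample = []
    for keys in local:
        sample.extend(rng.sample(keys, s))  # s random keys from each processor
    sample.sort()                           # stand-in for the radix sort
    p = len(local)
    # Ranks s, 2s, ..., (P - 1)s are 1-based, so subtract 1 when indexing.
    return [sample[j * s - 1] for j in range(1, p)]


local = [list(range(i, 64, 4)) for i in range(4)]  # 4 processors, 16 keys each
print(choose_splitters(local, s=4))
```

The sample holds s * P keys in total, so the ranks s, 2s, ..., (P - 1)s cut it into P equal parts.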

Note: the splitters will not partition the elements evenly

Some buckets will get more elements than others

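Let

L = size of the biggest bucket

α > 1 be the allowed expansion over the average bucket size n/P

We have:

Pr[ L > α * n/P ] <= n * e^( -(1 - 1/α)^2 * α * s / 2 )

What does this mean?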

n	α	s	Upper bound on Pr[ L > α * n/P ]
10,000	3	16	2.33E-01
100,000	3	16	2.33E+00
1,000,000	3	16	2.33E+01
10,000	3	32	5.43E-06
100,000	3	32	5.43E-05
1,000,000	3	32	5.43E-04
10,000	3	64	2.95E-15
100,000	3	64	2.95E-14
1,000,000	3	64	2.95E-13
10,000	3	128	8.71E-34
100,000	3	128	8.71E-33
1,000,000	3	128	8.71E-32
1,000,000,000,000	3	128	8.71E-26
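
The tabulated values are consistent with a bound of the form n * e^( -(1 - 1/α)^2 * α * s / 2 ) on the probability that the biggest bucket exceeds α * n/P keys. Taking that form as an assumption, the short Python sketch below recomputes the n = 10,000 rows; the function name is illustrative.

```python
import math


def bucket_bound(n, alpha, s):
    """Upper bound on the chance that some bucket exceeds alpha * n/P keys."""
    return n * math.exp(-((1 - 1 / alpha) ** 2) * alpha * s / 2)


for s in (16, 32, 64, 128):
    print(s, f"{bucket_bound(10_000, 3, s):.2E}")
# prints:
# 16 2.33E-01
# 32 5.43E-06
# 64 2.95E-15
# 128 8.71E-34
```

The bound grows only linearly in n but shrinks exponentially in s, which is why doubling the sample size from 32 to 64 matters far more than a tenfold change in n.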

Step 2 Send to Buckets
Node one reads each splitter

Node one broadcasts all splitters to all nodes

Each processor does a binary search on the splitters to determine the proper bucket for each key

Send each key to its bucket

Time Complexity:

P for node one to read all the splitters

lg( P ) for broadcasting

n/P * lg( P ) for binary search for all keys

n/P to send keys to their buckets
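
The binary search in step 2 can be sketched with Python's standard `bisect` module. The names `send_to_buckets` and `local` are hypothetical, and the convention that a key equal to a splitter goes to the lower bucket is an assumption; with distinct keys either convention gives a valid partition.

```python
from bisect import bisect_left


def send_to_buckets(local, splitters):
    """Route every key to its bucket; local[i] holds processor i's keys."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for keys in local:
        for key in keys:
            # Binary search: index of the first splitter >= key.
            buckets[bisect_left(splitters, key)].append(key)
    return buckets


print(send_to_buckets([[5, 1, 9], [3, 7, 2]], splitters=[3, 6]))
# prints [[1, 3, 2], [5], [9, 7]]
```

Each lookup costs about lg(P) comparisons, which is where the n/P * lg(P) term per processor comes from.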

Step 3 Sort buckets
Use radix sort to sort buckets

Time Complexity:

O( n/P )

Sample Sort Time Complexity

Term	Source
O( n/P + lg(P) )	(step 1)
+ P + n/P * lg(P)	(step 2)
+ O( n/P )	(step 3)

So we get O( n/P * lg(P) + P ), since the n/P and lg(P) terms are dominated by n/P * lg(P) and P
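
Putting the three steps together, a sequential Python simulation might look like the sketch below. It is an illustration under assumptions, not the lecture's implementation: the P processors are simulated by lists, `sorted` stands in for radix sort in steps 1 and 3, the input must contain at least p * s keys, and all names are hypothetical.

```python
import random
from bisect import bisect_left


def sample_sort(keys, p, s, seed=0):
    """Simulate sample sort on p processors; assumes len(keys) >= p * s."""
    rng = random.Random(seed)
    # Tag each key with its address so that all keys are distinct.
    tagged = [(key, addr) for addr, key in enumerate(keys)]
    # Deal the keys out so each simulated processor holds about n/p of them.
    local = [tagged[i::p] for i in range(p)]

    # Step 1: each processor samples s keys; sort the sample, pick splitters.
    sample = sorted(k for part in local for k in rng.sample(part, s))
    splitters = [sample[j * s - 1] for j in range(1, p)]

    # Step 2: binary search the splitters to find the bucket for each key.
    buckets = [[] for _ in range(p)]
    for part in local:
        for k in part:
            buckets[bisect_left(splitters, k)].append(k)

    # Step 3: sort each bucket, then read the buckets in order, dropping tags.
    result = []
    for bucket in buckets:
        result.extend(key for key, _ in sorted(bucket))
    return result


rng = random.Random(1)
data = [rng.randrange(50) for _ in range(200)]
print(sample_sort(data, p=4, s=8) == sorted(data))
# prints True
```

Because the splitters are sorted, every key in bucket j is smaller than every key in bucket j + 1, so concatenating the sorted buckets gives the fully sorted sequence.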
