## CS 662 Theory of Parallel Algorithms Sample Sort

San Diego State University -- This page last updated February 20, 1996

### Contents of Sample Sort Lecture

Let A[1..n] be an array of items, each with d digits.
Simple Version
```
for k = 1 to d do
    Sort A on digit k using a stable sort    -- digit 1 is the least significant
```

A sort is stable if items with equal keys retain their relative order.
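A minimal Python sketch of the simple version, assuming the items are non-negative integers written in base 10 and that digit 1 is the least significant digit. Python's built-in `sorted` is guaranteed to be stable, so each pass can use it directly; `digit` and `radix_sort_digits` are hypothetical helper names, not from the notes.

```python
def digit(x: int, k: int) -> int:
    """Return decimal digit k of x, where k = 1 is the least significant digit."""
    return (x // 10 ** (k - 1)) % 10


def radix_sort_digits(a: list[int], d: int) -> list[int]:
    """LSD radix sort of non-negative integers with at most d decimal digits."""
    for k in range(1, d + 1):
        # sorted() is stable, so ties on digit k keep the order from earlier passes
        a = sorted(a, key=lambda x: digit(x, k))
    return a


print(radix_sort_digits([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]
```

Stability is what makes this work: after pass k, items are ordered by their lowest k digits, and pass k + 1 only reorders items whose digit k + 1 differs.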

Less Simple Version

Assume each item has b bits.
```
for k = 1 to b by r do
    Sort A on bits k, k + 1, ..., k + r - 1 using a stable sort
```
Stable Sort

Index[ 0 .. 2^r - 1 ] is an array of integers, one counter for each possible value of an r-bit digit.

Seq. Counting-Rank( r, A )
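```
for k = 0 to 2^r - 1 do
    Index[ k ] = 0

for k = 1 to n do
    Index[ digit(A[k]) ] = Index[ digit(A[k]) ] + 1

for k = 1 to 2^r - 1 do
    Index[ k ] = Index[ k - 1 ] + Index[ k ]

for k = n to 1 by -1 do
    B[ Index[ digit(A[k]) ] ] = A[ k ]
    Index[ digit(A[k]) ] = Index[ digit(A[k]) ] - 1
```

Where digit(A[k]) = bits i, i + 1, ..., i + r - 1 of A[k], the r-bit field being sorted on in this pass (i is the pass's starting bit).

Time Complexity

$2 \cdot 2^r + 2n$ for Seq. Counting-Rank

$\frac{b}{r}\left[2 \cdot 2^r + 2n\right] = O(n)$ for radix sort, when b and r are constants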
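A Python sketch of Seq. Counting-Rank and the radix sort built on it, assuming non-negative integer keys. It is 0-based (list positions and bit numbers start at 0, bit 0 is the least significant), so it decrements `index[d]` before storing instead of after; `counting_rank` and `radix_sort` are hypothetical names.

```python
def counting_rank(a: list[int], i: int, r: int) -> list[int]:
    """Stable sort of a on bits i, ..., i + r - 1 (bit 0 = least significant)."""
    mask = (1 << r) - 1
    index = [0] * (1 << r)            # one counter per possible r-bit digit
    for x in a:                       # count how often each digit occurs
        index[(x >> i) & mask] += 1
    for k in range(1, 1 << r):        # inclusive prefix sum: index[k] = # of digits <= k
        index[k] += index[k - 1]
    out = [0] * len(a)
    for x in reversed(a):             # right to left keeps equal digits in order
        d = (x >> i) & mask
        index[d] -= 1                 # 0-based, so decrement before storing
        out[index[d]] = x
    return out


def radix_sort(a: list[int], b: int, r: int) -> list[int]:
    """Sort non-negative b-bit integers with ceil(b / r) counting_rank passes."""
    for i in range(0, b, r):
        a = counting_rank(a, i, r)
    return a


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66], b=10, r=4))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

The choice of r trades the $2^r$ counter work per pass against the number of passes b/r.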
Stable Sort - Parallelized

Each processor gets n/p elements.

Each processor's elements are stored in a local array A.

Each processor has a local array Index[ 0 .. 2^r - 1 ] of integers.
Par. Counting-Rank( r, A )
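```
Each processor does in parallel:
-- digit(A[k]) = the r-bit field of A[k] being sorted on in this pass
for k = 0 to 2^r - 1 do
    Index[ k ] = 0

for k = 1 to n/p do
    Index[ digit(A[k]) ] = Index[ digit(A[k]) ] + 1

offset = 0

for k = 0 to 2^r - 1 do
    count = Sum( Index[ k ] )                  -- total over all processors
    Index[ k ] = Scan( Index[ k ] ) + offset   -- inclusive prefix sum over processors
    offset = offset + count

for k = n/p to 1 by -1 do
    B[ Index[ digit(A[k]) ] ] = A[ k ]         -- B is the global output array
    Index[ digit(A[k]) ] = Index[ digit(A[k]) ] - 1
```

Time Complexity

$2^r + n/p + 2^r \lg p + n/p$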
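A sequential Python simulation of Par. Counting-Rank, assuming the n items are already split into p local lists (processor j holds `local[j]`) and that Scan is an inclusive prefix sum in processor order. The "processors" are just loop iterations here, so this shows the index arithmetic, not the parallel running time; `par_counting_rank` is a hypothetical name.

```python
from itertools import accumulate


def par_counting_rank(local: list[list[int]], i: int, r: int) -> list[int]:
    """Simulated Par. Counting-Rank: stable sort on bits i..i+r-1 (bit 0 = LSB).

    local[j] is processor j's local array A; the result is the global array B.
    """
    mask = (1 << r) - 1
    p = len(local)
    n = sum(len(part) for part in local)
    # Each processor fills its own Index array of 2^r counters
    index = [[0] * (1 << r) for _ in range(p)]
    for j in range(p):
        for x in local[j]:
            index[j][(x >> i) & mask] += 1
    offset = 0
    for k in range(1 << r):
        column = [index[j][k] for j in range(p)]
        count = sum(column)                  # Sum over all processors
        scan = list(accumulate(column))      # inclusive Scan in processor order
        for j in range(p):
            index[j][k] = scan[j] + offset   # last 1-based slot in B for (j, k)
        offset += count
    out = [0] * n
    for j in range(p):                       # real processors would do this in parallel
        for x in reversed(local[j]):
            d = (x >> i) & mask
            out[index[j][d] - 1] = x         # 1-based slot -> 0-based list index
            index[j][d] -= 1
    return out


print(par_counting_rank([[5, 3, 7], [1, 6, 2], [4, 0, 3]], i=0, r=3))
# [0, 1, 2, 3, 3, 4, 5, 6, 7]
```

Because the scan runs in processor order, processor j's items with digit k land after those of processors 0..j-1, which keeps the parallel sort stable.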

Less Simple Version (Parallel Radix Sort)

```
for k = 1 to b by r do
    Sort A on bits k, k + 1, ..., k + r - 1 using Par. Counting-Rank
```

Time Complexity:

$\frac{b}{r}\left[2^r + 2n/p + 2^r \lg p\right]$

If items fit in one word, then b and r are constants, so we get
$C \cdot n/p + D \cdot \lg p$, where C and D are constants.

Sample Sort

n keys to sort

P processors

Each processor starts with n/P keys

The algorithm assumes all keys are distinct.

If keys are not distinct, tag each key with its address (its position in the input).

So
```
1       2       1       3       1       4
```
becomes
```
(1, 1)  (2, 2)  (1, 3)  (3, 4)  (1, 5)  (4, 6)
```
Now (a, b) < (c, d) if a < c, or if a = c and b < d.
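In Python, tuples already compare lexicographically, which is exactly the (a, b) < (c, d) rule above. A sketch of the tagging step, assuming 1-based addresses as in the example:

```python
keys = [1, 2, 1, 3, 1, 4]
# Tag each key with its 1-based address; the tagged keys are all distinct
tagged = [(key, addr) for addr, key in enumerate(keys, start=1)]
print(tagged)           # [(1, 1), (2, 2), (1, 3), (3, 4), (1, 5), (4, 6)]
# Tuples compare lexicographically: first by key, then by address
print((1, 3) < (1, 5))  # True
print(sorted(tagged))   # [(1, 1), (1, 3), (1, 5), (2, 2), (3, 4), (4, 6)]
```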

Basic Idea

1) Pick P - 1 splitter keys that partition the keys into P buckets.

2) Send each key to its proper bucket; each processor acts as a bucket.

3) Sort the keys in each bucket.
Step 1 Splitters

Each processor randomly selects s tagged keys (s = 32 or 64).

All sP sampled keys are sorted via the parallel radix sort.

Select the tagged keys with rank s, 2s, 3s, ..., (P - 1)s to be the splitters.

Time Complexity:
s for selecting s tagged keys
O( n/P + lg(P) ) for sort
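A Python sketch of Step 1, assuming each processor's tagged keys are in `local[j]`, that every processor holds at least one key, and that keys are sampled with replacement (the notes only say "randomly"). A sequential `sort` stands in for the parallel radix sort; `choose_splitters` is a hypothetical name.

```python
import random

Key = tuple[int, int]  # (key, address) tagged key


def choose_splitters(local: list[list[Key]], s: int, seed: int = 0) -> list[Key]:
    """Return P - 1 splitters chosen from s random samples per processor."""
    rng = random.Random(seed)
    p = len(local)
    # Each processor draws s of its own tagged keys (with replacement: an assumption)
    sample = [key for part in local for key in rng.choices(part, k=s)]
    sample.sort()  # stands in for the parallel radix sort of all s*P samples
    # The keys of rank s, 2s, ..., (P - 1)s (1-based ranks) become the splitters
    return [sample[m * s - 1] for m in range(1, p)]


local = [[(k, k) for k in range(j, 1000, 4)] for j in range(4)]
print(choose_splitters(local, s=32))  # 3 splitters, in increasing order
```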

Note: the splitters will not partition the elements evenly.

Some buckets will get more elements than others.

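Let L be the size of the biggest bucket, and let α > 1.

We have:

$$\Pr\left[ L > \alpha \frac{n}{P} \right] \le n \, e^{-(1 - 1/\alpha)^2 \alpha s / 2}$$

What does this bound mean? Some values for α = 3:

| n | α | s | Bound on Pr[L > α n/P] |
|---:|---:|---:|---:|
| 10,000 | 3 | 16 | 2.33E-01 |
| 100,000 | 3 | 16 | 2.33E+00 |
| 1,000,000 | 3 | 16 | 2.33E+01 |
| 10,000 | 3 | 32 | 5.43E-06 |
| 100,000 | 3 | 32 | 5.43E-05 |
| 1,000,000 | 3 | 32 | 5.43E-04 |
| 10,000 | 3 | 64 | 2.95E-15 |
| 100,000 | 3 | 64 | 2.95E-14 |
| 1,000,000 | 3 | 64 | 2.95E-13 |
| 10,000 | 3 | 128 | 8.71E-34 |
| 100,000 | 3 | 128 | 8.71E-33 |
| 1,000,000 | 3 | 128 | 8.71E-32 |
| 1,000,000,000,000 | 3 | 128 | 8.71E-27 |

With s = 16 the bound says nothing (it exceeds 1 for n ≥ 100,000), but with s = 64 the probability that some bucket gets more than 3n/P keys is below 3E-13 even for n = 1,000,000.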
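The table's values are consistent with the tail bound n·e^{-(1 - 1/α)² α s / 2} on the chance that the biggest bucket exceeds α·n/P. A short Python check, assuming that form of the bound, regenerates the rows for n up to 1,000,000; `bucket_bound` is a hypothetical name.

```python
import math


def bucket_bound(n: int, alpha: float, s: int) -> float:
    """n * e^(-(1 - 1/alpha)^2 * alpha * s / 2), the assumed bound on Pr[L > alpha*n/P]."""
    return n * math.exp(-((1 - 1 / alpha) ** 2) * alpha * s / 2)


for s in (16, 32, 64, 128):
    for n in (10_000, 100_000, 1_000_000):
        print(f"{n:>9,}  alpha=3  s={s:<3}  {bucket_bound(n, 3, s):.2E}")
```

The bound grows only linearly in n but shrinks exponentially in s, which is why a modest sample per processor suffices.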

Step 2 Send to Buckets

Node 1 broadcasts all P - 1 splitters to all nodes.

Each processor does a binary search on the splitters to determine the proper bucket for each of its keys.

Send each key to its bucket.

Time Complexity:
P to broadcast the splitters
n/P * lg ( P ) for binary search for all keys
n/P to send keys to buckets
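A Python sketch of Step 2 using the standard `bisect` module for the binary search. It assumes bucket m holds the keys greater than splitter m - 1 and at most splitter m (0-based), and simulates "sending" by appending to per-bucket lists; `bucket_of` and `route` are hypothetical names.

```python
from bisect import bisect_left

Key = tuple[int, int]  # (key, address) tagged key


def bucket_of(key: Key, splitters: list[Key]) -> int:
    """0-based bucket index: the number of splitters smaller than key."""
    return bisect_left(splitters, key)


def route(local: list[list[Key]], splitters: list[Key]) -> list[list[Key]]:
    """Simulate Step 2: every processor sends each of its keys to its bucket."""
    buckets: list[list[Key]] = [[] for _ in range(len(splitters) + 1)]
    for part in local:
        for key in part:
            buckets[bucket_of(key, splitters)].append(key)
    return buckets


print(route([[(5, 1), (15, 2)], [(25, 3), (10, 0)]], [(10, 0), (20, 0)]))
# [[(5, 1), (10, 0)], [(15, 2)], [(25, 3)]]
```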
Step 3 Sort buckets

Each processor uses radix sort to sort its own bucket.

Time Complexity:
O( n/P )

Sample Sort Time Complexity
| Term | Source |
|---|---|
| O( n/P + lg(P) ) | Step 1 |
| P + n/P * lg( P ) | Step 2 |
| O( n/P ) | Step 3 |

So we get O( n/P * lg( P ) + P ).
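An end-to-end sequential simulation of the three steps in Python, useful for checking correctness and looking at bucket sizes. It assumes at least one key per processor, deals the tagged keys round-robin so each processor starts with about n/P of them, samples with replacement, and uses Python's `sorted` in place of radix sort; `sample_sort` is a hypothetical name.

```python
import random
from bisect import bisect_left


def sample_sort(keys: list[int], p: int, s: int, seed: int = 0) -> tuple[list[int], int]:
    """Sequential simulation of sample sort with p 'processors' and s samples each.

    Returns the sorted keys and L, the size of the biggest bucket.
    """
    if len(keys) < p:
        raise ValueError("need at least one key per processor")
    rng = random.Random(seed)
    # Tag each key with its (0-based) address so that all tagged keys are distinct
    tagged = [(key, addr) for addr, key in enumerate(keys)]
    # Deal the tagged keys round-robin, so each processor holds about n/p of them
    local = [tagged[j::p] for j in range(p)]

    # Step 1: each processor samples s keys (with replacement); sort the s*p samples
    # and take the keys of rank s, 2s, ..., (p - 1)s as splitters
    sample = sorted(key for part in local for key in rng.choices(part, k=s))
    splitters = [sample[m * s - 1] for m in range(1, p)]

    # Step 2: binary-search each key's bucket and "send" it to that processor
    buckets: list[list[tuple[int, int]]] = [[] for _ in range(p)]
    for part in local:
        for key in part:
            buckets[bisect_left(splitters, key)].append(key)

    # Step 3: each processor sorts its bucket; buckets are concatenated in order
    result = [key for bucket in buckets for key, _ in sorted(bucket)]
    return result, max(len(bucket) for bucket in buckets)


gen = random.Random(1)
data = [gen.randrange(1000) for _ in range(10_000)]
out, biggest = sample_sort(data, p=8, s=32)
print(out == sorted(data), biggest, len(data) // 8)
```

Comparing `biggest` with n/P = 1,250 gives a feel for how far L strays from a perfect split for a given s.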