San Diego State University

Hester, James and Hirschberg, Daniel, "Self-Organizing Linear Search",

Search for x in a list of n data items

* Sort the list of data items (or create a binary search tree)

- Cost for general list is Theta(n lg(n))

* Now search for x

- Average and worst case cost is Theta(lg(n))

* We will perform more than one search

- Number of searches should be Omega(lg(n))

* All items in the list will be searched for with nearly the same frequency
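The Omega(lg(n)) bound on the number of searches follows from a rough cost comparison. The sketch below assumes a sequential search costs about n comparisons per lookup and ignores constant factors; m is a symbol introduced here for the number of searches.

```latex
\begin{align*}
\text{sequential search, } m \text{ lookups:} &\quad m \cdot n \\
\text{sort, then } m \text{ binary searches:} &\quad n \lg n + m \lg n \\
n \lg n + m \lg n \le m \cdot n
  &\iff m \ge \frac{n \lg n}{n - \lg n} = \Theta(\lg n)
\end{align*}
```

Below roughly lg(n) searches, the cost of sorting is not recovered.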

Assume we have a list of n items: $a_1, a_2, \ldots, a_n$

The a's are ordered by frequency of access

Probability of accessing item $a_k$: $P(a_k) = P_k$

Probability of looking for an item not in the list is

Average cost $= \sum_{e \in U} P(e)\,C(e)$, where

- U = set of all possible events
- P(e) = probability of event e
- C(e) = cost of event e

We have:

What is ?

We have:

So

Thus Ave Cost =

* Simple to code

    location = -1;                  /* -1 means X was not found */
    for (K = 0; K < n; K++) {
        if (data[K].key == X) {
            location = K;           /* remember where X was found */
            break;
        }
    }

* Requires minimal space

Assume we have a list of n items: $a_1, a_2, \ldots, a_n$

Probability of accessing item $a_k$: $P(a_k)$

* Optimal Static Ordering

* Move-to-front

* Transpose

Optimal static ordering: assume $P(a_k)$ is known for all k in advance

Order the items by decreasing probability of access

Example: 2.0 average comparisons

- Let P(a) = .2, P(b) = .4, P(c) = .3, P(d) = .1
- Optimal static ordering: b, c, a, d
- Average comparisons = 1(.4) + 2(.3) + 3(.2) + 4(.1) = 2.0
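The ordering rule and its average cost can be checked mechanically. This is a minimal sketch, assuming the access probabilities are known and that an item in position k costs k comparisons; `struct item`, `optimal_static_order`, and `expected_comparisons` are hypothetical names, not part of the notes.

```c
#include <assert.h>
#include <stdlib.h>

/* One list item: a key and its (assumed known) access probability. */
struct item {
    char key;
    double prob;
};

/* qsort comparator: larger probability sorts first. */
static int by_decreasing_prob(const void *a, const void *b)
{
    double pa = ((const struct item *)a)->prob;
    double pb = ((const struct item *)b)->prob;
    return (pa < pb) - (pa > pb);
}

/* Rearrange the list into decreasing order of access probability. */
void optimal_static_order(struct item *list, int n)
{
    assert(n > 0);
    qsort(list, (size_t)n, sizeof list[0], by_decreasing_prob);
}

/* Expected comparisons for a successful sequential search:
   the item in position k (1-based) costs k comparisons. */
double expected_comparisons(const struct item *list, int n)
{
    double cost = 0.0;
    assert(n > 0);
    for (int k = 0; k < n; k++)
        cost += (k + 1) * list[k].prob;
    return cost;
}
```

With the four probabilities from the example, sorting yields b, c, a, d and the expected cost works out to 2.0, compared with 2.3 for the unsorted order a, b, c, d.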

Start with any initial ordering

Move-to-front: when an item is accessed, move it to the front of the list

Example: 2.2 average comparisons

List after access   Accessed   Comparisons
a, b, c, d          (start order)
b, a, c, d          b          2
b, a, c, d          b          1
a, b, c, d          a          2
c, a, b, d          c          3
a, c, b, d          a          2
c, a, b, d          c          2
c, a, b, d          c          1
d, c, a, b          d          4
b, d, c, a          b          4
b, d, c, a          b          1
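The move-to-front trace can be reproduced with a few lines of C. This is a minimal sketch, assuming single-character keys stored in an array; `mtf_access` and `mtf_total` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <string.h>

/* Search list[0..n-1] for key; if found, move it to the front.
   Returns the number of comparisons made (n if the key is absent). */
int mtf_access(char *list, int n, char key)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        if (list[k] == key) {
            /* Shift the k items ahead of the hit back by one slot. */
            memmove(list + 1, list, (size_t)k);
            list[0] = key;
            return k + 1;
        }
    }
    return n;
}

/* Total comparisons for a whole access sequence (a C string of keys). */
int mtf_total(char *list, int n, const char *accesses)
{
    int total = 0;
    for (const char *p = accesses; *p != '\0'; p++)
        total += mtf_access(list, n, *p);
    return total;
}
```

Starting from a, b, c, d and accessing b, b, a, c, a, c, c, d, b, b costs 22 comparisons in total, which is the 2.2 average of the example. On an array the move costs time proportional to the position of the hit; a linked list would make it constant time.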

Start with any initial ordering

Transpose: when an item is accessed, move it forward one location (swap it with the item immediately ahead of it)

Example: 2.3 average comparisons

List after access   Accessed   Comparisons
a, b, c, d          (start order)
b, a, c, d          b          2
b, a, c, d          b          1
a, b, c, d          a          2
a, c, b, d          c          3
a, c, b, d          a          1
c, a, b, d          c          2
c, a, b, d          c          1
c, a, d, b          d          4
c, a, b, d          b          4
c, b, a, d          b          3
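The transpose rule differs from move-to-front only in how far the accessed item travels. This is a minimal sketch under the same assumptions (single-character keys in an array); `transpose_access` and `transpose_total` are hypothetical names.

```c
#include <assert.h>

/* Search list[0..n-1] for key; if found, swap it with its predecessor.
   Returns the number of comparisons made (n if the key is absent). */
int transpose_access(char *list, int n, char key)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        if (list[k] == key) {
            if (k > 0) {            /* the front item has nowhere to go */
                list[k] = list[k - 1];
                list[k - 1] = key;
            }
            return k + 1;
        }
    }
    return n;
}

/* Total comparisons for a whole access sequence (a C string of keys). */
int transpose_total(char *list, int n, const char *accesses)
{
    int total = 0;
    for (const char *p = accesses; *p != '\0'; p++)
        total += transpose_access(list, n, *p);
    return total;
}
```

For the access sequence b, b, a, c, a, c, c, d, b, b starting from a, b, c, d, this gives 23 comparisons, the 2.3 average of the example.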

Permutation algorithms

- Algorithm used to rearrange list after accessing a record

Restrictions

- Only consider permutation algorithms that move accessed item forward in the list
- Will not search for items not in the list
- All items will be searched at least once
- Time required by any execution of the permutation algorithm is never more than a constant times the time required for the search immediately before that execution.
- Example
- Given the list $a_1, a_2, \ldots, a_n$
- Accessing the second item requires two comparisons, so the permutation algorithm can take c*2 time units
- Accessing the last item requires n comparisons, so the permutation algorithm can take c*n time units

$\rho$ = the search sequence

$\rho[k]$ = the item to be searched for on the k'th access

$\lambda(\rho, k)$ = the state of the list after the first k accesses from $\rho$

$\lambda(\rho, k)_r$ = location in the list of item r after the first k accesses from $\rho$

$\lambda = \lambda(\rho, 0)$ = the initial configuration of the list

Cost of a permutation algorithm for a given $\lambda$ and $\rho$ is the average cost per access, in terms of the number of probes required to find the accessed record and the work required to permute the records afterwards

Asymptotic Cost

- Average cost over all $\lambda$ and $\rho$ for a given permutation algorithm
- Usually restrict $\rho$ to make analysis possible

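Zipf noticed that in English the frequency of word usage follows:

$f_i \propto 1/i$

where $f_i$ denotes the frequency of the i'th most frequent word

Assume we have a list of n items: $a_1, a_2, \ldots, a_n$

The a's are ordered by frequency of access

Probability of accessing item $a_k$ is $P(a_k) = c/k$ and $\sum_{k=1}^{n} P(a_k) = 1$

Then $1 = \sum_{k=1}^{n} c/k = c \sum_{k=1}^{n} 1/k$

But $\sum_{k=1}^{n} 1/k = H_n$, so we have $c = 1/H_n$ and $P(a_k) = 1/(k H_n)$

where $H_n = 1 + 1/2 + \cdots + 1/n$ is the n'th harmonic number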

n    P1      P2      P3      P4      P5      P6
2    0.6667  0.3333
3    0.5455  0.2727  0.1818
4    0.48    0.24    0.16    0.12
5    0.438   0.219   0.146   0.109   0.0876
6    0.408   0.204   0.136   0.102   0.0816  0.0680

Let $S_k = \sum_{i=1}^{k} P_i$, with $S_0 = 0$

If $S_{k-1} \le$ rand() $< S_k$ then return k, where rand() is uniform on [0, 1)
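The Zipf probabilities and the cumulative-sum selection rule can be sketched in C as follows. The function names are hypothetical, and because the C library's `rand()` returns an integer, the sketch scales it into [0, 1) first; that scaling is an assumption about what the notes mean by rand().

```c
#include <assert.h>
#include <stdlib.h>

/* Fill p[0..n-1] with Zipf probabilities: p[k-1] = 1 / (k * H_n). */
void zipf_probabilities(double *p, int n)
{
    double h = 0.0;                 /* harmonic number H_n */
    assert(n > 0);
    for (int k = 1; k <= n; k++)
        h += 1.0 / k;
    for (int k = 1; k <= n; k++)
        p[k - 1] = 1.0 / (k * h);
}

/* Return the 1-based index k whose cumulative interval contains u,
   i.e. S_{k-1} <= u < S_k, for u in [0, 1). */
int pick_index(const double *p, int n, double u)
{
    double cumulative = 0.0;
    assert(n > 0 && u >= 0.0 && u < 1.0);
    for (int k = 1; k < n; k++) {
        cumulative += p[k - 1];
        if (u < cumulative)
            return k;
    }
    return n;       /* guards against rounding in the last partial sum */
}

/* Draw one index using the C library's rand(). */
int random_index(const double *p, int n)
{
    double u = rand() / ((double)RAND_MAX + 1.0);   /* uniform in [0, 1) */
    return pick_index(p, n, u);
}
```

Separating `pick_index` from the random draw keeps the selection rule testable with fixed values of u. The linear scan costs O(n) per draw; a binary search over precomputed partial sums would be the natural refinement for large n.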

The number of papers in a given journal written by the same author follows an inverse square distribution.

Let n be the total number of authors who published at least one paper in a given journal.

The probability that a randomly chosen author contributed exactly k papers is given by:

$P_k = c/k^2$, where $c = 1 / \sum_{i=1}^{n} 1/i^2$ makes the probabilities sum to 1

n    P1     P2     P3     P4     P5     P6
2    0.800  0.200
3    0.734  0.183  0.081
4    0.702  0.175  0.078  0.043
5    0.683  0.170  0.075  0.042  0.027
6    0.670  0.167  0.074  0.041  0.026  0.018
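The tabulated values are consistent with normalizing the weights 1/k^2 over k = 1..n. This is a minimal sketch under that assumption; `inverse_square_probabilities` is a hypothetical name.

```c
#include <assert.h>

/* Fill p[0..n-1] with inverse-square probabilities:
   p[k-1] = c / k^2, where c = 1 / (1/1^2 + 1/2^2 + ... + 1/n^2). */
void inverse_square_probabilities(double *p, int n)
{
    double sum = 0.0;
    assert(n > 0);
    for (int k = 1; k <= n; k++)
        sum += 1.0 / ((double)k * k);
    for (int k = 1; k <= n; k++)
        p[k - 1] = 1.0 / ((double)k * k * sum);
}
```

For n = 2 this gives 0.8 and 0.2, matching the first row of the table.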

"80% of the transactions are on the most 20% of the records, and so on recursively"

When n = 5*L we have:

$P_k = (k^\theta - (k-1)^\theta) / n^\theta$, where $\theta = \log 0.80 / \log 0.20 \approx 0.1386$

n    P1     P2     P3     P4     P5
2    0.908  0.091
3    0.858  0.086  0.054
4    0.825  0.083  0.052  0.039
5    0.800  0.080  0.050  0.037  0.030

Knuth claims we can approximate this by:

$P_k \approx c/k^{1-\theta}$, where $\theta = \log 0.80 / \log 0.20 \approx 0.1386$ and $c = 1 / \sum_{i=1}^{n} 1/i^{1-\theta}$

n    P1     P2     P3     P4     P5     P6
2    0.644  0.355
3    0.515  0.283  0.200
4    0.446  0.245  0.173  0.135
5    0.401  0.220  0.155  0.121  0.100
6    0.369  0.203  0.143  0.111  0.092  0.078

Steady State

- Further permutations are not expected to change the expected search time significantly

- Subsequences of $\rho$ may have relative frequencies of access that are drastically different from the overall relative frequencies

n     Move-to-Front   BST
7     112             170
15    360             490
31    1240            1290
63    4536            3210

Relative Measurements

- Optimal Static Ordering - items are ordered by static probability of access and are not moved

n     Move-to-Front   OSO
7     112             210
15    360             1050
31    1240            4650

Move-to-front

- Assume Zipf distribution

Transpose

Count

- approaches optimal static ordering
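The notes name Count without spelling out its rule. The sketch below assumes the usual frequency-count rule: each record carries an access counter, and after an access the record moves forward past every record with a smaller counter, so the list stays in non-increasing counter order. `struct counted` and `count_access` are hypothetical names.

```c
#include <assert.h>

/* A record plus the number of times it has been accessed. */
struct counted {
    char key;
    unsigned long count;
};

/* Search list[0..n-1] for key; if found, increment its counter and
   slide it forward past every record with a smaller counter.
   Returns the number of comparisons made (n if the key is absent). */
int count_access(struct counted *list, int n, char key)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        if (list[k].key == key) {
            struct counted hit = list[k];
            int j = k;
            hit.count++;
            while (j > 0 && list[j - 1].count < hit.count) {
                list[j] = list[j - 1];   /* shift the smaller record back */
                j--;
            }
            list[j] = hit;
            return k + 1;
        }
    }
    return n;
}
```

Unlike move-to-front and transpose, this rule needs extra storage for the counters. As the counts grow, the order settles toward decreasing access frequency, which is the sense in which it approaches the optimal static ordering.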

No Optimal Memoryless algorithm

Asymptotic Cost

- Asymptotic cost of move-to-front is at most twice the asymptotic cost of the optimal static ordering
- Asymptotic cost of transpose is <= asymptotic cost of move-to-front
- Count is asymptotically equal to optimal static ordering

- Worst-case cost of move-to-front and count is at most twice the worst-case cost of the optimal static ordering
- Transpose can be far worse
- Moving a record any fraction of the distance to the front of the list costs no more than a constant times the cost of the optimal off-line algorithm
- The constant is inversely proportional to the fraction of the total distance moved