## CS660 Combinatorial Algorithms Fall Semester, 1996 B-Trees, (a,b)-Trees

San Diego State University -- This page last updated Oct 10, 1996

## References

Introduction to Algorithms, Chapter 19

Data Structures and Algorithms 1: Sorting and Searching, Mehlhorn, sections 5.2, 5.3

`B-Trees, (a,b)-Trees Slide # 1`

## B-Trees of degree t

A tree T is a B-tree of degree t if

a) All leaves of T have the same depth

b) All internal nodes v of T except the root satisfy:
t <= c(v) <= 2t

c) The root of T satisfies 2 <= c(v) <= 2t

c(v) = number of children of node v

`B-Trees, (a,b)-Trees Slide # 2`
Other Definitions of B-Tree

For all internal nodes v of T except the root:
t <= c(v) <= 2t

All internal nodes of T except the root have between
t-1 and 2t-1 keys

For all internal nodes v of T except the root:
t+1 <= c(v) <= 2t + 1

For all internal nodes v of T except the root:
t/2 <= c(v) <= t

For all internal nodes v of T except the root:
⌈t/2⌉ <= c(v) <= t

`B-Trees, (a,b)-Trees Slide # 3`
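Theorem. If n >= 1, then for any n-key B-tree T of height h and degree t >= 2,

$$h \le \log_t \frac{n+1}{2}$$

proof. The root contains at least one key and every other node contains at least t-1 keys. So there are at least 2 nodes at depth 1, at least 2t nodes at depth 2, and in general at least $2t^{i-1}$ nodes at depth i. Hence

$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^h - 1}{t-1} = 2t^h - 1$$

so

$$t^h \le \frac{n+1}{2}$$

Take the log (base t) of both sides.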
How many levels?
| t | N | # of Levels |
|---|---|---|
| 256 | 33,000,000 | 4 |
| 256 | 8,550,000,000 | 5 |
| 128 | 4,100,000 | 4 |
| 128 | 530,000,000 | 5 |
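The N column appears to be the minimum number of keys that forces the given number of levels: a B-tree of degree t and height h (so h + 1 levels) holds at least 2t^h - 1 keys. A minimal sketch that recomputes those values; the function name `min_keys` is an assumption made for illustration, not something from the notes.

```python
def min_keys(t, h):
    """Fewest keys in a B-tree of degree t and height h.

    The root holds 1 key and each of the 2 * t**(i - 1) nodes at
    depth i >= 1 holds t - 1 keys, which sums to 2 * t**h - 1.
    """
    return 2 * t**h - 1


# Height h corresponds to h + 1 levels.
for t, h in [(256, 3), (256, 4), (128, 3), (128, 4)]:
    print(f"t={t:3d}  levels={h + 1}  min keys={min_keys(t, h):,}")
```

The exact values are slightly larger than the rounded figures in the table.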

`B-Trees, (a,b)-Trees Slide # 4`
Theorem. The worst case search time on an n-key B-tree T of degree t is O(lg(n)).
A node in T has t-1 <= K <= 2t-1 keys in sorted order.
Worst case:
K = t-1 for all nodes
searching for X not in the tree
Given a node, W, in T, how much work does it take to find the subtree of W that would contain X?
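Using binary search it takes
$O(\lg K) = O(\lg(2t-1)) = O(\lg t)$ comparisons
Since the height of the tree is in the worst case $\log_t \frac{n+1}{2}$, the total amount of work is:
$$O(\lg t) \cdot O(\log_t n) = O\left(\lg t \cdot \frac{\lg n}{\lg t}\right) = O(\lg n)$$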
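The search argument can be sketched in code as one binary search per node on the way down. This is a minimal sketch, assuming a hypothetical `Node` class with a sorted `keys` list and a `children` list (empty for a leaf); it uses the standard library's `bisect_left` for the binary search and also reports how many nodes were visited.

```python
from bisect import bisect_left


class Node:
    """Hypothetical B-tree node: sorted keys, children empty for a leaf."""

    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


def search(node, x):
    """Return (found, nodes_visited) for key x in the subtree at node."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, x)       # O(lg t) comparisons per node
        if i < len(node.keys) and node.keys[i] == x:
            return True, visited
        # x can only be in child i; stop if this node is a leaf.
        node = node.children[i] if node.children else None
    return False, visited
```

An unsuccessful search always walks from the root to a leaf, so it visits height + 1 nodes.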

`B-Trees, (a,b)-Trees Slide # 5`

### Insertion in a B-Tree

Inserting X into B-tree T of degree t

A full node is one that contains 2t-1 keys

1. Find the leaf that should contain X

2. If the path from the root to the leaf contains a full node, split that node when the search first reaches it.
Example t = 2, Insert 25
The full node is split, then 25 is inserted into subtree b

3. Insert X into the proper leaf
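The three steps above can be sketched as a single top-down pass. This is a sketch under stated assumptions, not the course's own code: the class `BTreeNode` and the functions `split_child`, `insert`, and `inorder` are names chosen for illustration, keys are assumed distinct, and a node is full when it holds 2t-1 keys as defined above.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


def split_child(parent, i, t):
    """Split the full child parent.children[i] around its median key."""
    full = parent.children[i]
    right = BTreeNode(leaf=full.leaf)
    median = full.keys[t - 1]
    right.keys = full.keys[t:]          # upper t-1 keys
    full.keys = full.keys[:t - 1]       # lower t-1 keys
    if not full.leaf:
        right.children = full.children[t:]
        full.children = full.children[:t]
    parent.keys.insert(i, median)       # median moves up
    parent.children.insert(i + 1, right)


def insert(root, x, t):
    """Insert x and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:     # full root: the tree grows by one level
        new_root = BTreeNode(leaf=False)
        new_root.children.append(root)
        split_child(new_root, 0, t)
        root = new_root
    node = root
    while not node.leaf:
        i = bisect_left(node.keys, x)
        if len(node.children[i].keys) == 2 * t - 1:
            split_child(node, i, t)     # split full nodes on the way down
            if x > node.keys[i]:
                i += 1
        node = node.children[i]
    node.keys.insert(bisect_left(node.keys, x), x)
    return root


def inorder(node):
    """Keys of the subtree in sorted order (used to check the result)."""
    if node.leaf:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        out += node.keys[i:i + 1]
    return out
```

Because every full node is split before the search descends into it, the parent of a node being split is never full, so a split never propagates back up.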

`B-Trees, (a,b)-Trees Slide # 6`
Example t = 2, Insert 25

`B-Trees, (a,b)-Trees Slide # 7`

### Deletion in a B-Tree

Deleting X from B-tree T of degree t

A minimal node is one that contains t-1 keys and is not the root

On the search path from the root to the node containing X, whenever you come across a minimal node, add a key to it.

Case 3. Searching node W that does not contain X. Let c be the child of W that would contain X.

Case 3a. If c has t-1 keys and a sibling has t or more keys, steal a key from the sibling (the sibling's key moves up into W, and a key of W moves down into c)
Example t = 2, Delete 250

`B-Trees, (a,b)-Trees Slide # 8`
Case 3b. If c has t-1 keys and all siblings have t-1 keys, merge c with a sibling
Example 1. t = 2, Delete 250
Example 2. t = 2, Delete 250

`B-Trees, (a,b)-Trees Slide # 9`
Case 2. Internal node W contains X.

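Case 2a. If the child y of W that precedes X in W has at least t keys, steal the predecessor of X from the subtree rooted at y: replace X in W with its predecessor, then delete the predecessor from that subtree
Example 1. t = 2, Delete 50
Now Delete 45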

`B-Trees, (a,b)-Trees Slide # 10`
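Case 2b. If the child z of W that succeeds X in W has at least t keys, steal the successor of X from the subtree rooted at z: replace X in W with its successor, then delete the successor from that subtree
Example 1. t = 2, Delete 30
Now Delete 40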

`B-Trees, (a,b)-Trees Slide # 11`
Case 2c. If both the child y of W that precedes X and the child z of W that succeeds (follows) X have only t-1 keys, merge y, X, and z into a single node
Example t = 2, Delete 30
Now delete 30 one level lower

Case 1. X is in node W, a leaf. By case 3, W has at least t keys. Remove X from W
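The cases above can be combined into one recursive routine. The following is a minimal sketch, assuming a hypothetical `Node` class with sorted `keys` and a `children` list (empty for a leaf) and distinct keys; the helper names `merge`, `extreme_key`, `delete`, and `inorder` are illustrative choices, and the comments name the case each branch is meant to cover.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)      # empty for a leaf


def merge(w, i):
    """Fuse children i and i+1 of w around the separating key w.keys[i]."""
    left, right = w.children[i], w.children.pop(i + 1)
    left.keys += [w.keys.pop(i)] + right.keys
    left.children += right.children


def extreme_key(node, side):
    """Largest (side=-1) or smallest (side=0) key in the subtree at node."""
    while node.children:
        node = node.children[side]
    return node.keys[side]


def _delete(w, x, t):
    i = bisect_left(w.keys, x)
    if i < len(w.keys) and w.keys[i] == x:
        if not w.children:                              # Case 1
            w.keys.pop(i)
        elif len(w.children[i].keys) >= t:              # Case 2a
            w.keys[i] = extreme_key(w.children[i], -1)
            _delete(w.children[i], w.keys[i], t)
        elif len(w.children[i + 1].keys) >= t:          # Case 2b
            w.keys[i] = extreme_key(w.children[i + 1], 0)
            _delete(w.children[i + 1], w.keys[i], t)
        else:                                           # Case 2c
            merge(w, i)
            _delete(w.children[i], x, t)
    elif w.children:                                    # Case 3
        c = w.children[i]
        if len(c.keys) == t - 1:
            left = w.children[i - 1] if i > 0 else None
            right = w.children[i + 1] if i < len(w.keys) else None
            if left is not None and len(left.keys) >= t:        # 3a, left
                c.keys.insert(0, w.keys[i - 1])
                w.keys[i - 1] = left.keys.pop()
                if left.children:
                    c.children.insert(0, left.children.pop())
            elif right is not None and len(right.keys) >= t:    # 3a, right
                c.keys.append(w.keys[i])
                w.keys[i] = right.keys.pop(0)
                if right.children:
                    c.children.append(right.children.pop(0))
            else:                                               # 3b
                i = i if right is not None else i - 1
                merge(w, i)
                c = w.children[i]
        _delete(c, x, t)


def delete(root, x, t):
    """Delete x (if present) and return the possibly new root."""
    _delete(root, x, t)
    if not root.keys and root.children:     # root emptied by a merge
        root = root.children[0]
    return root


def inorder(node):
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        out += node.keys[i:i + 1]
    return out
```

The only place the tree loses a level is in `delete`, when a merge at the root leaves it with no keys.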

`B-Trees, (a,b)-Trees Slide # 12`

### B-Trees and Red-Black Trees

Theorem. A Red-Black tree is a B-Tree with degree 2

proof:

Must show:
1. If a node is red, then both its children are black
2. Every simple path from a node to a descendant leaf contains the same number of black nodes

`B-Trees, (a,b)-Trees Slide # 13`
Leaf-Oriented Storage

Data is stored in leaves. Internal nodes are used to index into leaves.
[Figure: Node-Oriented Storage compared with Leaf-Oriented Storage]

`B-Trees, (a,b)-Trees Slide # 14`

## ( a, b )-Trees

Will assume items of interest are stored in the leaves, but this is not required

Each leaf contains one key

Internal nodes contain keys used to find leaves

Let a and b be integers with a >= 2 and 2a-1 <= b. A tree T is an (a, b)-tree if

a) All leaves of T have the same depth

b) All internal nodes v of T except the root satisfy a <= c(v) <= b

c) The root of T satisfies 2 <= c(v) <= b

c(v) = number of children of node v
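Conditions a) to c) can be checked mechanically. A minimal sketch, assuming a toy representation that is not from the notes: an internal node is a Python list of its children and a leaf is any non-list value. The names `leaf_depth` and `is_ab_tree` are illustrative.

```python
def leaf_depth(node, a, b, is_root=True):
    """Return the common leaf depth below node, or None if a rule is broken."""
    if not isinstance(node, list):          # a leaf
        return 0
    lowest = 2 if is_root else a            # rule c) for the root, b) otherwise
    if not lowest <= len(node) <= b:
        return None
    depths = {leaf_depth(child, a, b, False) for child in node}
    if None in depths or len(depths) != 1:  # rule a): all leaves at one depth
        return None
    return depths.pop() + 1


def is_ab_tree(root, a, b):
    if a < 2 or b < 2 * a - 1:
        raise ValueError("need a >= 2 and b >= 2a - 1")
    return leaf_depth(root, a, b) is not None
```

For example, `is_ab_tree([[1, 2], [3, 4, 5]], 2, 4)` holds, while a tree whose leaves sit at different depths is rejected.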

`B-Trees, (a,b)-Trees Slide # 15`

### Insertion

(2, 4) Tree Insert 6
Find Proper Leaf Location and Add
If needed Split Node

`B-Trees, (a,b)-Trees Slide # 16`

### Deletion

Delete 6
Find and Delete leaf, Shrink Parent
If needed, either fuse the parent with a sibling or share nodes from a sibling

`B-Trees, (a,b)-Trees Slide # 17`

Let T be an ( a, b )-tree with n leaves and height h. Then:

a) $2a^{h-1} \le n \le b^h$

b) $\lg(n)/\lg(b) \le h \le 1 + \lg(n/2)/\lg(a)$
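The height bounds in b) are equivalent to the leaf-count bounds 2a^(h-1) <= n <= b^h. A small sketch that finds the smallest and largest integer height allowed for a given n, using integer arithmetic to avoid floating-point rounding at exact powers; the function name `height_bounds` is an assumption for illustration.

```python
def height_bounds(n, a, b):
    """Smallest and largest height h with 2 * a**(h-1) <= n <= b**h (n >= 2)."""
    lo = 0
    while b**lo < n:            # smallest h with b**h >= n
        lo += 1
    hi = 1
    while 2 * a**hi <= n:       # largest h with 2 * a**(h-1) <= n
        hi += 1
    return lo, hi


print(height_bounds(1000, 2, 4))    # a (2, 4)-tree with 1000 leaves
```

For a (2, 4)-tree with 1000 leaves this gives heights between 5 and 9, matching lg(1000)/lg(4) ≈ 4.98 and 1 + lg(500)/lg(2) ≈ 9.97.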

`B-Trees, (a,b)-Trees Slide # 18`

### Amortized Cost

Theorem[1] Let b >= 2a and a >= 2. Perform any sequence of i insertions and d deletions (n = i + d) into an initially empty (a, b)-tree. Let
SP = total number of node splittings
F = total number of node fusings
SH = total number of node sharings

then:

SP + F + SH = O( n ).

This is not true when b = 2a - 1; that is, it fails for certain definitions of B-trees!

`B-Trees, (a,b)-Trees Slide # 19`
Values of a and b?

Assume b = 2a

Assume it costs C1 + C2m time units to move m contiguous elements from secondary to main memory

C1 = latency time, C2 = time to move one storage location

Assume it costs K1 + K2n to determine the subtree of interest in a node containing n keys

Total search time in an (a, b)-tree will be bounded by
Time to search one node * number of levels
( K1 + K2a + C1 + C2a ) lg( n ) / lg( a )

This is minimal when
a* ln( a - 1 ) = ( K1 + C1 ) / ( K2 + C2 )

Tree In Main Memory

K1 ~ K2 ~ C1 and C2 = 0 so a = 2 or 3
Tree In Secondary Memory

K1 ~ K2 ~ C2 and C1 ~ 1000K1 ( in 1983 )

This gives a ~ 100

`B-Trees, (a,b)-Trees Slide # 20`
The Action is Near the Leaves

Let leaves be level 0 ( just for this slide )
Parents of leaves be level 1, ...

Theorem[2] Let b >= 2a and a >= 2. Perform any sequence of i insertions and d deletions (n = i + d) into an initially empty (a, b)-tree. Let
$SP_h$ = total number of node splittings at height h
$F_h$ = total number of node fusings at height h
$SH_h$ = total number of node sharings at height h

then:
$$SP_h + SH_h + F_h \le \frac{2(c+2)\,n}{(c+1)^h}$$
where:

`B-Trees, (a,b)-Trees Slide # 21`
The Action is Near the Leaves - Who Cares?
Concurrent Databases

A = access by a separate processor

=node locked as processor changes node

AB = access blocked by locked node

A-Sort

A-sort (next slides) uses the fact that the action is near the leaves

`B-Trees, (a,b)-Trees Slide # 22`
( a, b )-Trees and Sorting

Let x[1], x[2], ..., x[n] be a sequence to be sorted

Let f[k] = | { x[j] : j > k and x[j] < x[k] } |

Let $F = \sum_{k=1}^{n} f[k]$

F is the number of inversions of x[1], x[2], ..., x[n]

Example
```
1   2   7   3   4   5   9   6   8
```
has 6 inversions

Facts

1) 0 <= F <= N*(N-1)/2 for a list of N items

2) Let F = number of inversions of a list A of n items. Insertion sort takes
O( n + F ) operations to sort A
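Both facts can be checked on the example sequence. A minimal sketch: `inversions` counts F directly from the definition of f[k], and `insertion_sort_shifts` is a plain insertion sort that also counts element shifts, which is where the F term in its running time comes from. Both function names are illustrative.

```python
def inversions(xs):
    """F = sum over k of f[k] = |{ j > k : xs[j] < xs[k] }|."""
    n = len(xs)
    return sum(1 for k in range(n) for j in range(k + 1, n) if xs[j] < xs[k])


def insertion_sort_shifts(xs):
    """Return (sorted copy, number of element shifts performed)."""
    a = list(xs)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:    # each shift removes one inversion
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts


example = [1, 2, 7, 3, 4, 5, 9, 6, 8]
print(inversions(example))              # 6
print(insertion_sort_shifts(example))   # ([1, 2, 3, 4, 5, 6, 7, 8, 9], 6)
```

A reversed list of N items reaches the maximum of N*(N-1)/2 inversions.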

`B-Trees, (a,b)-Trees Slide # 23`

### A-Sort

Sort x[1], x[2], ..., x[n] by inserting into a ( a, b )-Tree

Insert x[1], then x[2], then x[3], ... into the tree

When inserting x[k] need to find the proper location for x[k]

Don't start the search at the root

Start the search at the "right most" internal node

This process is called A-sort

Theorem[3] A sequence of n elements with F inversions can be sorted using A-sort in:
O( n + n lg( F / n ) )

Theorem[4] A-sort is better than quicksort for lists of N items whose number of inversions satisfies $F \le 0.02\,N^{1.57}$