SDSU CS660 Combinatorial Algorithms
Fall Semester, 1996
B-Trees, (a,b)-Trees

San Diego State University -- This page last updated Oct 10, 1996
----------

Contents of B-Trees, (a,b)-Trees

  1. References
  2. B-Trees of degree t
    1. Insertion in a B-Tree
    2. Deletion in a B-Tree
    3. B-Trees and Red-Black Trees
  3. ( a, b )-Trees
    1. Insertion
    2. Deletion
    3. Amortized Cost
    4. A-Sort

References


Cormen, Leiserson, and Rivest, Introduction to Algorithms, Chapter 19

Mehlhorn, Data Structures and Algorithms 1: Sorting and Searching, Sections 5.2, 5.3

B-Trees, (a,b)-Trees Slide # 1

B-Trees of degree t



A tree T is a B-Tree of degree t if

a) All leaves of T have the same depth

b) All internal nodes v of T except the root satisfy
t <= c(v) <= 2t

c) The root of T satisfies 2 <= c(v) <= 2t

c(v) = number of children of node v

B-Trees, (a,b)-Trees Slide # 2
Other Definitions of B-Tree

All internal nodes v of T except the root satisfy:
t <= c(v) <= 2t


All internal nodes of T except the root have between
t-1 and 2t-1 keys


All internal nodes v of T except the root satisfy:
t+1 <= c(v) <= 2t + 1


All internal nodes v of T except the root satisfy:
t/2 <= c(v) <= t


All internal nodes v of T except the root satisfy:
ceil( t/2 ) <= c(v) <= t


B-Trees, (a,b)-Trees Slide # 3
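Theorem. If n >= 1, then for any n-key B-tree T of height h and degree t >= 2,

h <= log_t( (n + 1) / 2 )

proof. The root contains at least one key and every other node contains at least t - 1 keys. There are at least 2 nodes at depth 1, at least 2t nodes at depth 2, and in general at least 2t^(i-1) nodes at depth i, so

n >= 1 + (t - 1) * sum_{i=1..h} 2t^(i-1) = 1 + 2(t - 1) * (t^h - 1) / (t - 1) = 2t^h - 1

so

t^h <= (n + 1) / 2

Take log base t of both sides.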
How many levels?

  t     N                # of Levels
  256   33,000,000       4
  256   8,550,000,000    5
  128   4,100,000        4
  128   530,000,000      5
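
A B-tree of degree t and height h holds at least 2t^h - 1 keys (one key in the root, t - 1 keys in every other node, and at least 2t^(i-1) nodes at depth i). The N column above appears to be this minimum, rounded down, with the number of levels equal to h + 1. A small Python sketch that recomputes it; the name min_keys is illustrative and not from the notes:

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of degree t and height h (root at depth 0)."""
    return 2 * t**h - 1

# levels = height + 1, so 4 levels means h = 3 and 5 levels means h = 4
for t in (256, 128):
    for levels in (4, 5):
        print(t, levels, f"{min_keys(t, levels - 1):,}")
```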


B-Trees, (a,b)-Trees Slide # 4
Theorem. The worst case search time on an n-key B-tree T of degree t is O(lg(n)).
A node in T has t-1 <= K <= 2t-1 keys in sorted order.
Worst case:
K = t-1 for all nodes
searching for X not in the tree
Given a node, W, in T, how much work does it take to find the subtree of W that would contain X?
Using binary search it takes
O( lg( K ) ) = O( lg( t ) ) comparisons
Since the height of the tree is in the worst case at most log_t( (n + 1) / 2 ), the total amount of work is:
O( lg( t ) * log_t( n ) ) = O( lg( t ) * lg( n ) / lg( t ) ) = O( lg( n ) )
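
The search described above can be sketched in Python as follows. This is a minimal illustration under assumptions: the Node class and its fields are hypothetical, keys are stored in the internal nodes (node-oriented storage), and the standard bisect module supplies the binary search inside each node.

```python
from bisect import bisect_left

class Node:
    """Hypothetical B-tree node: sorted keys, plus children (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(node, x):
    """Return True if key x occurs in the subtree rooted at node."""
    while True:
        i = bisect_left(node.keys, x)      # binary search within the node
        if i < len(node.keys) and node.keys[i] == x:
            return True
        if not node.children:              # reached a leaf without finding x
            return False
        node = node.children[i]            # descend into the i-th subtree

root = Node([20], [Node([5, 10]), Node([30, 40])])
print(search(root, 30), search(root, 25))
```

Each loop iteration does one binary search and moves one level down, so the work is (comparisons per node) * (number of levels).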


B-Trees, (a,b)-Trees Slide # 5

Insertion in a B-Tree

Inserting X into B-tree T of degree t

A full node is one that contains 2t-1 keys

1. Find the leaf that should contain X

2. If the path from the root to the leaf contains a full node, then split the node when you first search it.
Example t = 2, Insert 25
Full Node is split, Then insert 25 into subtree b

3. Insert X into the proper leaf
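
The three steps above can be sketched in Python. This is an illustrative sketch, not the course's code: the class names Node and BTree, the helper keys_in_order, and the choice to send duplicates to the left are all assumptions. Full nodes are split on the way down, so the leaf that receives X always has room.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty list means leaf

class BTree:
    def __init__(self, t):
        self.t = t
        self.root = Node()

    def _full(self, node):
        return len(node.keys) == 2 * self.t - 1

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t, child = self.t, parent.children[i]
        right = Node(child.keys[t:], child.children[t:])
        median = child.keys[t - 1]
        child.keys = child.keys[:t - 1]
        child.children = child.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def insert(self, x):
        if self._full(self.root):          # a full root is split first,
            self.root = Node(children=[self.root])   # growing the tree by one level
            self._split_child(self.root, 0)
        node = self.root
        while node.children:               # steps 1 and 2: descend, splitting full nodes
            i = bisect_left(node.keys, x)
            if self._full(node.children[i]):
                self._split_child(node, i)
                if x > node.keys[i]:       # median moved up; pick the correct half
                    i += 1
            node = node.children[i]
        insort(node.keys, x)               # step 3: insert X into the leaf

    def keys_in_order(self, node=None):
        node = node or self.root
        if not node.children:
            return list(node.keys)
        out = []
        for child, key in zip(node.children, node.keys):
            out += self.keys_in_order(child) + [key]
        return out + self.keys_in_order(node.children[-1])

tree = BTree(2)
for x in [10, 20, 30, 40, 50, 25]:
    tree.insert(x)
print(tree.root.keys, tree.keys_in_order())
```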

B-Trees, (a,b)-Trees Slide # 6
Example t = 2, Insert 25

B-Trees, (a,b)-Trees Slide # 7

Deletion in a B-Tree

Deleting X from B-tree T of degree t

A minimal node is one that contains t-1 keys and is not the root

On the search path from the root to the node containing X, if you come across a minimal node, add a key to it.


Case 3. Searching node W that does not contain X. Let c be the child of W that would contain X.

Case 3a. if c has t-1 keys and a sibling has t or more keys, steal a key from the sibling
Example t = 2, Delete 250


B-Trees, (a,b)-Trees Slide # 8
Case 3b. if c has t-1 keys and all siblings have t-1 keys, merge c with a sibling
Example 1. t = 2, Delete 250
Example 2. t = 2, Delete 250

B-Trees, (a,b)-Trees Slide # 9
Case 2. Internal node W contains X.

Case 2a. If the child y of W that precedes X in W has at least t keys, steal the predecessor of X from the subtree rooted at y
Example 1. t = 2, Delete 50
Now Delete 45

B-Trees, (a,b)-Trees Slide # 10
Case 2b. If the child z of W that succeeds X in W has at least t keys, steal the successor of X from the subtree rooted at z
Example 1. t = 2, Delete 30
Now Delete 40

B-Trees, (a,b)-Trees Slide # 11
Case 2c. If both children y and z of W that precede and succeed (follow) X in W have only t-1 keys, merge y, X, and z into one node
Example t = 2, Delete 30
Now Delete 30 from the merged node, one level lower

Case 1. X is in node W, a leaf. By Case 3, W has at least t keys. Remove X from W
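
The cases above can be put together in one top-down pass. The Python below is a sketch under assumptions: the Node class and all function names are hypothetical, keys are distinct, and a root left empty by a merge is replaced by its only child. It follows the case numbering used on these slides.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []     # empty list means leaf

def _merge(w, i):
    """Merge children i and i+1 of w around the separator key w.keys[i]."""
    left, right = w.children[i], w.children[i + 1]
    left.keys += [w.keys.pop(i)] + right.keys
    left.children += right.children
    w.children.pop(i + 1)

def _fill(w, i, t):
    """Give child i of w at least t keys; return the index of the child to descend into."""
    c = w.children[i]
    if len(c.keys) >= t:
        return i
    if i > 0 and len(w.children[i - 1].keys) >= t:                    # Case 3a, left sibling
        s = w.children[i - 1]
        c.keys.insert(0, w.keys[i - 1])
        w.keys[i - 1] = s.keys.pop()
        if s.children:
            c.children.insert(0, s.children.pop())
        return i
    if i + 1 < len(w.children) and len(w.children[i + 1].keys) >= t:  # Case 3a, right sibling
        s = w.children[i + 1]
        c.keys.append(w.keys[i])
        w.keys[i] = s.keys.pop(0)
        if s.children:
            c.children.append(s.children.pop(0))
        return i
    if i > 0:                                                         # Case 3b, merge
        i -= 1
    _merge(w, i)
    return i

def _delete(w, x, t):
    i = bisect_left(w.keys, x)
    found = i < len(w.keys) and w.keys[i] == x
    if not w.children:                     # Case 1: W is a leaf
        if found:
            w.keys.pop(i)
    elif found:                            # Case 2: internal node W contains X
        y, z = w.children[i], w.children[i + 1]
        if len(y.keys) >= t:               # 2a: replace X by its predecessor
            p = y
            while p.children:
                p = p.children[-1]
            w.keys[i] = p.keys[-1]
            _delete(y, w.keys[i], t)
        elif len(z.keys) >= t:             # 2b: replace X by its successor
            s = z
            while s.children:
                s = s.children[0]
            w.keys[i] = s.keys[0]
            _delete(z, w.keys[i], t)
        else:                              # 2c: merge y, X, z and delete X one level lower
            _merge(w, i)
            _delete(y, x, t)
    else:                                  # Case 3: X, if present, is below W
        i = _fill(w, i, t)
        _delete(w.children[i], x, t)

def delete(root, x, t):
    """Delete x and return the (possibly new) root."""
    _delete(root, x, t)
    if not root.keys and root.children:    # root emptied by a merge
        return root.children[0]
    return root

def inorder(n):
    if not n.children:
        return list(n.keys)
    out = []
    for child, key in zip(n.children, n.keys):
        out += inorder(child) + [key]
    return out + inorder(n.children[-1])
```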

B-Trees, (a,b)-Trees Slide # 12

B-Trees and Red-Black Trees


Theorem. A Red-Black tree is a B-Tree with degree 2

proof:

Must show:
1. If a node is red, then both its children are black
2. Every simple path from a node to a descendant leaf contains the same number of black nodes


B-Trees, (a,b)-Trees Slide # 13
Leaf-Oriented Storage

Data is stored in leaves. Internal nodes are used to index into leaves.
Node-Oriented Storage
Leaf-Oriented Storage


B-Trees, (a,b)-Trees Slide # 14

( a, b )-Trees


Will assume items of interest are stored in the leaves, but this is not required

Leaf contains one key

Internal nodes contain keys used to find leaves


Let a and b be integers with a >= 2 and 2a-1 <= b. A tree T is an (a, b)-tree if

a) All leaves of T have the same depth

b) All internal nodes v of T except the root satisfy a <= c(v) <= b

c) The root of T satisfies 2 <= c(v) <= b

c(v) = number of children of node v
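
Conditions a) to c) can be checked mechanically. The Python sketch below only looks at the shape of the tree (child counts and leaf depths), not at keys; the Node class and the function name is_ab_tree are illustrative assumptions, and a tree consisting of a single leaf is accepted by convention.

```python
class Node:
    def __init__(self, children=None):
        self.children = children or []     # a leaf has no children

def is_ab_tree(root, a, b):
    """Check the (a, b)-tree shape conditions for the tree rooted at root."""
    leaf_depths = set()

    def visit(v, depth, is_root):
        if not v.children:
            leaf_depths.add(depth)         # condition a) is checked at the end
            return True
        low = 2 if is_root else a          # condition c) for the root, b) otherwise
        if not low <= len(v.children) <= b:
            return False
        return all(visit(c, depth + 1, False) for c in v.children)

    return visit(root, 0, True) and len(leaf_depths) == 1

leaves = lambda k: [Node() for _ in range(k)]
print(is_ab_tree(Node([Node(leaves(2)), Node(leaves(3))]), 2, 4))
```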



B-Trees, (a,b)-Trees Slide # 15

Insertion

(2, 4) Tree Insert 6
Find Proper Leaf Location and Add
If needed Split Node



B-Trees, (a,b)-Trees Slide # 16

Deletion

Delete 6
Find and Delete leaf, Shrink Parent
If needed either fuse parent or


B-Trees, (a,b)-Trees Slide # 17
Share nodes from Sibling



Let T be an ( a, b )-tree with n leaves and height h. Then:

a) 2a^(h-1) <= n <= b^h

b) lg( n ) / lg( b ) <= h <= 1 + lg( n/2 ) / lg( a )
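
The height bounds can be evaluated with integer arithmetic, using the fact that an ( a, b )-tree of height h has between 2a^(h-1) and b^h leaves. A small Python sketch; the function name height_bounds is illustrative and n >= 2 is assumed:

```python
def height_bounds(n, a, b):
    """Smallest and largest height h with 2*a**(h-1) <= n <= b**h, for n >= 2."""
    lo = 0
    while b**lo < n:                # smallest h with b**h >= n
        lo += 1
    hi = 1
    while 2 * a**hi <= n:           # largest h with 2*a**(h-1) <= n
        hi += 1
    return lo, hi

print(height_bounds(8, 2, 4))       # a (2, 4)-tree with 8 leaves
```

Avoiding floating-point logarithms keeps the result exact when n is a power of a or b.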



B-Trees, (a,b)-Trees Slide # 18

Amortized Cost


Theorem[1] Let b >= 2a and a >= 2. Perform any sequence of i insertions and d deletions( n = i + d ) into an initially empty ( a, b)-tree. Let
SP = total number of node splittings
F = total number of node fusings
SH = total number of node sharings

then:

SP + F + SH = O( n ).


This is not true when b = 2a - 1. That is for certain definitions of B-trees!

B-Trees, (a,b)-Trees Slide # 19
Values of a and b?

Assume b = 2a

Assume it costs C1 + C2m time units to move m contiguous elements from secondary to main memory

C1 = latency time, C2 = time to move one storage location

Assume it costs K1 + K2n to determine the subtree of interest in a node containing n keys

Total search time in an ( a, b )-tree will be bounded by
Time to search one node * number of levels
( K1 + K2a + C1 + C2a ) lg( n ) / lg( a )

This is minimal when (setting the derivative with respect to a to zero)
a * ( ln( a ) - 1 ) = ( K1 + C1 ) / ( K2 + C2 )

Tree In Main Memory

K1 ~ K2 ~ C1 and C2 = 0 so a = 2 or 3
Tree In Secondary Memory

K1 ~ K2 ~ C2 and C1 ~ 1000K1 ( in 1983 )

This gives a ~ 100
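
The choice of a can also be explored numerically by minimizing the per-level cost ( K1 + K2a + C1 + C2a ) / lg( a ) over integer values of a. The Python sketch below does this by brute force. The unit values K1 = K2 = C2 = 1 and C1 = 1000 are assumptions standing in for the rough secondary-memory ratios quoted above, and the function names are illustrative.

```python
import math

def search_cost(a, K1, K2, C1, C2):
    """Cost factor multiplying lg(n): (K1 + K2*a + C1 + C2*a) / lg(a)."""
    return (K1 + K2 * a + C1 + C2 * a) / math.log2(a)

def best_a(K1, K2, C1, C2, a_max=10_000):
    """Integer a in [2, a_max] with the smallest search cost."""
    return min(range(2, a_max + 1),
               key=lambda a: search_cost(a, K1, K2, C1, C2))

# Secondary memory, assumed units: K1 = K2 = C2 = 1, C1 = 1000
print(best_a(K1=1, K2=1, C1=1000, C2=1))
```

With these assumed constants the minimum falls on the order of 100, in line with the estimate for trees kept in secondary memory.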

B-Trees, (a,b)-Trees Slide # 20
The Action is Near the Leaves

Let leaves be level 0 ( just for this slide )
Parents of leaves be level 1, ...


Theorem[2] Let b >= 2a and a >= 2. Perform any sequence of i insertions and d deletions( n = i + d ) into an initially empty ( a, b)-tree. Let
SP_h = total number of node splittings at height h
F_h = total number of node fusings at height h
SH_h = total number of node sharings at height h

then:
SP_h + SH_h + F_h <= 2( c + 2 ) n / ( c + 1 )^h
where:



B-Trees, (a,b)-Trees Slide # 21
The Action is Near the Leaves - Who Cares?
Concurrent Databases

A = access by a separate processor

=node locked as processor changes node

AB = access blocked by locked node

A-Sort

A-sort (next slides) uses the fact that the action is near the leaves

B-Trees, (a,b)-Trees Slide # 22
( a, b )-Trees and Sorting

Let x[1], x[2], ..., x[n] be a sequence to be sorted

Let f[k] = | { x[j] : j > k and x[j] < x[k] } |

Let F = f[1] + f[2] + ... + f[n]

F is the number of inversions of x[1], x[2], ..., x[n]

Example
1   2   7   3   4   5   9    6     8

has 6 inversions

Facts

1) 0 <= F <= N*(N-1)/2 for a list of N items

2) Let F = number of inversions of a list A of n items. Insertion sort takes
O( n + F ) operations to sort A
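
Both facts can be checked on the example sequence. The Python sketch below computes f[k] directly from its definition and counts the element shifts made by a textbook insertion sort; the function names are illustrative. The number of shifts equals the number of inversions, which is where the n + F operation count comes from.

```python
def inversion_counts(x):
    """f[k] = number of j > k with x[j] < x[k] (0-based indices)."""
    n = len(x)
    return [sum(1 for j in range(k + 1, n) if x[j] < x[k]) for k in range(n)]

def insertion_sort_shifts(x):
    """Return the sorted list and the number of element shifts performed."""
    a = list(x)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # each shift removes exactly one inversion
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts

x = [1, 2, 7, 3, 4, 5, 9, 6, 8]
print(sum(inversion_counts(x)), insertion_sort_shifts(x)[1])
```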


B-Trees, (a,b)-Trees Slide # 23

A-Sort


Sort x[1], x[2], ..., x[n] by inserting into a ( a, b )-Tree

Insert x[1], then x[2], then x[3], ... into the tree

When inserting x[k] need to find the proper location for x[k]

Don't start the search at the root

Start the search at the "right most" internal node

This process is called A-sort


Theorem[3] A sequence of n elements with F inversions can be sorted using A-sort in:
O( n + n lg( F / n ) )

Theorem[4] A-sort is better than quicksort for lists whose number of inversions satisfies F <= 0.02 N^1.57
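
The idea of searching from the right end can be illustrated without building an ( a, b )-tree. The Python sketch below is a stand-in, not A-sort itself: a sorted Python list replaces the tree, and a doubling (galloping) search from the right end replaces the walk up from the rightmost node. Only the number of comparisons behaves like the theorem, roughly O( 1 + lg( d ) ) for an element that lands d positions from the right; list insertion still moves O( n ) elements, which a real ( a, b )-tree avoids. The function name is an assumption.

```python
from bisect import bisect_right

def a_sort_sketch(x):
    """Sort x by inserting each element after a search that starts at the right end."""
    out = []
    for v in x:
        n = len(out)
        lo, hi, step = 0, n, 1
        while step <= n:                   # gallop leftward from the right end
            probe = n - step
            if out[probe] > v:
                hi = probe                 # v belongs at or before probe
                step *= 2
            else:
                lo = probe + 1             # v belongs after probe
                break
        pos = bisect_right(out, v, lo, hi) # binary search inside the window
        out.insert(pos, v)
    return out

print(a_sort_sketch([1, 2, 7, 3, 4, 5, 9, 6, 8]))
```

For a nearly sorted input most elements land close to the right end, so the window found by galloping is small and few comparisons are needed.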

----------