DS C1 Trees01

Preliminaries

Definition

Recursive def of a tree:

A tree is a collection of N(N0)N(N \geq 0) nodes and edges. The collection could be empty; otherwise the tree consists of a distinguished node rr called root, and more than 00 nonempty (sub)trees T1,T2,,TkT_1, T_2, \cdots,T_k, each of whose roots are connected from rr by a directed edge.

  • parents/children: AA is the parent of BB and BB is a child of AA if there’s an edge from AA to BB.
  • siblings: AA and BB are siblings if their roots are same.
  • leaves: Node that has no children.
  • path (from n1n_1 to nkn_k) and length: path is defined as a sequence of n1,n2,,nkn_1,n_2,\cdots,n_k such that nin_i is parent of ni+1n_{i+1} for 1ik1 \leq i\leq k. The length of the path is k1k-1.
  • depth/height (of nin_i): dep(ni)=len(path(root,ni)),height(ni)=maxlleaveslen(path(ni,l))dep(n_i) = len(path(root,n_i)), \quad height(n_i) = \max_{l \in leaves}len(path(n_i,l))
  • depth/height (of tree TT): dep(T)=maxlleavesdep(l)=height(T)=height(root))dep(T) = \max_{l \in leaves}dep(l) = height(T) = height(root))
  • ancestors/descendant : AA is a ancestor of BB and BB is a descendant of AA if there’s a path from AA to BB.

NOTE The height of an empty tree is defined -1 usually.

Propositions

  1. A tree of NN nodes has N1N - 1 edges.
  2. There’s exactly 11 path from root to each node.

Implementations

1
2
3
4
5
6
typedef struct _TreeNode* PtrToNode;
typedef struct _TreeNode{
ElementType Element;
PtrToNode FirstChild;
PtrToNode NextSib;
}TreeNode;

Traversals

All traversals’ time complexity is O(N)O(N).

Preorder

1
2
3
4
5
6
7
8
Algorithm Preorder(PtrToNode T){
Visit(T);
Next := T -> FirstChild;
While(Next){
Vist(Next);
Next = Next -> NextSib;
}
}

Postorder

1
2
3
4
5
6
7
8
Algorithm Postorder(PtrToNode T){
Next := T -> FirstChild;
While(Next){
Vist(Next);
Next = Next -> NextSib;
}
Visit(T);
}

Binary Trees

1
2
3
4
5
6
typedef struct _BTreeNode* PtrToNode;
typedef struct _BTreeNode{
ElementType Element;
PtrToNode Left;
PtrToNode Right;
}BTreeNode;

Propositions

Inorder

1
2
3
4
5
Algorithm Inorder(PtrToNode T){
Visit(T -> Left);
Visit(T);
Visit(T -> Right);
}

Examples :Expression Tree

The leaves of an expression tree are operands, and the other nodes contain operators.

  • infix: traverse the tree in inoder.
  • prefix: traverse the tree in preorder.
  • post: traverse the tree in postorder.

Binary Search Tree

Properties

maxnilsubtreeval(ni)val(root)minnirsubtreeval(ni)\max_{n_i \in lsubtree} val(n_i) \leq val(root) \leq \min_{n_i \in rsubtree} val(n_i)

The inorder traversal of BST is ascending.

Operations

  • Make empty

    1
    2
    3
    4
    5
    6
    7
    8
    SearchTree MakeEmpty(SearchTree T){
    if(T != null){
    MakeEmpty(T -> Left);
    MakeEmpty(T -> Right);
    free(T);
    }
    return Null;
    }
  • Find/FindMin/FindMax

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    Position Find(ElementType X, SearchTree T){
    if(T == null)
    return null;
    if(X < T -> Element)
    return Find(X,T -> Left);
    else if(X > T -> Element)
    return Find(X,T -> Right);
    else
    return T;
    }
    Position FindMin(SearchTree T){ //leftmost
    if(T == null)
    return null;
    if(T -> Left)
    return FindMin(T -> Left);
    else
    return T;
    }
    Position FindMax(SearchTree T){//rightmost
    if(T == null)
    return null;
    if(T -> Right)
    return FindMax(T -> Right);
    else
    return T;
    }
  • Insert

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    SearchTree Insert(ElementType X,SearchTree T){
    if(T == null){
    T = malloc(sizeof(struct TreeNode));
    if(T == null) FatalError("OOS!")
    else{
    T -> Element = X;
    T -> Left = T -> Right = null;
    }
    }
    else if(X < T -> Element)
    T -> Left = Insert(X,T -> Left);
    else if(X > T -> Element)
    T -> Right = Insert(X,T -> Right);
    return T;
    }
  • Delete

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    SearchTree Delete(ElementType X,SearchTree T){
    Position TempCell;
    if(T == null){
    Error("Element not found");
    }
    else if(X < T -> Element)
    T -> Left = Delete(X,T -> Left);
    else if(X > T -> Element)
    T -> Right = Delete(X,T -> Right);
    else if(T -> Left && T -> Right){
    TempCell = FindMin(T -> Right);
    T -> Element = TempCell -> Element;
    T -> Right = Delete(TempCell -> Element,T -> Right);
    }
    else{
    TempCell = T;
    if(T -> Left == null)
    T = T -> Right;
    else
    T = T -> Left;
    free(TempCell);
    }
    return T;
    }

Average-Case Analysis

AVL Tree

AVL tree is a BST with a balance condition.

height balanced

  1. An empty binary tree is height balanced.
  2. If T is a nonempty binary tree with TLT_L and TRT_R as its left and right subtrees(could be empty), then T is height balanced iff
    • TLT_L and TRT_R are height balanced, and
    • hLhR1|h_L - h_R| \leq 1, where hLh_L and hRh_R are the heights of TLT_L and TRT_R.

The balance factor (BF)

BF(node)=hLhRBF(node) = h_L - h_R

In an AVL tree, BF(node){1,0,1}BF(node) \in \{-1,0,1\}.

Insertion and rotation

After an insertion, the nodes on the path from the insertion to the root may have their balance altered.

Hence after an insertion,

  1. follow the path up to the root and keep updating the balancing info until we met the first unbalanced node α\alpha, i.e BF(node)Z{1,0,1}BF(node) \in \mathbb{Z} - \{-1,0,1\}.

  2. Address and fix the violation for different cases.

    BF Child Child’s Subtree Solution Note
    2 Left Left single rotation LL
    2 Left Right double rotation LR
    -2 Right Left double rotation RL
    -2 Right Right single rotation RR
  3. Repeat 1 - 2 until we gets to root.

Tree rotation: an operation on a binary tree that changes the structure without interfering with the order of the elements whose time complexity is O(1)O(1).

NOTEAfter a rotation, the side of the rotation increases its height by 1 while the side opposite the rotation decreases its height by 1 similarly.

  • LL Rotation: single right rotation is enough and no more need to rotate.

  • RR Rotation: single left rotation is enough and no more need to rotate.

  • LR Rotation: take k3k_3 as the new root of the whole subtree by a left rotation and then a right rotation. This may cause new unbanlance.

  • RL Rotation: take k3k_3 as the new root of the whole subtree by a right rotation and then a left rotation. This may cause new unbanlance.

height estimation

Let nhn_h be the minimum number of nodes in a height balanced tree of height h. Then the tree must look like

nh=nh1+nh2+1n_h = n_{h - 1} + n_{h - 2} + 1

with initial conditions F0=1,F1=2F_0 = 1,F_1 = 2.

nh=Fibh+2115(1+52)h+21n_h= Fib_{h+2} - 1 \approx \frac{1}{\sqrt{5}}(\frac{1 + \sqrt{5}}{2})^{h+2} - 1

Hence

h=O(logn)h = O(\log n)

T(N)=O(h)=O(logN)T(N) = O(h) = O(\log N)

Splay Tree

A splay tree is to control the time Any M consecutive tree operations starting from an empty tree taken at most O(MlogN)O(M\log N) time.

Although this guarantee does not preclude the possibility that any single operation might take O(N)O(N) time.
But there are no bad input sequences.

Generally, when a sequence of MM operations has total worst-case running time of O(MF(N))O(MF(N)), we say that the amortized running time is O(F(N))O(F(N)).

Rotations

We will still rotate bottom up along the access path.
Let XX be a nonroot node on the access path at which we are rotating.

  1. The parent of XX is the root of the tree: merely rotate XX and the root PP.
  2. XX has both a parent (PP) and a grandparent (GG):
    • zig-zag case: XX is a right child and PP is a left child (or vice versa). Exactly same as LR rotation or RL rotation of AVL tree.
    • zig-zig case: XX and PP are either both left children or both right children. take XX as the new root of the whole subtree by a rotation between PP and GG and then a rotation between XX and PP.

Splaying not only moves the accessed node to the root, but also roughly halves the depth of most nodes on the path.

Amortized Analysis

ref Introduction to Algorithms, P451-462.

In an amortized analysis, we average the time required to perform a sequence of data-structure operations over all the operations performed.

  • Amortized analysis differs from average-case analysis in that probability is not involved;
  • amortized analysis guarantees the average performance of each operation in the worst case.

NOTE The amortized cost applies to each operation, even when there are several types of operations in the sequence.

Aggregate analysis

For all nn, a sequence of nn operations takes worst-case time T(n)T(n)in total. The amortized cost per operation is T(n)n\frac{T(n)}{n}.

Exp: MultiStack

  1. PUSH(S, x) pushes object x onto stack S.
  2. POP(S) pops the top of stack S and returns the popped object.
    err Calling POP on an empty stack.
  3. MULTIPOP(S, k) removes the k top objects of stack S.
    note popping the entire stack if the stack contains fewer than k objects; leaving the stack unchanged if k is not positive.

Then analyze a sequence of n PUSH,POP,and MULTIPOP operations on an initially empty stack. It’s easy to know that the worst case O(N)O(N) of MULTIPOP won’t exist continuously.

We can pop each object from the stack at most once for each time we have pushed it onto the stack. Therefore, the number of times that POP(including calls within MULTIPOP) can be called on a nonempty stack is at most the number of PUSH operations. Hence

T(n)=2n=O(n)T(n) = 2n = O(n)

T(n)n=O(1)\frac{T(n)}{n} = O(1)

Exp: Incrementing a binary counter

A binary number x that is stored in the counter has its lowest-order bit in A[0] and its highest-order bit in A[k - 1].

INCREMENT(A) adds 1 (modulo 2k) to the value in the counter.

1
2
3
4
5
6
i = 0;
while i < A.length and A[i] == 1
A[i] = 0;
i = i + 1;
if i < A.length
A[i] = 1;

In the worst case in which array A contains all 1s, a single execution of INCREMENT takes time Θ(k)\Theta(k) . On an initially zero counter, A[i]A[i] flips n2i\lfloor\frac{n}{2^i}\rfloor times in a sequence of n INCREMENT operations.

T(n)=i=0k1n2i<ni=0inf12i=2n=O(n)T(n) = \sum_{i = 0}^{k -1}\lfloor\frac{n}{2^i}\rfloor < n\sum_{i = 0}^{\inf}\frac{1}{2^i} = 2n = O(n)

T(n)n=O(1)\frac{T(n)}{n} = O(1)

Accounting method

Assign differing charges to different operations with some operations charged more or less than they actually cost.

amortized cost ci^\hat{c_i}: the amount we charge the operation.

credit crediticredit_i: the difference to specific objects in the data structure.

NOTECredit can help pay for later operations whose amortized costc^i\hat c_i is less than their actual costcic_i.

NOTE All operations have the same amortized cost in aggregate analysis, while they may differ from each other in accounting method.

And we require

i=1bci^i=0nci\sum_{i = 1}^{b}\hat{c_i} \ge \sum_{i = 0}^{n}{c_i}

for all sequences of n operations. i.e

i=1ncrediti=i=1nci^i=0nci0\sum_{i = 1}^{n}credit_i = \sum_{i = 1}^{n}\hat{c_i}-\sum_{i = 0}^{n}{c_i}\ge 0

The total credit associated with the data structure must be nonnegative at all times. Or the total amortized cost would not be an upper bound on the total actual cost!

Exp: MultiStack

Op cic_i ci^\hat{c_i}(designed) crediticredit_i
PUSH 1 2 1
POP 1 0 -1
MULTIPOP k 0 -k

i=1ncrediti=i=1nci^i=0nci0\sum_{i = 1}^{n}credit_i = \sum_{i = 1}^{n}\hat{c_i}-\sum_{i = 0}^{n}{c_i}\ge 0

The sum of the credit equals the number of objects in the stack, which is nonnegative. Hence

TAmortized(n)=O(1)T_{Amortized}(n) = O(1)

Exp: Incrementing a binary counter

Charge an amortized cost of 2 to set a bit to 1, 1 to pay for the actual setting of the bit, and another to be used later when the bit is flipped back to 0.

And charge nothing to reset a bit to 0.

i=1ncrediti=i=1nci^i=0nci0\sum_{i = 1}^{n}credit_i = \sum_{i = 1}^{n}\hat{c_i}-\sum_{i = 0}^{n}{c_i}\ge 0

The sum of the credit equals the number of 1s in the counter, which is nonnegative.

TAmortized(n)=O(1)T_{Amortized}(n) = O(1)

Potential method

Perform n operations starting with an initial data structure D0D_0. For each i=1,2,ni = 1, 2,\cdots n:

  • cic_i: actual cost of the ithi^{th} operation

  • DiD_i: the data structure that results after applying the ithi^{th} operation to data structure Di1D_{i-1}

  • Φ\Phi : a potential function maps each DiD_i to a real number Φ(Di)\Phi(D_i), which is the potential associated with DiD_i.

    ci^=ci+crediti=ci+Φ(Di)Φ(Di1)\hat{c_i} = c_i + credit_i = c_i + \Phi(D_i) - \Phi(D_{i - 1})

  • ci^\hat{c_i} : the amortized cost of the ithi^{th} operation.

The amortized cost is

i=1nci^=i=0n(ci+Φ(Di)Φ(Di1))=i=0nci+Φ(Dn)Φ(D0)i=0nci\begin{align} \sum_{i = 1}^{n}\hat{c_i} &= \sum_{i = 0}^{n}(c_i + \Phi(D_i) - \Phi(D_{i - 1}))\\ &= \sum_{i = 0}^{n}c_i + \Phi(D_n) - \Phi(D_0) \ge \sum_{i = 0}^{n}c_i \end{align}

Hence the only requirements for potential method is

Φ(Dn)Φ(D0)0\Phi(D_n) - \Phi(D_0) \ge 0

for all sequences of n operations.

A good potential function should always assume its minimum at the start of the sequence.

Exp: MultiStack

Define the potential function Φ\Phi on a stack to be the number of objects in the stack.

Φ(D0)=0\Phi(D_0) = 0

Φ(Dn)=size(S)\Phi(D_n) = size(S)

Hence,

Op cic_i Φ(Di)Φ(Di1)\Phi(D_i) - \Phi(D_{i - 1}) ci^\hat{c_i}
PUSH 1 1 2
POP 1 -1 0
MULTIPOP k -k 0

Exp: Incrementing a binary counter

Define the potential function Φ\Phi on a stack to be the number of 1s in the counter.

Φ(D0)=0\Phi(D_0) = 0

Φ(Dn)=numof1(A)\Phi(D_n) = numof1(A)

Hence, assume mim_i 1s are set to 0 in a single execution,

Op cic_i Φ(Di)Φ(Di1)\Phi(D_i) - \Phi(D_{i - 1}) ci^\hat{c_i}
INCREMENT 1 + mim_i 1 - mim_i 2

Exp: Splay Tree

The potential function:

functions NOTEs
Φ(Di)=iDiheight(i)\Phi(D_i) = \sum_{i \in D_i}{height(i)} Almost every node’s height changes after a rotation!
Φ(Di)=iDisize(i)\Phi(D_i) = \sum_{i \in D_i}size(i) The difference is too large, causing a loose bound!
Φ(Di)=iDilogsize(i)=iDiRank(i)\Phi(D_i) = \sum_{i \in D_i}\log size(i) = \sum_{i \in D_i}Rank(i) Great !

lemma If a+bc,a,b,cZ+a + b ≤ c, a,b,c \in \mathbb{Z}+, then

loga+logb2logc2\log a + \log b \le 2\log c - 2

PF:

aba+b2c2abc24loga+logb2logc2log2=2logc2\begin{align} \sqrt{ab} &\le \frac{a + b}{2} \le \frac{c}{2} \\ ab &\le \frac{c^2}{4}\\ \log a + \log b &\le 2\log c - 2\log 2 = 2\log c - 2 \end{align}

For Zig:

ci^=1+R2(X)R1(X)+R2(P)R1(P)=1+R2(P)R1(X)1+R2(X)R1(X)\begin{align} \hat{c_i} &= 1 + R_2(X) - R1(X) + R_2(P) - R_1(P)\\ &= 1 + R_2(P) - R_1(X)\\ &\le1 + R_2(X) - R_1(X) \end{align}

For Zig-zag:

ci^=2+R2(X)R1(X)+R2(P)R1(P)+R2(G)R1(G)=2R1(X)R1(P)+(R2(G)+R2(P))2R1(X)R1(P)+2R2(X)22R2(X)2R1(X)\begin{align} \hat{c_i} &= 2 + R_2(X) - R1(X) + R_2(P) - R_1(P) + R_2(G) - R_1(G)\\ &= 2 - R_1(X) - R_1(P) + (R_2(G)+ R_2(P))\\ &\le 2 - R_1(X) - R_1(P) + 2R_2(X) - 2\\ &\le 2R_2(X) - 2R_1(X) \end{align}

For Zig-zig:

ci^=2+R2(X)R1(X)+R2(P)R1(P)+R2(G)R1(G)=22R1(X)+(R1(X)+R2(G))R1(P)+R2(P)22R1(X)+2R2(X)2R1(P)+R2(P)3R2(X)3R1(X)\begin{align} \hat{c_i} &= 2 + R_2(X) - R1(X) + R_2(P) - R1(P) + R_2(G) - R_1(G)\\ &= 2 - 2R_1(X)+( R_1(X) + R_2(G) )- R1(P)+ R_2(P)\\ &\le 2 - 2R1(X) + 2R2(X) - 2 - R1(P)+ R_2(P)\\ &\le 3R_2(X) - 3R_1(X) \end{align}

Hence the amortized time of the insertion of k rotations is

i=1kci^3(R(T)R(X))+13(logNlog1)+1=3logN+1=O(logN).\sum_{i = 1}^{k}\hat{c_i} \le 3( R( T ) – R ( X ) ) + 1 \le 3(\log N - \log 1) + 1 = 3\log N + 1 = O(\log N).

The rotation of zig happens once at most .


DS C1 Trees01
http://example.com/2023/03/07/DS-01/
Author
Tekhne Chen
Posted on
March 7, 2023
Licensed under