Dictionaries

BST

AVL

search: $\theta(height)$

insert: $\theta(height)$

delete: $\theta(height)$

Worst case cost for all operations is $\theta(height) = \theta(log \space n)$

Other Dictionary Implementations

Dictionary ADT

•

Unordered array or linked list: $\theta(1)$ insert, $\theta(n)$ search and delete

•

Ordered array: $\theta(log n)$ search, $\theta(n)$ insert and delete

•

Binary Search Trees: $\theta(height)$ search, insert, delete

•

Balanced BST (AVL): $\theta(log \space n)$ search, insert, and delete

Skip Lists

Expected Space: O(n)

Expected height: O(log n)

search: O(log n)

insert: O(log n)

delete: O(log n)

Dictionaries for special keys

Binary Search in a sorted array

insert: Ө(n)

delete: Ө(n)

search: Ө(log n)

Interpolation Search

$T^{(avg)}(n) \in \theta(log \space log \space n)$ if keys are uniformly distributed

Worst case: Ө(n)

Trie

Assumption: prefix-free / Based on bitwise comparisons

insert: Ө(|x|)

delete: Ө(|x|)

where |x| represents the length of binary string x, i.e., the number of bits in x

No leaf labels Trie

Halves the amount of space

Allow Proper Prefixes Trie

more space-efficient

Pruned Trie

Saves space if there are only few bitstrings that are long

Could even store infintie bitstrings (e.g. real numbers)

Must store the full keys

Compressed Trie (Patricia Tries)

delete: O(|x|)

insert: O(|x|)

Multiway Tries

O( |x| * (time to find the appropriate child) )

•

Array of size $|\Sigma|$ + 1 for each node

◦

O(1) time to find child, O($|\Sigma|$n) space

•

List of children for each node

◦

O($|\Sigma|$) time, O(# children) space

•

Dictionary (AVL?) of children for each node

◦

O(log(# children)) time, O(# children) space

Hashing

Direct Addressing

M: hash table size

search: Ө(1)

insert: Ө(1)

delete: Ө(1)

Space: Ө(M)

Separate Chaining Hashing

insert: O(1)

search: O(1 + size of bucket)

delete: O(1 + size of bucket)

average bucket size: $\frac{n}{M} =: \alpha$ called load factor

With Uniform Hashing Assumption that each hash function value is equally likely, average-case cost of search and delete is $O(1 + \alpha)$. Otherwise, Ө(n) on average

Bucket could be implemented by any dictionary realization

Simplest approach is to use unsorted linked lists for buckets tho

Open Addressing

Avoid the links needed for chaining by permitting only one item per slot

All operations have O(1) average-case run-time if the hash-function is uniform and $\alpha$ is kept sufficiently small. But worst-case run-time is usually Ө(n).

Linear Probing

h(k, i) = (h(k) + i) mod 11

Independent hash functions

Use multiplicative method for second hash function to make them independent

Pattern Matching

Brute-Force

Preprocessing: -

Search Time: O(nm)

Extra space: -

KMP

Preprocessing: O(m)

Search Time: O(n)

Extra space: O(m)

Boyer-Moore

Preprocessing: O(m + $|\Sigma|$)

Search Time: O(n)

Extra Space: O(m + $|\Sigma|$)

Suffix Tree
Preprocessing: O($n^2$)

Search Time: O(m)

Extra space: O(n)

Huffman

frequency 낮은 순으로 합치는거

Run-Length (RLE)

반복개수로 표현

LZW

ASCII로 인코딩 후 Trie로 표현

새로운 substring은 128부터 시작

bzip2

BWT

permute 후 sort 후 마지막 캐릭터로 문장 만들기