Duplicate

# Data Structures and Data Management

Category
Course Notes
Parent item
Sub-category
Computer Science
Sub-item
Tags
CS 240
Algorithms
Data Structures
Dictionaries
BST
AVL
search: $\theta(height)$
insert: $\theta(height)$
delete: $\theta(height)$
Worst case cost for all operations is $\theta(height) = \theta(log \space n)$
Other Dictionary Implementations
Unordered array or linked list: $\theta(1)$ insert, $\theta(n)$ search and delete
Ordered array: $\theta(log n)$ search, $\theta(n)$ insert and delete
Binary Search Trees: $\theta(height)$ search, insert, delete
Balanced BST (AVL): $\theta(log \space n)$ search, insert, and delete
Skip Lists
Expected Space: O(n)
Expected height: O(log n)
search: O(log n)
insert: O(log n)
delete: O(log n)
Dictionaries for special keys
Binary Search in a sorted array
insert: Ө(n)
delete: Ө(n)
search: Ө(log n)
Interpolation Search
$T^{(avg)}(n) \in \theta(log \space log \space n)$ if keys are uniformly distributed
Worst case: Ө(n)
Trie
Assumption: prefix-free / Based on bitwise comparisons
insert: Ө(|x|)
delete: Ө(|x|)
where |x| represents the length of binary string x, i.e., the number of bits in x
No leaf labels Trie
Halves the amount of space
Allow Proper Prefixes Trie
more space-efficient
Pruned Trie
Saves space if there are only few bitstrings that are long
Could even store infintie bitstrings (e.g. real numbers)
Must store the full keys
Compressed Trie (Patricia Tries)
delete: O(|x|)
insert: O(|x|)
Multiway Tries
O( |x| * (time to find the appropriate child) )
Array of size $|\Sigma|$ + 1 for each node
O(1) time to find child, O($|\Sigma|$n) space
List of children for each node
O($|\Sigma|$) time, O(# children) space
Dictionary (AVL?) of children for each node
O(log(# children)) time, O(# children) space
Hashing
M: hash table size
search: Ө(1)
insert: Ө(1)
delete: Ө(1)
Space: Ө(M)
Separate Chaining Hashing
insert: O(1)
search: O(1 + size of bucket)
delete: O(1 + size of bucket)
average bucket size: $\frac{n}{M} =: \alpha$ called load factor
With Uniform Hashing Assumption that each hash function value is equally likely, average-case cost of search and delete is $O(1 + \alpha)$. Otherwise, Ө(n) on average
Bucket could be implemented by any dictionary realization
Simplest approach is to use unsorted linked lists for buckets tho
Avoid the links needed for chaining by permitting only one item per slot
All operations have O(1) average-case run-time if the hash-function is uniform and $\alpha$ is kept sufficiently small. But worst-case run-time is usually Ө(n).
Linear Probing
h(k, i) = (h(k) + i) mod 11
Independent hash functions
Use multiplicative method for second hash function to make them independent
Pattern Matching
Brute-Force
Preprocessing: -
Search Time: O(nm)
Extra space: -
KMP
Preprocessing: O(m)
Search Time: O(n)
Extra space: O(m)
Boyer-Moore
Preprocessing: O(m + $|\Sigma|$)
Search Time: O(n)
Extra Space: O(m + $|\Sigma|$)
Suffix Tree Preprocessing: O($n^2$)
Search Time: O(m)
Extra space: O(n)
Huffman
frequency 낮은 순으로 합치는거
Run-Length (RLE)
반복개수로 표현
LZW
ASCII로 인코딩 후 Trie로 표현
새로운 substring은 128부터 시작
bzip2
BWT
permute 후 sort 후 마지막 캐릭터로 문장 만들기