Computer Science Cafe: data structure

Showing posts with label data structure. Show all posts

Wednesday, January 19, 2011

Graph Algorithms

Minimum Spanning Tree

Problem definition
In a weighted connected graph G(V,E,W) Find tree T that contains all the vertices in G and minimizes the sum of weight of all edges
Greedy approach to find minimum spanning tree,at each step one of the several possible choices is made that best choice

Safe edge is one that can be added to subset of edges of minimum spanning tree A without violating the condition
A cut is any partition of V
If no edge in A crosses the cut then it respects the cut
An edge is light edge crossing the cut if its weight is minimum of any edge crossing the cut

Theorem In graph G if we partition set V into S and V-S and A is subset of edges of minimum spanning tree then a light edge {u,v} can be added to set A safely that is AU{u,v} is also subset of MST
If edge u,v is not part of MST then there must be some other edge (xy) connecting the two partitions V and V-S so if we add edge uv to it, it should create cycle but we have uv as light edge so
w(uv)<=w(xy)
so we have MST with uv also
But if all edges have distinct weights we cannot have this condition and only edge {u,v} can be part of MST

Generic- MST : Find a safe edge and add it to A to form MST

Kruskal algorithm
greedy algorithm because at each step adds the least possible weight edge to A.
set A in case of Kruskal algorithm is forest
It finds a safe edge to add to the growing forest by finding edge {u,v} of least weight that connects any two trees in forest

Kruskal algorithm
Kruskal(G,w)

For each vertex v in V(G)
make-set(v)   //O(V)
sort the edges of G in nondecreasing order by weight   //greedy approach to take minimum weight
For each edge E{u,v} in the sorted list
if both endpoints do not belong to same tree then //if belongs to same tree then we would have cycle
If Find-Set(u) != Find-Set(v)          //O(E) for each edge
then A =A U {u,v} // add edge E to set A
Union(u,v)

we used disjoint set data structure to implement it

Running time
depends on the disjoint set operations
time to sort the edges takes (E log E)
creating a disjoint set of vertices that is that many number of distinct trees initially and the main loop to process each edge takes O(E+V)a(V), but since G is connected graph E >= V-1 and so it becomes O(E)a(V) (a takes into count the union at last when vertices are added to tree) where a(V) is (log V) = O(log E)
Total running time is O(E log E) also |e|<=|v|^2 so log E = O(log V) and we can also state running time as
O(E log V)

Prim Algorithm
also example greedy algorithm
edges in the set A forms Tree that is it works by adding edges to a set A
operates like Dijkstra algorithm
It gradually creates a minimum tree that spans all the verties
At each step light edge is added to tree A that connects A to an isolated vertex
key concern is how to select
min priority queue is used to hold vertices not in tree A based on key field .For each vertex v, key[v] is the minimum weight of any edge connecting v to a vertex in the tree

Prim Algorithm
r is the starting vertex
At each step we update the adjacent veritces weight so that when we call extract-min on priority queue we get the light edge that is minimum weight edge
Prim(G,w,r)
for each vertex add set the key field to infinity and parent field to null except for the vertex r for which key field is set 0
set the min priority queue with this information
while queue is no empty
u = extract-min (Q)            // we process vertex in order on min key in Q
for each edge v adjacent to u
   if v is in the priority queue and weight w(u,v) < key[v]
    update the key and parent for vertex u

For every vertex v in Q, key[v] is the minimum weight of any edge connecting v to a vertex in the tree
Q will have non infinite key for those vertexes whose adjacent vertex has been processed or in other words we determine the cut of the graph and light edge is added to the set A.
set A and Q forms disjoint sets of cut.

Loop invariants
set A contains V-Q that is vertices processed and removed from queue
for all vertices in queue if parent is not null then key of vertie

Running time of prim algorithm
depends on how min priority queue is implemented
1)If binary min heap is used
we use build min heap to create vertex with key and parent fields it will takes O(V) time
Extract min takes O(log V) //when selecting the vertex in queue
and is called V times that is for each vertex so extract min takes total of O(V logV)
Decrease key on binary heap = O(log V) // when updating the key of adjacent vertices
and is called for all the adjacent vertices of v
sum of lenght of all adjacency list is 2|E| so decrease key is called O(E logV)
total time is O(V log V + E log V )
when graph is sparse we have e << v^2 and e = Th(v) so cost is O(V log V)
when graph is dense we have e ~= v^2, so cost is O(V^2 log V)
2)using fibonaci heap
extract min can be performed in O(log V)
and decrease key can be performed in O(1)
so total time is (VlogV+E)
for sparse we have e=Th(v) so cost is O(VlogV)
for dense we have e ~= v^2 so cost is O(V^2)
3)using adjacency matrix
scans the list to find the smallest key   O(V)
for each vertex update its adjacent vertex weight   //O(deg(u))
total cost is {sum over all vertices} O(V + deg[u])
so total cost is O(V^2 + E) =O(V^2)

Negative edges in Kruskal and Prim algorithm
Negative edges does not affects the kruskal algorithm Both works correctly as should be
Light edge donot distinguish negative and positive edges

Shortest Path Problem

There is only one minimum spanning tree but in general there is different shortest path for each source

Optimal structure
each subpath of shortest path is shortest path

Cycle in graph
negative weight edges
there may be negative weight edges
If G contains no negative weight edges then shortest path weight remains well defined for all sources s
If there is negative weight cycle reachable from s then shortest path are not well defined
If there is negative weight cycle on some path from s to v then shortest path weight d(s,v) = -infinity

If vertex v is not reachable from s than d(s,v) = + infinity

Dijkstra algorithm requires no negative weight edge in the input
Bellman Ford algorithm allows negative weight edge in input graph and produces correct answer as long as no negative weight cycles are reachable from source. It can detect the negative weight edge cycles and reprot there existence

Shortest path cannot contain negative edge cycles and also not positive weight edge cycles
zero weight edge cycle it can be on the shortest path but then we can also go via another path instead of zero cycle
So we will consider shortest path only for v-1 edges that is without any cycle

Relaxation step
Relax(u,v,w) //relax edge {uv} having weight w
we update the distance on the vertex v like
if d(v) > d(u) +w(uv)
then update d(v) with d(u) + w(uv)

Each algorithm differs in how many times and in which order they relax edges.
Dijkstra algorithm relaxes each edge exactly once
In Bellman ford each edge is relaxed many times

Bellmand Ford Algorithm
solves single source shortest path in weighted directed graph G(V,E)
negative weight edges allowed
detects reachable negative weight edge cycles
algorithm returns boolean value indicating whether or not negative weight edge cycle exist If such cycle exists the algorithm indicates there is no solution If there is no cycle it produces shortest path and weight

algorithm return TRUE if and only if the graph contains no negative weight cycle reachable from source s

Bellman-Ford(G,w,s)
Initialize(G,s ) //initializes the d(v) for each vertex to infinity and path s to zero
for i =1 to |V|-1 //remaining vertices other than source
for each edge (u,v) in E(G) //each pass relaxes each of the edge of the graph so |v-1|*|E| relaxation
    Relax(u,v,w)
for each edge (u,v) in E(G)   //check negative weight cycle
check if d(v) > d(u) + w(u,v)
    return FALSE              //if negative weight cycle exist
return TRUE

Running Time
Initialization takes O(V)
we relax each edge for each vertex that is O(VE)
check for negative cycle takes O(E)
total running time is O(VE)

For proving the correctness of the bellman ford we need to prove that each vertex v has shortest distance from s that is for each vertex v d[v]=shortest distance from s

Observations
does not always returns shortest path from s to t in all the graphs, when graph have negative weight cycle
If graph has all distinct vertices then shortest path between two vertices may not be unique, same weight path possible

Dijkstra algorithm
single source shortest path on directed weighted graph
no negative edges allowed
example of greedy algorithm
greedy algorithms doesnot always yields optimal result but in case of Dijkstra it return optimal weight path
maintains set S of vertices whose final shortest path weight from source have already been determined
select vertex u whith minimum path estimate from remaining vertices and adds it to S and relaxes all edges leaving u

Dijkstra (G,w,s)
Initialize all vertices to infinity distance
build a min priority queue from vertex set V[G] //Q would always contain V-S
while priority queue is not empty    //runs for |V| times
u = extract-min(Q)
add u to S
for each vertex v adjacent to u
relax(u,v,w)

Running Time of Dijkstra
maintains min prioity queue by calling three operations
Insert(build heap) ,Extract-min(G) and Decrease-Key(Relax)
1)we take advantage of the vertices being numbered 1 to |V|-1
we can perform Insert and Decrease-key in O(1) and extract min in O(V) time
total running time O(V^2+E) = O(V^2)
2)If graph is sparse that is E =o(V^2/log V) we implement it using binary min heap
build heap O(V)
extract min O(log V) and there are V such extract so O(V log V)
Decrease key O(log V) , number of decrease key is of O(E) so total O(ElogV)
Total running time = O((V+E)log V) = O(E log V)
3)with fibonacci heap we can achieve (V logV+E)

Observations
Dijkstra algorithm relaxes each edges only once
example when Dijkstra algorithm does not works
here we start from A relax edge AB and AC
B gets labeled (5,A) and C(10,A) So we choose B and relax edge BD, D gets new label (6,B)
D is processed then C is chosen and edge CB is relaxed and B gets new label (3,C)
D has label (6,B) while D should have been (4,B)

All pair Shortest Path

Breadth First Search
for searching graph
given graph G and source s BFS explores the edges of G to discover every vertex that is reachable from s
It computes the smallest number of edges from s
explores all vertices at distnace k from s before moving to k+1 vertices

To keep track of progress, colors each vertex white gray or black.
not processed white
when first encountered it becomes non white
three color for vertices
white -- not discovered yet
gray discovered but not processed
black all the adjacent vertices discovered and grayed

all vertices adjacent to black are discovered vertices
vertices adjacent to gray may not be discovered

constructs a breadth first tree initially containing only its root

FIFO queue used to process vertices in order in which they are discovered

BFS(G,s)
color each vertex white initially , their distance infinity
color sources s as gray that is processing starts with s
enqueue s
for each vertex in queue
u = dequeue(Q)
for each adjacent vertex v of u
if its not discovered yet that is color ==white
color it gray
increase weight by one d(v) = d(u)+1
enqueue(Q, u)
color v black that is all its adj have been discovered

Queue Q always consists of gray vertices

Running time
aggregate analysis
Enqueue and Dequeue O(1) time
each vertex is enqueued and dequeued once giving O(V)
since sum of lenght of all adjacency list is Th(E)
total time scanning adjacency list is O(E)

total running time of BFS is O(V+E)

breadth first search shortest path, that is, minimum number of edges or weight of 1 on each edge, for each vertex
if it is reachable else infinity

DFS
edges are explored out of most recently discovered vertex
when all edge o v has been discovered search backtracks to explore edges leaving the vertex from which v was discovered
predecessor graph of DFS may be composed of several trees unlike that or BFS

like BFS vertices are colored to indicate their state
white initially grayed when discovered and black when finished

DFS(G)
initialize each vertex color to white
time =0
for each vertex in V[G]
if we have white colored vertex call DFS-VISIT(u)

DFS-VISIT(u)
set color of u to gray indicating its discovered
time = time+1
d[u] =time // time when vertex u is discovered
for each vertex v adjacent to u
if it is white colored
then call DFS-VISIT(v)
set color of u to black indicating all adjacent discovered
f[u] = time+1 //when the search or u adjacent vertices finished

Running time
since DFSVISIT is called exactly once for each vertex it takes O(V) time
scanning adjacency list O(E)
total O(V+E)

properties of DFS
vertex v is a decendant of vertex u in the depth first forest if and only if v is discovered during the time in which u is gray

discovery and finishing time have discovery structure
parenthesis theorem
that either interval [d[u],[u]] and [d[v],f[v]] are entirely disjoint or one is completely contained in other

vertex v is proper decendant of vertex u in the depth first forest for a graph G if and only if
d[u] <> j & not edge of graph

Problem Remove cycles from graph O(k) cycles
Solution When traversing the graph check whether edge trying to relax goes to node that has been seen but not finished that is gray colored vertex then this is back edge and we can save all such edges and at last remove them It will take time of DFS algo that is O(V+E)

Observation
DFS-VISIT if runs on BST it would return the smallest element as its first node and root as its last
Only in case of DFS does absence of back edges ensures acyclic graph not in case of BFS
BFS finds path using fewest number of edges the BFS depth of any vertex is at least as small as DFS depth of the same vertex Thus DFS tree has greater or equal depth that is if maximum distance between two vertices is T edges then in BFS depth is at most T but depth of DFS might be larger,DFS may have depth up to V-1 like in case of complete graph
If adjacency matrix is used instead of adjacency list in BFS it would run in O(V^2) instead of O(V+E)

dynamic programming solution
characterize structure of optimal solutiom
recursively define value of optimal soluion
computer in bottom up

Optimal sub structure
subpath of shotest path are also shortest paths
that is if p is shortest path from i to j then
d(i,j) = d(i,k)+w(i,k)

Recursive solution
let l^m(i,j) be minimum weight of any path from i to j with at most m edges
when m =0
l^m(i,j) = 0 if i=j
= infinity if i<>j
for m >= 1
we compute l

Wednesday, December 15, 2010

Abstract Datatype

Linked list
struct node{
int node;      //datapart
struct* node; //pointer to next node structure
}
There is one head pointer pointing to node structure but treated specially Its allocated on stack while these nodes are stored on heap with dynamic memory allocation call to malloc(int)

isEmpty(struct node*)
Input: heap pointer
If head pointer is null
true
else
false

int Length(struct node * header)
Input head pointer
create another duplicate pointer instead of changing the head pointer because we donot want to lose start of list call it current pointer to struct node
here header and current pointers are local to this function only

while(!isEmpty)
count++
current= current ->next

InserAfter ( struct node ** head pointer, newdata) //insert new element at beginning of list
// head pointer by reference instead by value
allocate newNode on heap
struct node * newNode = malloc(sizeof(struct node));

here head points to the actual head pointer instead to first node

set data newNode->data = newdata
set next pointer link of newNode to remaining list newNode ->next = * head
set head of list to newNode , if we had passed copy of pointer we wouldn't be able to change it
*head = newNode
we change the head pointer only at last because we donot want to lose
call to it would be like InsertFirst(&head,data)
where head in the calling program would be defined like struct node * head

changing a pointer with reference pointer
we pass a pointer with one extra star like we did above struct node **

Common mistakes in Linked list
1)passing pointer by value when updating the head in the function
2)

Deleting Max from linked list
can be done in single pass if we maintain the maxprevious else would need two pass
Binary search on linked list
would take O(nlogn) which is more than on array(logn)

remove duplicates from the unsorted array and leave the order of numbers same
like 2 1 8 2 7 4 8 should be 2 1 8 7 4
can be done in O(nlogn) if we attach a key with all the numbers to keep track of order
first we sort according to value then O(nlogn)
remove duplicates O(n) leave keys with smaller number
sort according to keys O(nlogn)

size of stack when function is should depend on the depth of function calls
that is for permutation number of call is proportional to n
for fact also number of calls is proportional to n
when there is function call then address and variable are piled on stack and new function is free to change the registers
so when permutation(10) is called it will pile the address and variable for the main function
and it will call permute reursively until permute(1) in which it doesnot need to call any function
_________
main call     |
-------------
permute 10 |
------------
permute9    |
------------
________
permute2   |
------------

so 9 is the depth for the function and first call piles main block address and register values on stack

Polish Notation
prefix notation
operator before operands

Reverse Polish Notation
every operator follows its operand

Infix to postfix conversion
values are moved to output queue
when operator has higher precedence than operator on top of stack push it on to stack
when operator have same precedence as top of stack and its left associative pop top of stack and push new operator
when operator has same precedence and is evaluated from right push it
when left parenthesis is found push onto stack
when right parentesis is found pop stack top to output until left parenthesis found and discard both left and right

stack error
underflow trying to pop empty stack and overflow trying to push onto already full
array can be shared to allow two stack implemented in same
Queue

FIFO
The queue has head and tail
element is added at tail and removed from head
head(Q) indexes head of the queue that is the fist element in fifo sequence
tail(Q) points to next location at which new data will be inserted, one ahead of last lement
we wrap around when tail reaches the end of the queue + 1

Enqueue operation inserts element at the tail and moves tail ahead by one to point to next empty
Dequeue operation removes from the head and moves head to point to next element

In circular queue when head points to the first element and tail points to the last element there is problem to check the empty and full condition because when
tail = front - 1 it can be empty or can be full
solution to this problem involves using additional varible which keeps count of number of elements or
keeping a gap between head and tail by one null space

Stack with one queue (circular)
operation we need push and pop
order we need is LIFO
adding element to queue should not be problem because we can enqueue element at tail now to delete
we can delete from front only by dequeue operation so we need to move all elements after rear that is dequeue each element until last, remaining list will be in same order as before

Queue with two stacks
Enqueue operation is simple
push the new element on top of stack1
Its of O(1)

Dequeue operation
we can pop only top of the stack which is last element but for dequeue we need the last element of the stack that is first element inserted so we have to pop(O(1)) all the element from stack1(O(n)) and push(O(1)) them on to stack2 then pop the first element from the stack2 and again push all remaining element onto stack1
Time complexity is O(n)

Hash Table
data structure for implementing dictionary(Insert,delete search)
Worst can is O(n) for searching but normally it achieves O(1)

Direct Address Table
works well when the number of distinct keys are small
element is stored at location indexed by its key
that is element a with key k is stored in table T as address T(k)
Dictionary operations take O(1) time

Hash table
element a with key k is stored at location h(k) where h is hash function
hash function maps the universe of keys to the slots of the table T
here we are reducing range of array indices which in the previous case was T[U] and in this case would be
T[1...m]
two keys may hash to same slot, that is,h[k2]=h[k3] can happen this is called collision

Collision Resolution by chaining
We put all the elements that hash to same slot in linked list
search in this would be like : check the list T[h(k)] for element, time complexity depends on the length of list
Insertion : insert at head of list T[h(k)]   O(1) in worst case
Deletion: given an element x to delete from table T
For doubly linked list its O(1) we can set the predecessor pointer without traversing
For single inked list it would take traversing to find the predecessor to set

Consider Hash table T with m slots that stores n elements
load factor a = n / m , average number of elements stored in a chain
Worst case
when all elements maps to same index in table and creates a long chain
for searching it would be then Th(n) + time to compute hash function

Average case
we consider simple uniform hashing
Probability that two keys would hash to same location is equally likely that is 1/m

searching would take O(a+1)

when number of elements n in the table is proportional to hash table slots that is
n = O(m)
a= O(m)/m = O(1)
and searching would take O(1)

Problem what is expected number of collisions for a pair of distinct keys, assuming uniform hashing.
Solution
there are total C(n 2) key pairs
For each pair
probability of collision P(c)= 1/m because of uniform hashing
X be the random variable which is equal to 1 in case of collision that is when (h(k1) = h(k2))
we can find the Expected value for collision over all keys
C(n 2)(1*p(c))
= n(n-1)/2m

Hash Function
has function should be independent of the keys
Division method
h(k) = k mod m
fast only single division is required
If m is power of 2, 2^p than h(k) would consider only p lower order bits of k its not good choice
A prime not too close to power of 2 is good choice

Multiplication Method

Open addressing
no linked list for each slot only, single element in each slot
load factor =1
avoids any pointers that memory is used in more slots
we compute the sequence of nodes to examine instead of
Insertion
we find an empty slot by probing
the probe order is not be sequential it depends on the key
for each key we have a probe sequence like
{h(k,1),h(k,2) ..........h(k,m-1)}
this probe sequence should be permutation of 1 2 3 m-1 m so that all slots are considered

Hash-Insert(T,k)
for each probe sequence j = h(k,i)
if no element at that location insert it
else probe next

search probes the same sequence as used in the insertion
Has-Search(T,k)
search all probe sequence until k is found
we can improve this by making delete function specially so that we stop search when we find first null in the probe sequence

Delete
when we delete an element we mark it with something instead of assigning null

collision technique is preferred when keys must be deleted that is more efficient than open addressing
For uniform hashing each permutation generated by probing sequence should be equally likely
any probing technique which generates near to m! probing sequence is better

Linear Probing
probing sequence is linear that is h'(k) h'(k)+1 .... m-1 0 1 2 .. k-1
that is
h(k,i) = (h'(k) +i )mod m
for i = 0 1 2 ....m-1
there are m different probe sequence possible.
easy to implement but primary clustering the series of occupied slots will increase in order, increasing the search time

Quadratic Probing
h(k,i) = (h'(k)+ c1i+c2i^2)
first slot probed is h'(k)

Friday, December 3, 2010

Heap Datastructure and Algorithms

Heap Sort

running time is o(nlogn)
in place sorting algorithm unlike merge sort
not stable sort
uses max heap, largest element is first element in array and smallest element is in one of leaf nodes

Heap datastructure

nearly complete binary tree, completely filled on all level except possibly lowest which is filled from left to right
Number of elements in heap of height should be between 2^h and 2^(h+1)-1 or
height of n element heap = ceil[logn] from above equation
since its almost complete binary tree parent child relation can be found in array with simple calculations like parent(i) = floor(i/2) , left(i) = 2i and right(i) = 2i+1

Type of binary heap

max heap, every node i has A[parent(i)] >= A[i]
and min heap, every node i has A[parent(i)] <= A[i]

Operations on heap data structure

Max-Heapify(A,i)-----log(n)

also called percolate down
Given a tree that is heap except for node i, arranges node i and its subtree
assumes that binary tree rooted at left(i) and right(i) are max heap
A[i] is placed at its right position
Algorithm
at each step find largest of A[i] A[left(i)] A[right(i)] and store it in largest
If A[i] is largest no need to go further
Else swap the A[larger] with A[i] and call max heapify on (A,larger)

Analysis
The worst case of max heapify occurs when last level is half full
Find relation between level and total number of nodes n for worst case scenario
elements up to i-1 level = 2^0 + 2^1 + ....2^i-1 = 2ⁱ -1
elements at last level = 2^i /2
Total number of nodes = (2^0 + 2^1 + ....2^i-1) + 2^i/2
2ⁱ + 2^i-1 = n+1
2^i-1(2+1)=n+1;
2^i-1=(n+1)/3

Max subtree size = (half of all elements up to level i-1) + (elements at the last level) – (1 root element) =
Max subtree size = (2^i - 1)/2 + 2^(i-1) - 1
substituting value from above equation
we get total number of nodes ~( 2n/3)

Recurrence relation
Time to run max heapify = time to fix immediate child heap property and to call max heapify on subtree
worst case when subtree is of size 2n/3 Therefore
T(n) <= T(2n/3) +Th(1)
solution to this recurrence is O(logn)
When max heapify is called on node of height h its running time is O(h)

Build Heap -----O(n)

already have all the elements in array, build heap from that
In the array representation for storing n element heap leaves are indexed by
floor(n/2)+1, floor(n/2)+2 ....n that is there are ceil(n/2) leaves
we call max heapify for each node starting from last parent element

Algorithm

Build Max Heap (A, n)
for each element from floor(n/2) down to first
Max-Heapify(A,i)

Analysis
Relaxed Upper bound Each max heapify takes time log n and there are almost n calls so O(nlogn)
Tight upper bound Since max heapify takes time proportional to height of node on which its called and that varies for each call to it in above loo.
total cost = sum over all height(0 to h){ (number of nodes at height x) O(x)}
which comes out to be O(n)
when root node contains smallest value it will be swapped with every node at each iteration

number of nodes at height < = ceil(n/2^(h+1))
we know that node at height h takes log h time for maxheapify
total cost = sum over all height(0 to h){ (number of nodes at height x) O(x)}
Cost of heapifying all = {sum o to log n}(n/2^(h+1)* O(h)) ~= O(n)

height and level of tree difference

level of tree is calculated as distance from root
height of tree is calculated as distance from leaf
height of tree might not be same for node at some level or depth this could be the case in case of not complete binary tree.
max height of heap is floor(log n)

Heap Sort
first we build the max heap using previous algorithm
since first element is the largest element we can place in its proper position at end of array

Heapsort (A,n)
Build-Max-Heap(A,n) //first build heap
for i = n to 2
     exchange A[1] and A[i] //at each iteration we get the largest element at first index and replace with last
     heap-size[A] = heap-size[A]-1 //no need to count the already sorted elements placed at end of array
     Max-Heapify(A,1) // after exchange it might not be heap anymore

Analysis
Since building max heap takes O(n) time and
there are n-1 call to max-heapify procedure that is O(nlogn)
so total cost is O(nlogn)

Insert(A,a)
Inserts an element in already established heap
put the new element at last position and percolate up that is find the right position by comparing with parent
and sibling
worst case time of O(log n)

Min Max Heap
It is possible to get both min and max in time O(logn)

Heap Implementation
1)binary tree
finding adjacents in last level is problem while adding
2)array
no space required for pointers
allows heapsort to be inplace
requires allocating array before using it so cannot be used for priority queue where number of task are not known before

Problem

Is the sequence <23, 17, 14, 6, 13, 10, 1, 5, 7, 12> a max-heap?
Solution Not a max heap

Problem Time cost for Heap - Delete, which deletes a root element
Solution
We delete the element i and swap it with last element and call max heapify for this exchanged node
so it run in time of max heapify that is O(lg n)

Problem Merge k sorted list with heap sort.
Solution:
Assume total number of elements as n
1) build a min heap with first elements of each list, heap would contain k elements// cost O(k)
2) while heap not empty // there are k element in heap
Extract min from heap and append to output
If successor of output element != nil, call min heapify for this element //cost O(lg k)
At a time there are max k elements in heap, successor of element is next element in its sorted list.
since we have total of n elements its O(nlogk)
and space will be O(k)

Problem Analyze the heapsort on already sorted array i)acending ii)decending
Solution
Heapsort Algorithm includes two things
1)build max heap // called n/2 times
2)exchange first element in array with last and call max heapify // called n-1 times

Consider Heapsort algorithm called on list already sorted in
i) descending order
Build Max heap is called on n/2 non leaf elements of the list and it need to go thru each even though they are sorted so it takes O(n).
max heapify is called n-1 times for list of n items But since max heapify replaces the last item with the first it changes the order of the list and it needs to be heapified each time which cost Th(log n)
so total cost becomes Th(n logn)
ii) ascending order
Build max heap will take time more than previous case but still O(n)
max heapify will still take Omega(logn) for each element
So total cost is Omega(nlogn)

Problem Insertion into Max heap with binary search
Consider max heap is stored as an array we insert the new element in the last position and then call heapify for nodes on its path to root which will take O(logn) time but what if we find the position of new element by binary search on its path to root, the number of nodes to compare would be O(lgn) and binary search will make Th(lg(lg n)) comparisons but then after finding the right position we need to shift the elements on its path downwards which will be of O(lgn) so again it becomes O(lg(lgn)+lgn ) = O(lgn)

Problem Given an array of n elements find i largest elements
Solution Build max heap of n elements // cost O(n)
Extract maximum element of heap i times // each extract takes O(lg n) time
Total cost = O(n+i lgn)
for i < n this cost is asymtotically less than O(nlogn)

Pages