Friday, December 3, 2010

Heap Datastructure and Algorithms

 Heap Sort
  •  running time is o(nlogn)
  •  in place sorting algorithm unlike merge sort
  •  not stable sort
  •  uses max heap, largest element is first element in array and smallest element is in one of leaf nodes
Heap datastructure
  •  nearly complete binary tree, completely filled on all level except possibly lowest which is filled from left to right
  • Number of elements in heap of height should be between 2^h and 2^(h+1)-1  or 
  • height of n element heap = ceil[logn] from above equation
  •  since its almost complete binary tree parent child relation can be found in array with simple calculations like parent(i) = floor(i/2) , left(i) = 2i and right(i) = 2i+1

Type of binary heap
max heap, every node i has A[parent(i)] >= A[i]
and min heap, every node i has A[parent(i)] <= A[i]

Operations on heap data structure
Max-Heapify(A,i)-----log(n)
also called percolate down
Given a tree that is heap except for node i, arranges node i and its subtree
assumes that binary tree rooted at left(i) and right(i) are max heap
 A[i] is placed at its right position
Algorithm
at each step find largest of A[i] A[left(i)] A[right(i)] and store it in largest
If A[i] is largest no need to go further
Else swap the A[larger] with A[i] and call max heapify on (A,larger)

Analysis
The worst case of max heapify occurs when last level is half full
Find relation between level and total number of nodes n for worst case scenario
elements up to i-1 level = 2^0 + 2^1 + ....2i-1 = 2i -1
elements at last level = 2^i /2
Total number of nodes = (2^0 + 2^1 + ....2^i-1) + 2^i/2
2i + 2i-1 = n+1
2i-1(2+1)=n+1;
2i-1=(n+1)/3

Max subtree size = (half of all elements up to level i-1) + (elements at the last level)  – (1 root element) =
Max subtree size = (2^i - 1)/2 + 2^(i-1) - 1
substituting value from above equation
we get  total number of nodes  ~( 2n/3)


Recurrence relation 
Time to run max heapify = time to fix immediate child heap property and to call max heapify on subtree
worst case when subtree is of size 2n/3 Therefore
T(n) <= T(2n/3) +Th(1)
solution to this recurrence is O(logn)
When max heapify is called on node of height h its running time is O(h)

Build Heap -----O(n)
already have all the elements in array, build heap from that
In the array representation for storing n element heap leaves are indexed by
floor(n/2)+1, floor(n/2)+2 ....n that is there are ceil(n/2) leaves
we call max heapify for each node starting from last parent element

Algorithm 

Build Max Heap (A, n)
     for each element from floor(n/2) down to first
           Max-Heapify(A,i)

Analysis
Relaxed Upper bound  Each max heapify takes time log n and there are almost n calls so O(nlogn)
Tight upper bound Since max heapify takes time proportional to height of node on which its called and that varies for each call to it in above loo.
total cost = sum over all height(0 to h){ (number of nodes at height x) O(x)}
which comes out to be O(n)
when root node contains smallest value it will be swapped with every node at each iteration

number of nodes at height < =  ceil(n/2^(h+1))
we know that node at height h takes log h time for maxheapify
total cost = sum over all height(0 to h){ (number of nodes at height x) O(x)}
Cost of heapifying all = {sum o to log n}(n/2^(h+1)* O(h)) ~= O(n)


height and level of tree difference 
level of tree is calculated as distance from root
height of tree is calculated as distance from leaf
height of tree might not be same for node at some level or depth this could be the case in case of not complete binary tree.
max height of heap is floor(log n)


Heap Sort 

first we build the max heap using previous algorithm
since first element is the largest element we can place in its proper position at end of array

Heapsort (A,n)
Build-Max-Heap(A,n)  //first build heap
for i = n to 2
     exchange A[1] and A[i]  //at each iteration we get the largest element at first index and replace with last
     heap-size[A] = heap-size[A]-1 //no need to count the already sorted elements placed at end of array
     Max-Heapify(A,1) // after exchange it might not be heap anymore


Analysis
Since building max heap takes O(n) time and
there are n-1 call to max-heapify procedure that is O(nlogn)
so total cost is O(nlogn)

Insert(A,a)
Inserts an element in already established heap
put the new element at last position and percolate up that is find the right position by comparing with parent
and sibling
worst case time of O(log n)


Min Max Heap
It is possible to get both min and max in time O(logn)

Heap Implementation
1)binary tree
finding adjacents in last level is problem while adding
2)array
no space required for pointers
allows heapsort to be inplace
requires allocating array before using it so cannot be used for priority queue where number of task are not known before



Problem 
Is the sequence <23, 17, 14, 6, 13, 10, 1, 5, 7, 12> a max-heap?
Solution Not a max heap


Problem Time cost for Heap - Delete, which deletes a root element
Solution
We delete the element i and swap it with last element and call max heapify for this exchanged node
so it run in time of max heapify that is O(lg n)

Problem Merge k sorted list with heap sort.
Solution:
Assume total number of elements as n
1) build a min heap with first elements of each list, heap would contain k elements// cost O(k)
2) while heap not empty  // there are k element in heap
        Extract min from heap and append to output
        If successor of output element != nil, call min heapify for this element //cost O(lg k)
At a time there are max k elements in heap, successor of element is next element in its sorted list.
since we have total of n elements its O(nlogk)
and space will be O(k)

Problem Analyze the heapsort on already sorted array i)acending ii)decending
Solution
Heapsort Algorithm includes two things
1)build max heap // called n/2 times
2)exchange first element in array with last and call max heapify // called n-1 times

Consider Heapsort algorithm called on list already sorted in
i) descending order
Build Max heap is called on n/2 non leaf elements of the list and it need to go thru each even though they are sorted so it takes O(n).
max heapify is called n-1 times for list of n items But since max heapify replaces the last item with the first it changes the order of the list and it needs to be heapified each time which cost Th(log n)
so total cost becomes Th(n logn)
ii) ascending order
Build max heap will take time more than previous case but still O(n)
max heapify will still take Omega(logn) for each element
So total cost is Omega(nlogn)

Problem Insertion into Max heap with binary search
Consider max heap is stored as an array we insert the new element in the last position and then call heapify for nodes on its path to root which will take O(logn) time but what if we find the position of new element by binary search on its path to root, the number of nodes to compare would be O(lgn) and binary search will make Th(lg(lg n)) comparisons but then after finding the right position we need to shift the elements on its path downwards which will be of O(lgn) so again it becomes O(lg(lgn)+lgn ) = O(lgn)

Problem Given an array of n elements find i largest elements
Solution Build max heap of n elements // cost O(n)
Extract maximum element of heap i times // each extract takes O(lg n) time
Total cost = O(n+i lgn)
for i < n this cost is asymtotically less than O(nlogn)

No comments:

Post a Comment