Wednesday, January 19, 2011

Graph Algorithms

Minimum Spanning Tree
Problem definition
In a weighted connected graph G(V,E,W) Find tree T that contains all the vertices in G and minimizes the sum of weight of all edges
Greedy approach to find minimum spanning tree,at each step one of the several possible choices is made that best choice

Safe edge is one that can be added to  subset of edges of minimum spanning tree A without violating the condition
A cut is any partition of V
If no edge in A crosses the cut then it respects the cut
An edge is light edge crossing the cut if its weight is minimum of any edge crossing the cut

Theorem In graph G if we partition set V into S and V-S and A is subset of edges of minimum spanning tree then a light edge {u,v} can be added to set A safely that is AU{u,v} is also subset of MST
If  edge u,v is not part of MST then there must be some other edge (xy) connecting the two partitions V  and V-S so if we add edge uv to it, it should create cycle but we have uv as light edge so
w(uv)<=w(xy)
so we have MST with uv also
But if all edges have distinct weights we cannot have this condition and only edge {u,v} can be part of MST

Generic- MST : Find a safe edge and add it to A to form MST

Kruskal algorithm
greedy algorithm because at each step adds the least possible weight edge to A.
set A in case of Kruskal algorithm is forest
It finds a safe edge to add to the growing forest by finding edge {u,v} of least weight that connects any two trees in forest

Kruskal algorithm
Kruskal(G,w)

For each vertex v in V(G)
  make-set(v)   //O(V)
sort the edges of G in nondecreasing order by weight   //greedy approach to take minimum weight
For each edge E{u,v} in the sorted list
if both endpoints do not belong to same tree then  //if belongs to same tree then we would have cycle
If Find-Set(u)  != Find-Set(v)          //O(E) for each edge
 then A =A U {u,v} // add edge E to set A
 Union(u,v)

we used disjoint set data structure to implement it

Running time
depends on the disjoint set operations
time to sort the edges takes (E log E)
creating a disjoint set of vertices that is that many number of distinct trees initially and the main loop to process each edge takes O(E+V)a(V), but since G is connected graph E >= V-1 and so it becomes O(E)a(V) (a takes into count the union at last when vertices are added to tree) where a(V) is (log V) = O(log E)
Total running time is O(E log E) also |e|<=|v|^2  so log E = O(log V) and we can also state running time as
O(E log V)


Prim Algorithm
also example greedy algorithm 
edges in the set A forms Tree that is it works by adding edges to a set A
operates like Dijkstra algorithm
It gradually creates a minimum tree that spans all the verties
At each step light edge is added to tree A that connects A to an isolated vertex
key concern is how to select
min priority queue is used to hold vertices not in tree A based on key field .For each vertex v, key[v] is the minimum weight of any edge connecting v to a vertex in the tree

Prim Algorithm
r is the starting vertex
At each step we update the adjacent veritces weight so that when we call extract-min on priority queue we get the light edge that is minimum weight edge
Prim(G,w,r)
for each vertex add set the key field to infinity and parent field to null except for the vertex r for which key field is set 0
set the min priority queue with this information
while queue is no empty
  u =  extract-min (Q)            // we process vertex in order on min key in Q
  for each edge v adjacent to u
   if v is in the priority queue and weight w(u,v) < key[v]  
    update the key and parent for vertex u

For every vertex v in Q, key[v] is the minimum weight of any edge connecting v to a vertex in the tree
Q will have non infinite key for those vertexes whose adjacent vertex has been processed or in other words we determine the cut of the graph and light edge is added to the set A.
set A and Q forms disjoint sets of cut.


Loop invariants
set A contains V-Q that is vertices processed and removed from queue
for all vertices in queue if parent is not null then key of vertie

Running time of prim algorithm
depends on how min priority queue is implemented
1)If binary min heap is used
we use build min heap to create vertex with key and parent fields it will takes O(V) time
 Extract min takes O(log V)  //when selecting the vertex in queue
and is called V times that is for each vertex  so extract min takes total of O(V logV)
Decrease key on binary heap = O(log V)  // when updating the key of adjacent vertices
and is called for all the adjacent vertices of v
sum of lenght of all adjacency list is 2|E| so decrease key is called O(E logV)
total time is O(V log V + E log V )
when graph is sparse we have e << v^2 and e = Th(v) so cost is O(V log V)
when graph is dense we have e ~= v^2, so cost is O(V^2 log V)
2)using fibonaci heap
extract min can be performed in O(log V)
and decrease key can be performed in O(1)
so total time is (VlogV+E)
for sparse we have e=Th(v) so  cost is O(VlogV)
for dense we have e ~= v^2 so cost is O(V^2)
3)using adjacency matrix
scans the list to find the smallest key   O(V)
 for each vertex update its adjacent vertex weight   //O(deg(u))
total cost is {sum over all vertices} O(V + deg[u])
so  total cost is O(V^2 + E) =O(V^2)

Negative edges in Kruskal and Prim algorithm
Negative edges does not affects the kruskal algorithm Both works correctly as should be
Light edge donot distinguish negative and positive edges


Shortest Path Problem
There is only one minimum spanning tree but in general there is different shortest path for each source

Optimal structure
each subpath of shortest path is shortest path

Cycle in graph
negative weight edges
there may be negative weight edges
If G contains no negative weight edges then shortest path weight remains well defined for all sources s
If there is negative weight cycle reachable from s then shortest path are not well defined
If there is negative weight cycle on some path from s to v then shortest path weight d(s,v) = -infinity

If vertex v is not reachable from s than d(s,v) = + infinity

Dijkstra algorithm requires no negative weight edge in the input
Bellman Ford algorithm allows negative  weight edge in input graph and produces correct answer as long as no negative weight cycles are reachable from source. It can detect the negative weight edge cycles and reprot there existence

Shortest path cannot contain negative edge cycles and also not positive weight edge cycles
zero weight edge cycle it can be on the shortest path but then we can also go via another path instead of zero cycle
So we will consider shortest path only for v-1 edges that is without any cycle

Relaxation step
Relax(u,v,w)  //relax edge {uv} having weight w
we update the distance on the vertex v like
if d(v) > d(u) +w(uv)
then update d(v) with d(u) + w(uv)

Each algorithm differs in how many times and in which order they relax edges.
Dijkstra algorithm relaxes each edge exactly once
In Bellman ford each edge is relaxed many times

Bellmand Ford Algorithm
solves single source shortest path in weighted directed graph G(V,E)
negative weight edges allowed
detects reachable negative weight edge cycles
algorithm returns boolean value indicating whether or not negative weight edge cycle exist If such cycle exists the algorithm indicates there is no solution If there is no cycle it  produces shortest path and weight

algorithm return TRUE if and only if the graph contains no negative weight cycle reachable from source s

Bellman-Ford(G,w,s)
Initialize(G,s )  //initializes the d(v) for each vertex to infinity and path s to zero
for i =1 to |V|-1  //remaining vertices other than source
  for each edge (u,v) in E(G)  //each pass relaxes each of the edge of the graph so |v-1|*|E| relaxation
    Relax(u,v,w)
for each edge (u,v) in E(G)   //check negative weight cycle
 check if d(v) > d(u) + w(u,v) 
    return FALSE              //if negative weight cycle exist
return TRUE

Running Time
Initialization takes O(V)
we relax each edge for each vertex that is O(VE)
check for negative cycle takes O(E)
total running time is O(VE)

For proving the correctness of the bellman ford we need to prove that each vertex v has shortest distance from s that is for each vertex v d[v]=shortest distance from s

Observations
does not always returns shortest path from s to t in all the graphs, when graph have negative weight cycle
If graph has all distinct vertices then shortest path between two vertices may not be unique, same weight path possible


Dijkstra algorithm
single source shortest path on directed weighted graph
no negative edges allowed
example of greedy algorithm
greedy algorithms doesnot always yields optimal result but in case of Dijkstra it return optimal weight path 
maintains set S of vertices whose final shortest path weight from source have already been determined
select vertex u whith minimum path estimate from remaining vertices and adds it to S and relaxes all edges leaving u

Dijkstra (G,w,s)
Initialize all vertices to infinity distance
build a min priority queue from vertex set V[G] //Q would always contain V-S
while priority queue is not empty    //runs for |V| times
 u = extract-min(Q)
 add u to S
 for each vertex v adjacent to u
  relax(u,v,w)

Running Time of Dijkstra
maintains min prioity queue by calling three operations
Insert(build heap) ,Extract-min(G) and Decrease-Key(Relax)
1)we take advantage of the vertices being numbered 1 to |V|-1
we can perform Insert and Decrease-key in O(1) and extract min in O(V) time
total running time O(V^2+E) = O(V^2)
2)If graph is sparse that is E =o(V^2/log V) we implement it using binary min heap
build heap O(V)
extract min O(log V) and there are V such extract so O(V log V)
Decrease key O(log V) , number of decrease key is of O(E) so total  O(ElogV)
Total running time = O((V+E)log V) = O(E log V)
3)with fibonacci heap we can achieve (V logV+E)

Observations
Dijkstra algorithm relaxes each edges only once
example when Dijkstra algorithm does not works
here we start from A relax edge AB and AC
B gets labeled (5,A) and C(10,A) So we choose B and relax edge BD, D gets new label (6,B)
D is processed then C is chosen and edge CB is relaxed and B gets new label (3,C)
D has label (6,B) while D should have been (4,B)


All pair Shortest Path

Breadth First Search
for searching graph
given graph G and source s BFS explores the edges of G to discover every vertex that is reachable from s
It computes the smallest number of edges from s
explores all vertices at distnace k from s before moving to k+1 vertices

To keep track of progress, colors each vertex white gray or black.
not processed white
when first encountered it becomes non white
three color for vertices
white -- not discovered yet
gray discovered but not processed
black all the adjacent vertices discovered and grayed


all vertices adjacent to black are discovered vertices
vertices adjacent to gray may not be discovered

constructs a breadth first tree initially containing only its root

FIFO queue used to process vertices in order in which they are discovered

BFS(G,s)
color each vertex white initially , their distance infinity
color sources s as gray that is processing starts with s
enqueue s
for each vertex in queue
u = dequeue(Q)
for each adjacent vertex v of u
if its not discovered yet that is color ==white
color it gray
increase weight by one d(v) = d(u)+1
enqueue(Q, u)
color v black that is all its adj have been discovered

Queue Q always consists of gray vertices


Running time
aggregate analysis
Enqueue and Dequeue O(1) time
each vertex is enqueued and dequeued once giving O(V)
since sum of lenght of all adjacency list is Th(E)
total time scanning adjacency list is O(E)

total running time of BFS is O(V+E)

breadth first search shortest path, that is, minimum number of edges or weight of 1 on each edge, for each vertex
if it is reachable else infinity

DFS
edges are explored out of most recently discovered vertex
when all edge o v has been discovered search backtracks to explore edges leaving the vertex from which v was discovered
predecessor graph of DFS may be composed of several trees unlike that or BFS

like BFS vertices are colored to indicate their state
white initially grayed when discovered and black when finished

DFS(G)
initialize each vertex color to white
time =0
for each vertex in V[G]
if we have white colored vertex call DFS-VISIT(u)

DFS-VISIT(u)
set color of u to gray indicating its discovered
time = time+1
d[u] =time // time when vertex u is discovered
for each vertex v adjacent to u
if it is white colored
then call DFS-VISIT(v)
set color of u to black indicating all adjacent discovered
f[u] = time+1 //when the search or u adjacent vertices finished


Running time
since DFSVISIT is called exactly once for each vertex it takes O(V) time
scanning adjacency list O(E)
total O(V+E)

properties of DFS
vertex v is a decendant of vertex u in the depth first forest if and only if v is discovered during the time in which u is gray

discovery and finishing time have discovery structure
parenthesis theorem
that either interval [d[u],[u]] and [d[v],f[v]] are entirely disjoint or one is completely contained in other


vertex v is proper decendant of vertex u in the depth first forest for a graph G if and only if
d[u] <> j & not edge of graph








 Problem Remove cycles from graph O(k) cycles 

Solution When traversing the graph check whether edge trying to relax goes to node that has been seen but not finished that is gray colored vertex then this is back edge and we can save all such edges and at last remove them It will take time of DFS algo that is O(V+E)


 Observation
DFS-VISIT if runs on BST it would return the smallest element as its first node and root as its last
Only in case of DFS does absence of back edges ensures acyclic graph not in case of BFS
BFS finds path using fewest number of edges the BFS depth of any vertex is at least as small as DFS depth of the same vertex Thus DFS tree has greater or equal depth that is if maximum distance between two vertices is T edges then in BFS depth is at most T but depth of DFS might be larger,DFS may have depth up to V-1 like in case of complete graph
If adjacency matrix is used instead of adjacency list in BFS it would run in O(V^2) instead of O(V+E)



dynamic programming solution
characterize structure of optimal solutiom
recursively define value of optimal soluion
computer in bottom up


Optimal sub structure
subpath of shotest path are also shortest paths
that is if p is shortest path from i to j then
d(i,j) = d(i,k)+w(i,k)

Recursive solution
let l^m(i,j) be minimum weight of any path from i to j with at most m edges
when m =0
l^m(i,j) = 0 if i=j
= infinity if i<>j
for m >= 1
we compute l

6 comments:

  1. A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.
    website: geeksforgeeks.org

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.
    website: geeksforgeeks.org

    ReplyDelete
  4. A Computer Science portal for geeks. It contains well written, well thought and well
    explained computer science and programming articles, quizzes and practice/competitive
    programming/company interview Questions.
    website: geeksforgeeks.org

    ReplyDelete
  5. A Computer Science portal for geeks. It contains well written, well thought and well
    explained computer science and programming articles, quizzes and practice/competitive
    programming/company interview Questions.
    website: geeksforgeeks.org

    ReplyDelete
  6. A Computer Science portal for geeks. It contains well written, well thought and well
    explained computer science and programming articles, quizzes and practice/competitive
    programming/company interview Questions.
    website: geeksforgeeks.org

    ReplyDelete