Depth-First Search and Breadth-First Search in Python
Graph theory and in particular the graph ADT (abstract data-type) is widely explored and implemented in the field of Computer Science and Mathematics. Consisting of vertices (nodes) and the edges (optionally directed/weighted) that connect them, the data-structure is effectively able to represent and solve many problem domains. One of the most popular areas of algorithm design within this space is the problem of checking for the existence or (shortest) path between two or more vertices in the graph. Properties such as edge weighting and direction are two such factors that the algorithm designer can take into consideration. In this post I will be exploring two of the simpler available algorithms, Depth-First and Breath-First search to achieve the goals highlighted below:
- Find all vertices in a subject vertices connected component.
- Return all available paths between two vertices.
- And in the case of BFS, return the shortest path (length measured by number of path edges).
The Graph
So as to clearly discuss each algorithm I have crafted a connected graph with six vertices and six incident edges. The resulting graph is undirected with no assigned edge weightings, as length will be evaluated based on the number of path edges traversed. There are two popular options for representing a graph, the first being an adjacency matrix (effective with dense graphs) and second an adjacency list (effective with sparse graphs). I have opted to implement an adjacency list which stores each node in a dictionary along with a set containing their adjacent nodes. As the graph is undirected each edge is stored in both incident nodes adjacent sets.
graph = 'A': set(['B', 'C']), 'B': set(['A', 'D', 'E']), 'C': set(['A', 'F']), 'D': set(['B']), 'E': set(['B', 'F']), 'F': set(['C', 'E'])>
Looking at the graph depiction below you will also notice the inclusion of a cycle, by the adjacent connections between ‘F’ and ‘C/E’. This has been purposely included to provide the algorithms with the option to return multiple paths between two desired nodes.
Depth-First Search
The first algorithm I will be discussing is Depth-First search which as the name hints at, explores possible vertices (from a supplied root) down each branch before backtracking. This property allows the algorithm to be implemented succinctly in both iterative and recursive forms. Below is a listing of the actions performed upon each visit to a node.
- Mark the current vertex as being visited.
- Explore each adjacent vertex that is not included in the visited set.
Connected Component
The implementation below uses the stack data-structure to build-up and return a set of vertices that are accessible within the subjects connected component. Using Python’s overloading of the subtraction operator to remove items from a set, we are able to add only the unvisited adjacent vertices.
def dfs(graph, start): visited, stack = set(), [start] while stack: vertex = stack.pop() if vertex not in visited: visited.add(vertex) stack.extend(graph[vertex] - visited) return visited dfs(graph, 'A') #
The second implementation provides the same functionality as the first, however, this time we are using the more succinct recursive form. Due to a common Python gotcha with default parameter values being created only once, we are required to create a new visited set on each user invocation. Another Python language detail is that function variables are passed by reference, resulting in the visited mutable set not having to reassigned upon each recursive call.
def dfs(graph, start, visited=None): if visited is None: visited = set() visited.add(start) for next in graph[start] - visited: dfs(graph, next, visited) return visited dfs(graph, 'C') #
Paths
We are able to tweak both of the previous implementations to return all possible paths between a start and goal vertex. The implementation below uses the stack data-structure again to iteratively solve the problem, yielding each possible path when we locate the goal. Using a generator allows the user to only compute the desired amount of alternative paths.
def dfs_paths(graph, start, goal): stack = [(start, [start])] while stack: (vertex, path) = stack.pop() for next in graph[vertex] - set(path): if next == goal: yield path + [next] else: stack.append((next, path + [next])) list(dfs_paths(graph, 'A', 'F')) # [['A', 'C', 'F'], ['A', 'B', 'E', 'F']]
The implementation below uses the recursive approach calling the ‘yield from’ PEP380 addition to return the invoked located paths. Unfortunately the version of Pygments installed on the server at this time does not include the updated keyword combination.
def dfs_paths(graph, start, goal, path=None): if path is None: path = [start] if start == goal: yield path for next in graph[start] - set(path): yield from dfs_paths(graph, next, goal, path + [next]) list(dfs_paths(graph, 'C', 'F')) # [['C', 'F'], ['C', 'A', 'B', 'E', 'F']]
Breath-First Search
An alternative algorithm called Breath-First search provides us with the ability to return the same results as DFS but with the added guarantee to return the shortest-path first. This algorithm is a little more tricky to implement in a recursive manner instead using the queue data-structure, as such I will only being documenting the iterative approach. The actions performed per each explored vertex are the same as the depth-first implementation, however, replacing the stack with a queue will instead explore the breadth of a vertex depth before moving on. This behavior guarantees that the first path located is one of the shortest-paths present, based on number of edges being the cost factor.
Connected Component
Similar to the iterative DFS implementation the only alteration required is to remove the next item from the beginning of the list structure instead of the stacks last.
def bfs(graph, start): visited, queue = set(), [start] while queue: vertex = queue.pop(0) if vertex not in visited: visited.add(vertex) queue.extend(graph[vertex] - visited) return visited bfs(graph, 'A') #
Paths
This implementation can again be altered slightly to instead return all possible paths between two vertices, the first of which being one of the shortest such path.
def bfs_paths(graph, start, goal): queue = [(start, [start])] while queue: (vertex, path) = queue.pop(0) for next in graph[vertex] - set(path): if next == goal: yield path + [next] else: queue.append((next, path + [next])) list(bfs_paths(graph, 'A', 'F')) # [['A', 'C', 'F'], ['A', 'B', 'E', 'F']]
Knowing that the shortest path will be returned first from the BFS path generator method we can create a useful method which simply returns the shortest path found or ‘None’ if no path exists. As we are using a generator this in theory should provide similar performance results as just breaking out and returning the first matching path in the BFS implementation.
def shortest_path(graph, start, goal): try: return next(bfs_paths(graph, start, goal)) except StopIteration: return None shortest_path(graph, 'A', 'F') # ['A', 'C', 'F']
Resources
- Depth-and Breadth-First Search
- Connected component
- Adjacency matrix
- Adjacency list
- Python Gotcha: Default arguments and mutable data structures
- PEP 380
- Generators
- Developer at MyBuilder
- Three Devs and a Maybe podcast co-host
- All ramblings can be found in the Archive
Breadth First Search using Python
Breadth-First Search Algorithm or the BFS Algorithm is a graph traversal algorithm that is used to traverse Graph elements in an orderly fashion. Using BFS, one can visit all the nodes in a graph that are reachable from the starting node.
To implement the BFS algorithm, a visited array is maintained which is used to check whether or not all the nodes of the graph have been visited.
An queue is used to implement the BFS algorithm by pushing each into the queue and then popping it while visiting it. After a particular node is visited, its children are added to the queue to traverse the following level and so on.
Step 1: Consider the Graph to be traversed
Step 2: Pushing the first Node into the Queue
Step 3: Popping the first element, visiting it ( red ) and adding its children ( green ) to the Queue
Step 4: Popping one by one and adding their children
Step 5
Step 6
Step 7: The final traversed graph is
The time complexity of the BFS algorithm is O(V + E) where V is the total number of vertices, and E is the total number of edges.
Breadth-First Search Algorithm
Assumptions : ‘queue’ is a standard queue with standard ‘push’ and ‘pop’ operations. ‘visited’ is an array recording the visited nodes. ‘graph’ is a standard graph data structure with alpha-numeric elements with standard graph methods. ‘start’ is the node, the BFS algorithm is supposed to start with.
BFS( arguments: start )
- Start
- If graph has Node start then
- call push( parameters: start )
- While queue is not empty do
- visiting
- Print visiting, » was visited»
- visited[ visiting ]
- cd
- i
- While i
- If cd[ i ] in visited then
- Continue
- call push( parameters: cd[ i ] )
- Print «Node not found»
BFS Algorithm using Python
The BFS algorithm can be implemented in Python using the object-oriented feature of Python. Here, a class called ‘Traversal’ is created and the BFS algorithm is implemented as a class method.
Here, the Graph class is pre-defined and its standard methods are implemented and used by the BFS algorithm.
Graphs
The Python 3 code to implement the BFS algorithm is as follows- Defining the Traversal class to house the BFS algorithm and its allied methods
import graph class Traversal(object):
def __init__(self): self.G = graph.create() self.visited = <> self.structure = [] self.top = -1 for i in self.G.adjList.keys(): self.visited[i] = 0
def clear(self): self.visited = <> self.structure = [] for i in self.G.adjList.keys(): self.visited[i] = 0
def push(self, node): self.structure.append(node) self.top = self.top + 1
def pop(self): if self.structure == []: return False else: temp = self.structure[0] self.structure.pop(0) return temp
def BFS(self, n): if self.G.hasNode(graph.Node(n)): self.push(self.G.hasNode(graph.Node(n))) while self.structure != []: visiting = self.pop() print(self.G.hasNode(visiting), " was visited") self.visited[self.G.hasNode(visiting)] = 1 for i in self.G.childrenOf(visiting): if self.visited[self.G.hasNode(i.getDest())] == 1: continue else: self.push(self.G.hasNode(i.getDest())) else: print("Node not found")
import graph class Traversal(object): def __init__(self): self.G = graph.create() self.visited = <> self.structure = [] self.top = -1 for i in self.G.adjList.keys(): self.visited[i] = 0 def clear(self): self.visited = <> self.structure = [] for i in self.G.adjList.keys(): self.visited[i] = 0 def push(self, node): self.structure.append(node) self.top = self.top + 1 def pop(self): if self.structure == []: return False else: temp = self.structure[0] self.structure.pop(0) return temp def BFS(self, n): if self.G.hasNode(graph.Node(n)): self.push(self.G.hasNode(graph.Node(n))) while self.structure != []: visiting = self.pop() print(self.G.hasNode(visiting), " was visited") self.visited[self.G.hasNode(visiting)] = 1 for i in self.G.childrenOf(visiting): if self.visited[self.G.hasNode(i.getDest())] == 1: continue else: self.push(self.G.hasNode(i.getDest())) else: print("Node not found") t = Traversal() t.BFS(1)
Enter the number of nodes in the graph :: 5 Enter 1 to create Graph/ 2 to create DiGraph :: 1 Node added successfully 1 Node added successfully 2 Node added successfully 3 Node added successfully 4 Node added successfully 5 Enter the number of edges in the graph :: 4 Enter the source :: 1 Enter the destination :: 2 Edge added successfully Edge added successfully Enter the source :: 1 Enter the destination :: 3 Edge added successfully Edge added successfully Enter the source :: 3 Enter the destination :: 4 Edge added successfully Edge added successfully Enter the source :: 3 Enter the destination :: 5 Edge added successfully Edge added successfully 1 was visited 2 was visited 3 was visited 4 was visited 5 was visited
- Used to find the shortest path
- Always finds the optimal solution
- There are no useless paths in BFS since it searches level by level
- Finds the closest goal in the shortest time
Disadvantages of BFS
- It is a blind search
- When the search area is large, heuristic search algorithms would fare better
- It performs best in a small space
- It performs best if the search element is placed in the upper left-hand corner of the graph
Applications of BFS algorithm
- Implementation of Shortest Path and Spanning Tree for unweighted graph
- Peer to peer networks
- Crawlers in search engines
- Social networking websites
- GPS navigation systems
- Cycle detection in undirected graphs
- Ford-Fulkerson Algorithm
- To check if a graph is a bipartite or not