I wrote some notes for anyone implementing Breadth First Search as described on my Tower Defense Pathfinding page. However, these notes are out of date. See this code instead for both Python and C++ versions.
Internal state#
The article uses external state for the reached/came_from/distance sets and hash tables. An alternative is to use internal state, where the data is kept inside the graph node data structure. Why use internal vs external data? Although internal is slightly faster (avoids the hash table), external is cleaner, by not modifying the graph data structure when you search. It also supports multiple simultaneous searches, whether multithreaded or with coroutines. Here’s an example node with an internal reached flag:
class Node: def __init__(self, name): self.reached = False self.name = name self._neighbors = [] def neighbors(self): return self._neighbors
And here’s a sample graph:
A = Node("A") B = Node("B") C = Node("C") D = Node("D") E = Node("E") A._neighbors = [B] B._neighbors = [A, C] C._neighbors = [D] D._neighbors = [E] E._neighbors = [] start = A
If we’re using internal state, we have to reset reached
back to False in order to run the algorithm again. We can do that before the algorithm runs:
ALL_NODES = [A, B, C, D, E] frontier = Queue() frontier.put(start) for node in ALL_NODES: node.reached = False start.reached = True while not frontier.empty(): current = frontier.get() callback(current) for next in current.neighbors(): if not next.reached: next.reached = True frontier.put(next)
Or we can reset after the algorithm runs, by keeping a list of all nodes we’ve reached:
frontier = Queue() frontier.put(start) start.reached = True reached_nodes = [start] while not frontier.empty(): current = frontier.get() for next in current.neighbors(): if not next.reached: next.reached = True reached_nodes.append(next) frontier.put(next) for node in reached_nodes: node.reached = False
Why choose one or the other? It doesn’t matter much here because we visit all nodes. If you’re visiting all nodes, the first approach is slightly faster. However, if you’re using early exit then you’re not visiting all nodes, and the second approach can be much faster. The first way has to go through all nodes every time. The second way only goes through the nodes that were reached.
Node IDs
(Note: most of you won’t need this optimization.) For the hash table approaches to work, your nodes need to be hashable. An alternative is to give each node an integer ID. You can then use a bit array to store reached
, and a regular array of ints to store distance
and came_from
.
If you’re using C or C++, you can even get away with leaving the distance and came_from arrays uninitialized (potentially a big win!). You only have to initialize the bit array, which squeezes 64 ids into a single (long) int. Only if the bit is set does the value in the distance or came_from array get initialized. You can either put the distance/came_from arrays on the stack, if you have enough stack space, or you can use statically allocated arrays that aren’t re-initialized after each search. Be careful though to weigh the cost of initializing the reached bit array versus the cost of using a hash table. If the fraction of nodes reached in each search is small, it may be better to use the hash table.