Turning Lemons into Topologically Sorted Lemonade

09 Apr 2021 ~ 4 minute read

In a recent interview, I was asked to pair on a coding problem. Like most live coding exercises, I didn’t do very well. So, in an effort to redeem myself (in the eyes of myself), I studied up on the problem and worked through several solutions.

Hopefully, you don’t find yourself in a similar situation. But, if you do, I hope reading through these solutions helps you fair better than I did!

Courses and prerequisites

Without further ado, the pairing exercise problem statement:

Given a set of courses and a corresponding set prerequisites, produce a valid ordering of courses such that the courses can be taken in that order without bypassing any of the prerequisites (there are multiple correct solutions).

And, using a Python dictionary, the input data:

COURSES = {
    "Algebra 1": [],
    "Algebra 2": ["Algebra 1"],
    "English 1": [],
    "English 2": ["English 1", "History 1"],
    "English 3": ["English 2"],
    "English 4": ["English 3"],
    "History 1": [],
    "History 2": ["History 1"],
    "Pre-Calculus": ["Algebra 2"],
    "Statistics 1": ["Algebra 1"],
    "Statistics 2": ["Statistics 1"],
}

Given the input above, a valid ordering of courses is:

['History 1',
 'History 2',
 'English 1',
 'English 2',
 'English 3',
 'English 4',
 'Algebra 1',
 'Statistics 1',
 'Statistics 2',
 'Algebra 2',
 'Pre-Calculus']

Now that we have the problem defined, let’s look at some solutions!

Sorting out the correct terminology

I had a sense that the solution to this problem involved modeling the data with a graph data structure, but I wasn’t sure what to do with it after that. So, I started looking for graph related libraries in Python, which led me to the most excellent NetworkX.

After navigating the NetworkX API documentation a bit, I noticed an entire section dedicated to algorithms. Under the algorithms section was a subsection specific to Directed Acyclic Graphs (DAGs). The reference to DAGs caught my eye because DAGs are often used to model data processing workflows with complex dependencies. In the problem statement above, the course prerequisites are a lot like data processing workflow dependencies.

Continuing through the DAG related algorithms, the description for topological_sort(G) stood out:

A topological sort is a nonunique permutation of the nodes such that an edge from u to v implies that u appears before v in the topological sort order.

That sounds promising! An edge can be produced by connecting a course v to a prerequisite u. If a topological sort can help ensure u appears before v in an ordering, then it aligns with our goal. Let’s give it a spin!

NetworkX to the rescue

Following the guidance in the topological_sort(G) description, I iterated over each combination of course and prerequisite and created an edge between them with the add_edge method:

>>> import networkx as nx
>>> graph = nx.DiGraph()
>>> for course, prerequisites in COURSES.items():
...     for prerequisite in prerequisites:
...         graph.add_edge(prerequisite, course)
... 
>>>

From there, the only thing left to do was to call topological_sort(G) with the DiGraph as an argument:

>>> pprint.pprint(list(nx.topological_sort(graph)))
['History 1',
 'History 2',
 'English 1',
 'English 2',
 'English 3',
 'English 4',
 'Algebra 1',
 'Statistics 1',
 'Statistics 2',
 'Algebra 2',
 'Pre-Calculus']
>>>

That ordering looks valid to me!

The standard library can do it too

After identifying the class of algorithm necessary to solve the problem (i.e., topological sort), I began to use the term in searches for other types of solutions. Eventually, that led me to a module in the Python standard library called graphlib.

According to the Python commit history, graphlib is pretty new (added in Python 3.9). It promises to provide a set of functionality for operating on graph-like structures. But, right now it only has one class worth of functionality. Luckily for us, that one class is called TopologicalSorter!

Instantiating the class takes an argument, graph, which:

…must be a dictionary representing a directed acyclic graph where the keys are nodes and the values are iterables of all predecessors of that node in the graph (the nodes that have edges that point to the value in the key).

Hm. That sounds very similar to the COURSES data structure defined above. Let’s pass it through in the Python interpreter, and then call the static_order method:

>>> from graphlib import TopologicalSorter
>>> pprint.pprint(list(TopologicalSorter(COURSES).static_order()))
['Algebra 1',
 'English 1',
 'History 1',
 'Algebra 2',
 'Statistics 1',
 'English 2',
 'History 2',
 'Pre-Calculus',
 'Statistics 2',
 'English 3',
 'English 4']
>>>

Whoa! A different ordering from the one above, but still valid. We can even use another NetworkX function to confirm the TopologicalSorter solution is valid by checking it against all possible topological sorts, as reported by all_topological_sorts(G):

>>> solution = list(TopologicalSorter(COURSES).static_order())
>>> solution in nx.all_topological_sorts(graph)
True
>>>

Excellent—looks like we are two-for-two so far!

Show your work

Using algorithms built-in to libraries like NetworkX and graphlib is fun and all, but how would we solve this problem algorithmically? Well, according to Wikipedia, there are two existing algorithms to draw from:

I’m going to focus on Kahn’s algorithm—primarily because it doesn’t involve recursion. Although recursion is a fascinating technique, I don’t see it too often in day-to-day code, and I find that it creates confusion in most engineers (myself included).

Note: If the variable names below become confusing, please refer to the pseudocode in the Wikipedia link for Kahn’s algorithm. I’m trying to match the variable names to that, so it is easier to follow along.

To start things off, let’s use the Python interpreter to define L and S. Here, L is being set up to contain the final course ordering and S made up of all the nodes in the graph with zero edges pointing to them (e.g., in_degree == 0):

>>> L = []
>>> S = [node for node in graph if not graph.in_degree(node)]
>>>

Next, we need to create a while loop set to run until S is empty. Inside, we pop n from S and immediately append it to L:

>>> while S:
...     n = S.pop()
...     L.append(n)

After that, we need to identify all nodes connected to n so that we can remove each edge from the graph, one-by-one. As they’re removed, we check to see if there are any remaining edges pointing to the node m. If not, append m to S.

...     edges = list(graph.neighbors(n))
...     for m in edges:
...             graph.remove_edge(n, m)
...             if not graph.in_degree(m):
...                     S.append(m)
>>>

After these steps are complete, L should contain a valid course ordering. Again, we can confirm with all_topological_sorts(G):

>>> L in nx.all_topological_sorts(graph)
True
>>>

That’s it! Now we have three different solutions to determine a valid ordering for a set of courses and prerequisites. Enjoy all the flavors of topologically sorted lemonade!