Nonblocking cycle detection and iterator invalidation

2023-11-29

Suppose we are traversing a linked list, and we want to guarantee forward progress. In a language with mutation, the naive algorithm can fail to terminate, for a list with cycles. We can fix this by employing a cycle detection algorithm.

The classic algorithm, used by the SICL implementation of Common Lisp, is the tortoise-and-hare algorithm. We track two nodes of the list, rather than just one: the hare, which advances on every iteration, and the tortoise, which advances only every other iteration. Since the hare is advancing faster than the tortoise, it should always stay ahead (after the first iteration); and when the hare reaches the end of the list, the iteration terminates. But if there is a cycle, then the hare will loop around and get ‘behind’ the tortoise, and then overtake it. So we simply have to check, on every iteration, if the hare is the same as the tortoise; if it is, then there must be a cycle.

This is well and good, but it is not robust in the face of concurrency (this needn’t refer solely to multithreading—for example, if we map a user-defined function over a linked list, then the invocations of the user’s function happen concurrently with the iteration—this is related to ‘iterator invalidation’). To see why, suppose there is a cycle, but neither the tortoise nor the hare has reached it yet. Then, suppose that one of the links ahead of the tortoise and behind the hare is changed to point to a node behind the tortoise, creating another cycle. Now, the tortoise and the hare end up on different cycles—they never meet, and so the iteration does not terminate.

The situation here is analogous to deadlock. We can also imagine a malfeasance analogous to livelock, or starvation (the distinction is not obvious, since we do not know the role of the other code running concurrently—since it is presumably buggy—so I will not bother to distinguish the two in this case; hence, I will also not distinguish obstruction freedom from lock freedom): let there be no cycle, but let new nodes be concurrently and repeatedly added to the end of the list.

This is not limited by resource exhaustion, since earlier nodes in the list can be garbage collected (trying to inhibit this is simply to leak and there is no good reason to do it), or simply reused directly (nodes can be unlinked from earlier sections of the list and then added to the tail). I cannot see a good way to avoid this form of livelock/starvation, so I will not bother to try; I want a local progress guarantee analogous to obstruction freedom or lock freedom (we might call it byzantine fault-tolerant obstruction freedom): no matter what other threads do, so long as we get to run uninterrupted for some finite span of time (a reasonable linear bound), we are guaranteed to either terminate successfully or produce a proof that there was a cycle.

My approach is a trivial refinement of the tortoise-and-hare algorithm. In order to protect against the eventuality in which the tortoise gets stuck in a different cycle from the tortoise, we must make the tortoise periodically catch up to the hare: teleport it to the node immediately behind the hare. Of course, if the hare and the tortoise are caught in the same large cycle, then making the tortoise catch up to the hare too frequently will prevent the hare from overtaking the tortoise from behind. Therefore, use an exponential backoff: let the tortoise catch up to the hare on the first iteration, the second, the fourth, the eighth, and so on. (In practice, it’s probably better to make the initial backoff much larger—perhaps 512 or so—to avoid the penalty from branch mispredictions when iterating over short lists; and a slightly smaller exponent might be preferable.) This takes takes only a constant factor’s extra time to detect any cycle, and it adds nothing to the critical path.

A rather silly paper claims that all nonblocking algorithms are wait-free in practice under a ‘stochastic scheduler’. The stochastic scheduler is a terrible model—I plan to deconstruct their claims and in particular their empirical justification (which is conspicuously missing from the published version of the paper) more thoroughly another time—in fact, schedulers behave much more like they operated in lockstep with bounded degree of nondeterminism. The limiting case of this is the example I gave above, where a mapped function runs concurrently but in perfect deterministic lockstep with the iterater. I believe that, in this case, there is no way in general to guarantee forward progress, short of eagerly copying the entire list (which is expensive). However, the case of an adversarial user-provided function seems a priori sufficiently local as to make this not overly concerning (the user’s function could simply contain an infinite loop, which would obviously inhibit progress).

In general, nonblocking data structures are composable, but only up to a layer of abstraction: if we provide high-level iteration functions but at the same time low-level access to the nodes of a list, then it is the responsibility of the code performing the low-level accesses to abide by the invariants of the higher-level abstractions if it would like to interoperate with them. The goal here is simply to take a best-effort mechanism for fairly low-overhead runtime diagnosis of certain bugs and make it better-effort. Also, because the single-threaded case is deterministic, associated bugs are deterministically reproducible, so it is less important to provide runtime mechanisms to detect and diagnose them.