(A brief note (2024-02-12): I originally wrote this while thinking about how to compile APL—a simple and very high-level language which is relatively easy to reason about mechanically and optimise, following some discussion with the maintainer of Futhark, which is similar. Since then, I have become increasingly interested in more general approaches that can do sophisticated things to more expressive languages in which the space of useful programs is less tightly constrained. I never implemented these approaches into my J compiler (J is a dialect of APL). Although I still think they are a good idea for APL, I am dubious of their applicability to more generic approaches and more expressive languages. However, I remain utterly convinced of both the feasibility and the necessity of highly performant implementation strategies for sound semantics.)
Recently (that is, 10-15 years ago), there was some kerfuffle regarding a change to the C and C++ programming languages. The form of the change was slightly different for each language,0 but the motivation and effect were the same: it was considered desirable that a compiler be able to eliminate side-effect-free loops with no data dependents, even if those loops were potentially non-terminating.
For my part, I find this very antisocial behaviour on the part of a compiler,1 and would seek to disallow it. However, I am still interested in optimisability. How is a compiler to (soundly) reason about such loops?
One option, quite boring, is to consider potential non-termination a side effect. That would be like putting up a great blinder in front of every loop until or unless we can prove whether it terminates; probably crudely effective, but we can do better. In order to create a combining analysis that fits into a monotone analysis framework, we’ll associate with each of our terms a halting type telling whether that term halts or not. This type belongs to a straightforward lattice which will start off with just four elements: halts(⊥) (an identity that means whatever we want it to); halts(true) (does halt); halts(false) (doesn’t halt); and halts(⊤) (don’t know), in the obvious arrangement.
We’ll also introduce halting dependencies, which can exist between two terms, capturing relationships relating to nontermination.
With that in mind, let’s consider the following example (in a hypothetical variant of C or C++ with no forward progress guarantees):
i = 0; while (i != 1) i ^= 2; return i;
Here, i oscillates forever between 0 and 2, so the loop does not terminate. However, a conditional constant propagator can still replace ‘return i’ with ‘return 1’, because i will always be 1 when control reaches the ‘return’; this inference is valid because the antecedent is false: control never gets that far. Crucially, however, the inference is valid regardless of whether the loop terminates, so we can always perform it.
A C or C++ compiler might, at this point, remove the loop entirely, since it has no data dependents and performs no side effects. We don’t want to do that, since the loop may not terminate (it in fact doesn’t), but what else can we do?
Now that we’ve removed our data dependence on the loop—and we have no control dependence upon it, since neither 1 nor the loop is effectful—we need some way to keep it around. To that end, I introduce the collation operator (coll, for short). The collation operator is related to the sequencing operator (written , or ; in C), but is distinct from it; the latter ought not to make it into our intermediate representation, but should instead be represented more directly using control dependencies. E.G., if we have effect,effect, the latter effect should have a control dependence on the former, because it precedes it in program order; no sequencing operator appears directly in the IR.
The collation operator does appear in the IR, however; if x,y appears in our source, then in our IR:
When x is our potentially-nonterminating loop, and y is i, all uses of which go through coll, and coll has a halting dependence on x, we cannot eliminate x (which is what we want). But when can we eliminate x? With a little thought, we can work out some straightforward axioms for collation and halting. Collation is associative and commutative,2 so w.l.o.g. I’ll consider only a couple of cases:
p = coll(x: halts(true), y: halts(c1)) ⇒ p: halts(c1)
That is, if any dependency is known to halt, then the collation halts if and only if the other does. This allows us to sever the halting dependence on the first dependency (though not the data dependence, if it exists). Similarly:
p = coll(x: halts(false), y: halts(c1)) ⇒ p: halts(false)
That is, if either dependency does not halt, the collation doesn’t either. In fact, we can combine both of these into just one rule, to wit:
p = coll(x: halts(c0), y: halts(c1)) ⇒ p: halts(c0 ∧ c1)
Which makes it obvious why coll is associative and commutative: it’s just conjunction. (More on this in a bit.) Again, we have that we can sever any halting dependence on a term which is guaranteed to halt (but not one which is guaranteed not to halt).
Furthermore, all these axioms apply just the same to most ordinary operations, like +. Consider:
p = x: value(0) + y: value(v1) ⇒ p: value(v1)
That is, if you add something with a value of zero to anything, the value of the result is that thing. We can therefore sever the data dependence of + on x, but not the halting dependence, until we prove that x halts. coll is therefore a boring right identity function; it’s otherwise completely uniform.
We of course have the classic:
p: halts(false) ⇒ p: value(⊥)
Which allows us to sever the data dependence of a collation, when one of its halting dependencies is known not to terminate.
Conditional execution can be handled with aplomb:
p = if q then r: halts(c0) else s: halts(c1) ⇒ p: halts((q ∧ c0) ∨ (¬q ∧ c1))
Hence, dead code elimination is possible: a branch which is never taken but which would not halt if it were can be safely gotten rid of without affecting anything else.
Any side effect must have a control dependence on all potentially nonterminating terms which precede it in program order, and any potentially nonterminating term must have a control dependence on all side effects which precede it in program order. But the potentially nonterminating terms do not need to depend on each other; notionally, if, in between execution of two side effects, nontermination is encountered, it must be preserved, but it does not matter which nontermination it is. This is why halting dependencies express weaker constraints than control dependencies; instances of nontermination are not totally ordered, unlike side effects.
I am on shakier ground, here, as I am not yet quite sure how dependent types fit into this sort of analysis framework. However, they certainly seem like a useful addition, both in this particular case and generally. My use of ∧ and ∨ above reflects this, where they can be interpreted as representing ordinary boolean conjunction and disjunction (the presence of ⊥ and ⊤ notwithstanding) rather than meet and join (this is particularly clearly manifest in the axiom for conditionals).
If some term p has no data dependents and performs no side effects, and we can infer some dependent term c0 such that p: halts(c0), then we can replace p with something like:
if ¬c0 then infinite-loop
In other words, if we can suss out the conditions under which an otherwise dead term does not terminate, then we can simply check if those conditions hold and avoid terminating if so, avoiding doing any of the other work necessary to realise the term. For instance, suppose we perform a mapping over a linked list with a pure operator that halts for all inputs, and we know that all the list elements are in the domain of that operator; then, the mapping will halt if and only if the list is not circular. When the mapping has no data dependents, we can transform this into a simple traversal of the list, never invoking the mapping operator.
Another application is common subexpression elimination. If we have x: halts(c0) and y: halts(c1), and neither of x and y has a control dependency upon the other,3 and we manage to assign the same value number to c0 and c1, then x and y are equivalent, so we can avoid emitting code for both.
A slightly more interesting case occurs when one of c0 and c1 implies the other (for instance, suppose c0 were a < b and c1 were a ≤ b). I am not yet sure of the best way to handle this, but an obvious and useful observation is that, if p = coll(x: halts(a < b), y: halts(a ≤ b)), then p: halts(a < b ∧ a ≤ b) = halts(a < b).