Your ABI is Probably Wrong

2021-05-07

An ABI, or Application Binary Interface, defines a way for binaries to talk to each other on a given platform, and it includes (among other things) a calling convention. Most (all?) ABIs have a design flaw that harms performance.

Let’s start by looking at the System V ABI, for x86-style CPUs. The ABI classifies function arguments into a number of different categories; we’re just going to consider two:

INTEGER: This class consists of integral types that ﬁt into one of the general purpose registers

MEMORY: This class consists of types that will be passed and returned in memory via the stack

I won’t go finely over the rules that classify arguments; suffice to say that, in a general sense:

Integers, pointers, and small structures have class INTEGER, and are passed in registers.
If a structure is too big, it has class MEMORY and is passed on the stack.
If there are too many arguments, the ones that don’t fit in registers will be passed on the stack.

In other words, passing large structures by value entails large copies and This Makes Me Sad.

Well, what’s wrong with that? Surely we can just do what we did in the days of dumb compilers and pass structures by pointer. Unfortunately, that doesn’t work anymore; compilers are smart now, and they don’t like it when objects alias.

For example:

void foo(int*);
void bar(void);

int x = 5;
foo(&x);    // for all we know, foo could have stored &x in a shared location
x = 7;
bar();      // through which bar could modify x
return x;   // meaning that this needs to turn into an actual load; it cannot be constant-folded

	    // (This would not have happened if x had been passed by value, but this is not viable for large structs, as we have seen.)

Restrict to the rescue! If foo’s parameter had been annotated with restrict, foo would not be allowed to alias it (C11§6.7.3.1p4,11). Unfortunately, compilers do not seem to generally be aware of this fact. More to the point, because there is no type-level enforcement of restrict in C, the veracity of the attribute cannot be counted upon in a general sense, even though it can be correct in cases when C’s ABI is used to communicate between languages with stronger type systems.

And really, the ABI should do the right thing by default. void foo(struct bla) is much easier to read than void foo(const struct bla *restrict), not to mention it does a better job of conveying intent and actually provides a stronger semantic guarantee.

Well, that’s System V. How do other ABIs fare? Microsoft’s is similar, but it passes structs with a pointer:

Structs or unions of [not small] sizes are passed as a pointer to memory allocated by the caller.

This gains you a bit of flexibility (though it also probably confuses the memory renamer a bit), but it doesn’t solve the actual problem. The ‘memory allocated by the caller’ is owned by the callee, who can modify it at will, so the caller still needs to spuriously copy.

More ABIs! ARM (sorry, AAA arch 64):

If the argument type is a Composite Type that is larger than 16 bytes, then the argument is copied to memory allocated by the caller and the argument is replaced by a pointer to the copy. [emphasis mine]

RISC-V:

Aggregates larger than 2×XLEN bits [side note: why the hell are you talking about bits?] are passed by reference and are replaced in the argument list with the address

[...]

Arguments passed by reference may be modified by the callee.

PowerPC:

All [non-homogeneous] aggregates are passed in consecutive GPRs, in GPRs and in memory, or in memory

MIPS n32:

Structs, unions, or other composite types are treated as a sequence of doublewords, and are passed in integer or floating point registers as though they were simple scalar parameters to the extent that they fit, with any excess on the stack packed according to the normal memory layout of the object

All of these are repetitions of the same two mistakes.

A correctly-specified ABI should pass large structures by immutable reference, usually obviating the copy. In the event that a copy is needed, it will generally happen only once, in the callee, rather than needing to be repeated by every caller. The callee also has more flexibility, and can copy only those portions of the structure that are actually modified.

Take heed, future ABI-makers, lest the angel of neuralgia take you!

Update

2021-06-24

A number of people have pointed out that, under this scheme, an argument will sometimes have to be copied twice. (That is, in the case when it is aliased in the caller and mutated by the callee.) This is true, and I should have mentioned it. However, I retain my original position that this approach will drastically reduce the number of copies required in any real-world, non-contrived code using value structures.

There was also a very interesting proposal: provide every value-struct-accepting function with multiple entry points. This should minimize the number of copies required (assuming no interprocedural changes), but it also complicates the ABI considerably. (In particular, a lot of real-world code depends on function pointers having the same size as object pointers; for this approach to work, it would have to violate that assumption, make function pointers doubly indirect, or pessimize.) I think the complication may not be worth it in this case, but the approach is definitely worth exploring.

References: