One popular family of strategies for representing pointers0 in dynamically-typed languages is what I will refer to loosely as low-bits tagging schemes. In these schemes, a pointer is contained by a word, the low bits of which are metadata describing the upper bits. (If you’re unfamiliar, Max Bernstein exposits a representative such scheme here.)
One advantage of these schemes is that they admit efficient access to the object denoted by the pointer. Many basic operations can be performed directly on the tagged representation, rather than needing to separate data from metadata before operating on the former. This applies at least when the data are integers or memory addresses. What about floating-point numbers?
Double-precision floating-point numbers take up a whole word on 64-bit architectures; that is no good.1 But single-precision floating-point numbers take up only half a word, which is quite manageable. The standard approach is to store the float in the high 32 bits of the pointer, with the low 32 bits as a tag.2 Unlike with integers and memory addresses, it is not generally possible to operate on these directly, without untagging them first. Hence, suppose we want to compile a function which adds its two arguments, and requires that they be floating-point numbers; we might end up with assembly like the following (error-checking elided for brevity):
```asm
shr   rdi, 32         ; shift off the tag bits of the first addend
shr   rsi, 32         ; shift off the tag bits of the second addend
movd  xmm0, edi       ; move the first addend from a general-purpose register into a floating-point register
movd  xmm1, esi       ; ditto
addss xmm0, xmm1      ; perform the actual addition
movd  eax, xmm0       ; move the sum back into a general-purpose register
shl   rax, 32         ; shift the sum into the high bits of the general-purpose register
or    rax, SFLOAT_TAG ; tag the sum
```
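The same untag-operate-retag dance can be modeled in C (a sketch; the tag value `SFLOAT_TAG` and the helper names here are illustrative assumptions, not part of any particular runtime):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative tag value; any pattern that identifies single floats works. */
#define SFLOAT_TAG 0x00000003u

/* Reinterpret between a float and its 32-bit pattern without aliasing UB. */
static uint32_t float_bits(float f)    { uint32_t u; memcpy(&u, &f, 4); return u; }
static float    bits_float(uint32_t u) { float f; memcpy(&f, &u, 4); return f; }

/* Tagged representation: float in the high 32 bits, metadata in the low 32. */
static uint64_t tag_float(float f)      { return ((uint64_t)float_bits(f) << 32) | SFLOAT_TAG; }
static float    untag_float(uint64_t p) { return bits_float((uint32_t)(p >> 32)); }

/* The addition in the assembly above: untag both, add, retag the sum. */
static uint64_t tagged_add(uint64_t a, uint64_t b) {
    return tag_float(untag_float(a) + untag_float(b));
}
```

Every arithmetic operation pays for the shifts and the re-tagging `or`, which is exactly the faffery discussed below.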
This involves a lot of pointless faffery only for the sake of the tags. My insight is that we can elide much of this faffery by taking advantage of a couple of qualities near-ubiquitous on modern processors and architectures. Generally, the floating-point registers coincide with the vector registers; furthermore, vector operations may have identical or near-identical performance to scalar operations, especially on floating-point numbers. Also, any floating-point operation performed on a NaN will give back the same NaN. Therefore, if we make the bit pattern of the low 32 bits of a tagged single-precision float coincide with a single-precision NaN, we can perform a vector floating-point operation between any two tagged floating-point numbers, treating each as a 2-vector of floats. In the result, the high element of the vector (that is, the high 32 bits) will be our desired result, while the low element (that is, the low 32 bits) will be a NaN, carried through from the inputs, and hence still a valid tag. The upshot is that we no longer need to explicitly tag or untag any float value. Here’s addition again, with this new scheme:
```asm
movq  xmm0, rdi  ; move our first addend into a floating-point vector register
                 ; now the low element of the vector is the low 32 bits of our tagged input, which happen to be a NaN
                 ; and the following element is the high 32 bits of our tagged input, which are the number it denotes
                 ; (the register has room for two more elements; we don’t care about their values,
                 ;  but they happen to have been zeroed)
movq  xmm1, rsi  ; ditto the second addend
addps xmm0, xmm1 ; perform a vector addition
                 ; the low element of the result is xmm0[0] + xmm1[0] = NaN + NaN = NaN
                 ; the next element is xmm0[1] + xmm1[1], which is the sum we’re interested in
                 ; (the next two elements, again, we don’t care about; they happen to be zeroes)
movq  rax, xmm0  ; copy the result back out into a general-purpose register
                 ; the low 32 bits are the first element of the vector, which are a NaN, which is our tag for a single float
                 ; the next 32 bits are the second element of the vector, which is the sum we’re interested in
```
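The lane-wise effect of the addps can be simulated in portable C (a sketch; `NAN_TAG` is an assumed quiet-NaN bit pattern, and because exact NaN payload propagation is architecture-dependent, the check below only confirms that the low lane remains a NaN rather than that it equals the tag bit-for-bit):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* An assumed quiet-NaN pattern, used as the tag for single floats. */
#define NAN_TAG 0x7FC00001u

static uint32_t fbits(float f)  { uint32_t u; memcpy(&u, &f, 4); return u; }
static float    bitsf(uint32_t u) { float f; memcpy(&f, &u, 4); return f; }

static uint64_t nan_tag_float(float f) { return ((uint64_t)fbits(f) << 32) | NAN_TAG; }

/* Simulate the `addps` across the low two lanes of each tagged word. */
static uint64_t nan_tagged_add(uint64_t a, uint64_t b) {
    float lo = bitsf((uint32_t)a) + bitsf((uint32_t)b);               /* NaN + NaN = NaN */
    float hi = bitsf((uint32_t)(a >> 32)) + bitsf((uint32_t)(b >> 32)); /* the real sum */
    return ((uint64_t)fbits(hi) << 32) | fbits(lo);
}
```

No shifts, no `or`: the tag lane simply rides along through the arithmetic.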
It’s hard to imagine a more performant way of dealing with tagged floats!
How to compare numbers represented thus? On most architectures, scalar floating-point comparisons apply only to the low element of a given vector, whereas our numbers are in the second element of their vector. Not to worry: just interpret the low two elements of the vector as a scalar double-precision float and compare. For instance, the following x86 function snippet adds its first two arguments and compares the sum to the third argument:
```asm
movq   xmm0, rdi
movq   xmm1, rsi
movq   xmm2, rdx
addps  xmm0, xmm1
comisd xmm0, xmm2
```
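The reinterpret-as-double comparison can be checked in C (a sketch under the tagging scheme above; `NAN_TAG` is an assumed quiet-NaN tag, and the checks deliberately avoid NaNs and zeroes of opposite sign, per the caveats below):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* An assumed quiet-NaN pattern, used as the tag for single floats. */
#define NAN_TAG 0x7FC00001u

static uint64_t nan_tag_float(float f) {
    uint32_t u; memcpy(&u, &f, 4);
    return ((uint64_t)u << 32) | NAN_TAG;
}

/* Compare two tagged floats by reinterpreting each whole word as a
 * double, mirroring the comisd above. Returns -1, 0, or 1. */
static int tagged_cmp(uint64_t a, uint64_t b) {
    double da, db;
    memcpy(&da, &a, 8);
    memcpy(&db, &b, 8);
    return (da > db) - (da < db);
}
```

This works because, for finite floats sharing the same low-word tag, the 64-bit patterns order the same way the 32-bit float patterns do; the float’s sign and exponent land in the double’s sign and exponent fields monotonically.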
I think this is—not the smartest, but certainly the cleverest snippet of code I have ever written.
This does not handle NaNs, or comparisons between zeroes of opposite sign, but is otherwise sound (and comparisons including infinities are consistent).
(Other tricks are possible; for example, perform a vector subtract and test the sign bit; or, if one of the comparands is a product, transform it into an fmsub and test the sign of the result (a vtestpd of the register with itself). But neither of these is faster than the approach given above, and both have more edge cases. If strictly correct comparison semantics are required, and type inference does not rule out the undesirable cases, the fastest and best option is simply to shuffle the vectors in question and perform a scalar single-precision compare; or, do a vector compare and branch on the result. In any case, adoption of my proposed tagging scheme should never harm performance, for it affords an implementation strictly more flexibility in code generation than the alternative.)
What NaN to choose for the tag? It should probably be a quiet NaN; those are denoted by the highest mantissa bit, so you still get all the low bits to play with. Beyond that, any considerations will likely be architecture-specific; I will offer a few for x86:
On x86, there are, I think, basically two interesting classes of NaNs.
The first is the positive NaNs. These are advantageous compared with negative NaNs because they allow for a slightly faster tagging sequence, given an untagged single float. Something like the following:
```asm
movd eax, xmm0
shl  rax, 32
or   rax, SFLOAT_TAG
```
This sequence only works if SFLOAT_TAG is a positive float. The `or` instruction sign-extends its 32-bit immediate to a full 64 bits, and a negative float, interpreted as a 32-bit integer, is also a negative integer; sign extension would fill the high 32 bits with ones and wipe out the float stored there.
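The sign-extension hazard is easy to demonstrate in C (a sketch; both tag patterns are illustrative, and the cast through `int32_t` models how x86’s `or r64, imm32` sign-extends its immediate):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative positive and negative quiet-NaN patterns. */
#define POS_TAG 0x7FC00001u
#define NEG_TAG 0xFFC00001u

/* Model of `shl rax, 32` followed by `or rax, TAG`, where the 32-bit
 * immediate is sign-extended to 64 bits, as `or r64, imm32` does. */
static uint64_t tag_with_imm32(uint32_t float_bits, uint32_t tag) {
    uint64_t hi   = (uint64_t)float_bits << 32;
    uint64_t sext = (uint64_t)(int64_t)(int32_t)tag; /* the sign extension */
    return hi | sext;
}
```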
However, I suspect that choosing a negative NaN is preferable. To see why, consider the type-checking sequence; suppose we have a tagged pointer in rax, and would like to check whether it is a single float. We might do something like the following:
```asm
cmp eax, SFLOAT_TAG
jne fail
```
If SFLOAT_TAG is negative, and furthermore has most of the high bits of its mantissa set to 1, then it may be representable as a (sign-extended) 8-bit integer. (Whereas, a positive NaN will never be representable as a sign-extended 8-bit integer.) x86 has a compact encoding for 8-bit immediates; the code size improvements from using the smaller immediate format are likely substantial. In particular, type checks should outnumber tagging of untagged floats by a significant margin, since there is just not much reason to work with untagged floats under this scheme.
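Whether a given tag fits x86’s compact sign-extended 8-bit immediate form is a quick check in C (a sketch; the candidate patterns below are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit immediate fits x86's compact imm8 encoding iff it round-trips
 * through a sign-extended 8-bit integer. */
static int fits_imm8(uint32_t imm) {
    return (uint32_t)(int32_t)(int8_t)(uint8_t)imm == imm;
}
```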