XLAT, the most specific instruction – Arcane x86 (#1)

The x86 ISA is very flexible in terms of addressing modes. To form an address you can add two registers together (one of them optionally multiplied by 2, 4 or 8), and then add an offset! And (nearly) every subset of these features can be used also.

For instance, these are all valid addressing modes:

mov al, [0x12345678]
mov edx, [eax]
mov cx, [eax + ebx]
mov esi, [ebx * 4]
mov eax, [ebx + ecx * 2]
mov dh, [0xABCDEF + eax + edx * 8]

These are all handled by the mov instruction! Given the flexibility of mov, there are a number of different encodings for it, and they can be quite complex in order to store all of the necessary information. For instance, assembling the above instructions gives the following:

a0 78 56 34 12
8b 10
66 8b 0c 18
8b 34 9d 00 00 00 00 
8b 04 4b
8a b4 d0 ef cd ab 00

In all the examples above, notice how only the 32 bit registers can be used in the address generation? It is possible to use all 16 bit registers instead (with the address size prefix 0x67 before the instruction). But what about 8 bit registers?

Nope!

You cannot use an 8 bit register to form an address in x86.

Therefore, constructions like these are impossible:

mov bl, [ebx + al]
mov eax, [al]
mov al, [dl + bh]
mov esi, [0x123456 + ch]
mov cx, [eax + dl + 0x123]

There is one very specific exception. There is a single combination of registers that permit addressing with an 8 bit register. It would be equivalent to:

mov al, [ebx + al]

This particular instruction, with this exact choice of both source and destination registers and sizes is enough to warrant an exception – it gets its own dedicated instruction!

That instruction is of course, xlat. Notice that it doesn’t take any operands – because it is hardcoded to use ebx and al (for both input and output)

xlat            ; al = [ebx + al]

Okay then… but how is this encoded? Some super-complex obscure instruction encoding for such a niche situation?

Nope!

It gets its own dedicated single-byte opcode! The 0xD7 opcode the single byte instruction, xlat.

Yep, there are thousands of x86 instructions, all crammed into an opcode space that ran out of real estate decades ago, and we have this goofball xlat squatting on one of the coveted single byte opcodes.

When AMD introduced the AMD64 (or x86-64) instruction set, the 64-bit version of x86, they needed to clear up some space on the opcode map to make room for the new 64-bit prefixes. You would think, given that xlat has not been used by anyone this millennium, that it would be on the chopping block to free up some room.

Nope! (But the single byte register increment/decrement instructions were lost!)

xlat is still here in x86-64, and still takes up an entire opcode in the map!

Crikey.

Does xlat have any uses? It does after all encode something that cannot be encoded by any other instruction (having a 8 bit memory operand)! Surely if for some reason you really needed to do this very specific operation, then it’d be faster than doing it the long way? Right? Right?

You would assume that a little one byte instruction would beat out a sequence of instructions to replicate its 8 bit addressing.

Source: https://www.acepace.net/images/xlatb/meme.jpg

Ace Pace (https://www.acepace.net/2019-07-27-xlatb/) benchmarked this, and found that (despite the meme) – xlat was slower than the equivalent 4 byte sequence – which also involved a push and pop (tripling the number of memory accesses!)

Because it’s not a commonly used instruction, xlat isn’t optimised like normal instructions are. And because it isn’t optimised, compilers don’t use it! What a terrible negative spiral. If you can get any compiler to spit out an xlat, please let me know – but I doubt any will.

The next question is why xlat was added to the ISA at all. And it makes more sense looking back to 1978 when the original 8086 was released. This was way back in the bad old days, when it was a 16 bit processor. Unlike the newer 32 bit and 64 bit modes, 16 bit mode addressing is much less flexible and more limited. The only possible addressing modes are:

[bx + offset]
[si + offset]
[di + offset]
[bp + offset]
[bx + si + offset]
[bx + di + offset]
[bp + si + offset]
[bp + di + offset]
[address]

The only modes that allow two registers to be added together (e.g. base register plus offset register), involve bx or bp, and si or di.

The full name of xlat – translation – gives a clue into how it might be used. It was designed to be used for translating one value to another, through the use of lookup tables. The bx register would contain the address of the table, and al would be the index into the table. This would be useful, as al would often contain character data (and for instance, some instructions like lodsb implicitly use al as a destination for data). You could then quickly translate this from one format to another (e.g. ASCII to EBCDIC), in place, by just executing an xlat.

lookup_table db "..."

read_and_convert_string:
    mov bx, lookup_table
.next:
    lodsb     
    xlat
    cmp al, 0
    je short .exit
    ; do something with `al`
    jmp short .next
.end:
    ret

And well, I think that’s all. I just had to give this instruction some love, because the CPUs of the world sure aren’t.

The Low Level Blog

XLAT, the most specific instruction – Arcane x86 (#1)

Leave a Reply Cancel reply