Disassembling X64
I’ve been writing a little offline “compiler explorer” for my codebase, where I can quickly go to a specific function and inspect its disassembly. I do it by opening the executable file and finding the corresponding PDB file (this information can be found in the executable header). I then open the PDB and match it with the executable, as both of them carry a GUID and a version, and finally read all the information that maps source lines to assembly.
Once you have that, you have a list of bytes from the executable that match a line from a source file (kind of, it is a little more complicated in optimized builds). But to display anything in a human-readable format, you have to disassemble the byte sequence into instructions, like the front end of the CPU does.
You can find some libraries that do this (like capstone), but I wanted to go on the learning journey of disassembling the full X64 spec (I had already done it for some 8086 assembly, so I had some idea of what I was getting into).
Recap Of Basic Disassembly
We are given a bytestream (an array of bytes, aka u8* for C people) and the goal is to match those bytes to instructions. X64 instructions have different sizes based on several factors, so this is a sequential problem: you need to disassemble instruction A before even knowing where the next instruction starts (even though hardware people do decode multiple instructions per clock in the front end of your CPU).
Instruction encoding is between 1 and 15 bytes, and here is the basic layout of an instruction:
+-----------------+ +--------+ +------------++------------+ +----------------+
| Optional Prefix | | Opcode | | Mod/RM/Reg ||Optional SIB| | Optional Data |
+-----------------+ +--------+ +------------++------------+ +----------------+
Let’s take a quick look at each part :
Prefix: in x86, prefixes were mainly used to alter the behaviour of an instruction, like forcing it to be atomic (0xF0) or adjusting the width of the impacted register (0x66). x64 adds the Register Extension Prefix, aka REX. SSE instructions use mandatory prefixes that reuse old x86 prefix bytes (0x66, 0xF2 and 0xF3) in front of the 0x0F escape byte. AVX added another prefix, the Vector Extension (VEX).
Opcode: the opcode is a fixed series of bytes, usually between one and three, that are used to identify the correct instruction. Be aware that opcode is not the only thing you’ll need to differentiate between instructions, some of them share an opcode.
Mod/RM/Reg: This byte is used to identify what are the targets of the instruction, either registers or memory.
  7                           0
+---+---+---+---+---+---+---+---+
|  mod  |    reg    |    rm     |
+---+---+---+---+---+---+---+---+
The 2 MOD bits tell us whether optional displacement data follows, and its size (1 byte or 4 bytes). Reg and RM are 3-bit fields that each give us a register number; combined with MOD, RM tells us whether the operand is a plain register or a memory location based on a register.
Scale Index Byte: This is an addition in the spec to allow complex addressing.
This allows more complex memory offsetting expressions in the form of:
base + scale * index
where index and base are registers, and scale is a 2-bit field selecting the power of two (1, 2, 4 or 8) that index is multiplied by.
  7                           0
+---+---+---+---+---+---+---+---+
| scale |   index   |   base    |
+---+---+---+---+---+---+---+---+
Optional Data: Each instruction can then have data directly embedded in the bytestream. You’ll have to look at each instruction in the Intel Manual to know if it needs an immediate or not.
Let’s just have a quick look at how Intel lists the instructions in its instruction manual.
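To make the MOD/REG/RM and SIB layouts above concrete, here is a minimal sketch in C. The type and function names (ModRM, Sib, split_modrm, split_sib, operand_bytes) are made up for this illustration and are not from my actual tool. It assumes 64-bit addressing, where RM = 4 (with MOD != 3) announces a SIB byte, and MOD = 0 with RM = 5 (or a SIB base of 5) means a 4-byte displacement follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types, for illustration only. */
typedef struct { uint8_t mod, reg, rm; } ModRM;        /* bits 7..6, 5..3, 2..0 */
typedef struct { uint8_t scale, index, base; } Sib;    /* bits 7..6, 5..3, 2..0 */

static ModRM split_modrm(uint8_t byte) {
    ModRM m;
    m.mod = (uint8_t)(byte >> 6);
    m.reg = (uint8_t)((byte >> 3) & 7);
    m.rm  = (uint8_t)(byte & 7);
    return m;
}

static Sib split_sib(uint8_t byte) {
    Sib s;
    s.scale = (uint8_t)(byte >> 6);       /* index is multiplied by 1 << scale */
    s.index = (uint8_t)((byte >> 3) & 7);
    s.base  = (uint8_t)(byte & 7);
    return s;
}

/* Bytes taken by ModRM + optional SIB + optional displacement.
   `p` points at the ModRM byte. Returns -1 if the buffer is too short. */
static int operand_bytes(const uint8_t *p, size_t len) {
    assert(p != NULL);
    if (len < 1) return -1;

    ModRM m = split_modrm(p[0]);
    int total = 1;
    int disp = 0;

    if (m.mod == 1) disp = 1;                    /* 8-bit displacement */
    else if (m.mod == 2) disp = 4;               /* 32-bit displacement */
    else if (m.mod == 0 && m.rm == 5) disp = 4;  /* RIP-relative disp32 */

    if (m.mod != 3 && m.rm == 4) {               /* a SIB byte follows */
        if (len < 2) return -1;
        Sib s = split_sib(p[1]);
        total += 1;
        if (m.mod == 0 && s.base == 5) disp = 4; /* no base register, disp32 */
    }

    total += disp;
    return (size_t)total <= len ? total : -1;
}
```

For example, the byte 0xC8 splits into mod = 3, reg = 1, rm = 0, which is a pure register form and takes one byte, while 0x44 0x24 0x08 is a ModRM byte, a SIB byte and an 8-bit displacement.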
Intel Manual Guide Example:
The first part, 0B, is the opcode, and the /r tells us it is followed by a MOD/RM/REG byte. So we know this instruction is at least two bytes long.
The next part tells us it’s a register to register OR, and the third column is an index into the instruction operand encoding. RM means the first register is encoded in the REG bits of the MOD/REG/RM byte, and the second in the RM field.
This second instruction is also an OR, with a 0x81 opcode. It is an OR of a register defined in the RM field with a 32-bit immediate value following the MOD/REG/RM byte. So this is a 6-byte instruction. There is a catch here in the /1 value: it means that the REG field HAS to be 1 to match this instruction. A different REG value leads to a different instruction (here 0 would be ADD and 7 would be CMP). So the REG field is sometimes used to differentiate between instructions.
Same Opcode Instructions
Some instructions share the same opcodes and have to be differentiated another way. Let’s take ADD and SUB as an example.
As you can see, they both share the 81 opcode, and the distinction has to be made with the following byte, using the content of its REG field. For the instruction to match ADD, its REG must be 0 (which is noted as /0), and for it to match SUB, it must be 5 (which is noted as /5).
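As a small sketch of that dispatch, the REG field can simply index a table of the eight instructions that share the 0x81 opcode. The names Group1Op, group1_op and group1_name are hypothetical and only serve this example.

```c
#include <assert.h>
#include <stdint.h>

/* The eight instructions sharing opcode 0x81, in /0 .. /7 order. */
typedef enum {
    OP_ADD, OP_OR, OP_ADC, OP_SBB, OP_AND, OP_SUB, OP_XOR, OP_CMP
} Group1Op;

/* `modrm` is the byte that follows the 0x81 opcode. */
static Group1Op group1_op(uint8_t modrm) {
    return (Group1Op)((modrm >> 3) & 7);
}

static const char *group1_name(Group1Op op) {
    static const char *const names[8] = {
        "add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"
    };
    assert((unsigned)op < 8);
    return names[op];
}
```

With the bytes 81 C9 followed by a 32-bit immediate, the MOD/REG/RM byte 0xC9 has REG = 1, so the instruction is an OR. With 81 E9 it has REG = 5, so it is a SUB.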
Specific Prefixes
REX
Here is its pattern:
7 0
+---+---+---+---+---+---+---+---+
| 0 1 0 0 | W | R | X | B |
+---+---+---+---+---+---+---+---+
The top four bits are here for pattern matching this prefix. Then we can extract four values :
W: tells us if the operand size is 64 bits, meaning the encoded registers are 64 bits wide instead of 32 bits.
R: extends the REG field of the MOD/REG/RM byte (to be able to access the new R8-R15 registers added in x64). If this bit is set, then instead of your old table, the reg value defines a register from R8 to R15.
X and B: extend registers the same way R does. X extends the index register of the SIB byte, and B extends the base register of the SIB byte or, when there is no SIB byte, the RM field.
This has to be the last prefix, meaning the opcode must follow.
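Here is a minimal sketch of reading a REX byte and applying its bits, assuming the layout above. Rex, parse_rex and extend_reg are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four REX bits. */
typedef struct { uint8_t w, r, x, b; } Rex;

/* Returns 1 and fills *out when `byte` is a REX prefix (0x40..0x4F),
   returns 0 otherwise and leaves *out untouched. */
static int parse_rex(uint8_t byte, Rex *out) {
    assert(out != NULL);
    if ((byte & 0xF0) != 0x40) return 0;
    out->w = (uint8_t)((byte >> 3) & 1);
    out->r = (uint8_t)((byte >> 2) & 1);
    out->x = (uint8_t)((byte >> 1) & 1);
    out->b = (uint8_t)(byte & 1);
    return 1;
}

/* Turns a 3-bit register field into a 4-bit register number (0..15)
   by using the matching REX bit as the top bit. */
static uint8_t extend_reg(uint8_t field3, uint8_t rex_bit) {
    return (uint8_t)(((rex_bit & 1) << 3) | (field3 & 7));
}
```

Take the bytes 4D 01 C8: the prefix 0x4D gives W = 1, R = 1, X = 0, B = 1. The MOD/REG/RM byte 0xC8 has reg = 1 and rm = 0, which become register 9 (R9) and register 8 (R8) once extended, so this is add r8, r9.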
SSE
SSE instructions do not add a new prefix to read, but they all start with a combination of existing bytes. First comes either 0xF2, 0xF3, 0x66 or nothing. This is then followed by a sequence of bytes that is either 0x0F, 0x0F 0x38 or 0x0F 0x3A. Add an opcode after that and we have an SSE instruction. Here is what it looks like for the pshufb instruction:
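For reference, the manual lists the legacy SSE form of pshufb on XMM registers as 66 0F 38 00 /r. A minimal sketch of matching that shape could look like the code below. OpcodeMap, is_mandatory_prefix and read_escape are hypothetical names, and the sketch ignores any REX byte that may sit between the mandatory prefix and 0x0F.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which escape sequence introduces the opcode. */
typedef enum { MAP_NONE = 0, MAP_0F = 1, MAP_0F38 = 2, MAP_0F3A = 3 } OpcodeMap;

/* The three legacy prefixes that SSE reuses as mandatory prefixes. */
static int is_mandatory_prefix(uint8_t byte) {
    return byte == 0x66 || byte == 0xF2 || byte == 0xF3;
}

/* `p` points just after the prefixes. Returns the opcode map and stores
   the number of escape bytes consumed in *used. */
static OpcodeMap read_escape(const uint8_t *p, size_t len, size_t *used) {
    assert(p != NULL && used != NULL);
    *used = 0;
    if (len < 1 || p[0] != 0x0F) return MAP_NONE;
    if (len >= 2 && p[1] == 0x38) { *used = 2; return MAP_0F38; }
    if (len >= 2 && p[1] == 0x3A) { *used = 2; return MAP_0F3A; }
    *used = 1;
    return MAP_0F;
}
```

On the bytes 66 0F 38 00 C1 (pshufb xmm0, xmm1), the first byte is a mandatory prefix, read_escape consumes 0F 38, and the opcode 00 and the MOD/REG/RM byte C1 follow.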
AVX
AVX adds a new prefix to decode, the VEX prefix, and it comes in 2 flavours: 2-byte VEX and 3-byte VEX. The tricky bit is that some prefixes required by the Intel instruction listing are not directly in the bytestream like the others; they are encoded inside the VEX prefix.
Two Bytes :
7 0 7 0
+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
| 1 1 0 0 0 1 0 1 | |~R | ~vvvv | L | pp |
+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
VVVV: is a way to encode an additional register, because some AVX instructions now use 3 registers. It is inverted, so you’ll need to flip each bit. It is named VEX.vvvv in the operand encoding table, like you see for line B here (EVEX is for AVX-512, which is not covered here):
L : Is similar to the W in the REX field, if it is set we are in 256 bits, else 128 bits.
PP: is a way to encode the SSE prefix in a smaller format, using 2 bits. Here are its possible values as cited in the Intel manual: 00 means no prefix, 01 means 0x66, 10 means 0xF3 and 11 means 0xF2.
R: Similar to REX R bit, but inverted.
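The two-byte form can be sketched as follows. Vex2, parse_vex2 and pp_to_prefix are hypothetical names for this example, and the sketch stores R and vvvv already un-inverted so the rest of a decoder can treat them like their REX counterparts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of a two-byte VEX prefix. */
typedef struct {
    uint8_t r;    /* un-inverted: 1 means the REG field is extended */
    uint8_t vvvv; /* un-inverted extra register number, 0..15 */
    uint8_t l;    /* 0 = 128 bits, 1 = 256 bits */
    uint8_t pp;   /* implied SSE prefix, see pp_to_prefix */
} Vex2;

/* Returns 1 and fills *out if `p` starts with a two-byte VEX prefix. */
static int parse_vex2(const uint8_t *p, size_t len, Vex2 *out) {
    assert(p != NULL && out != NULL);
    if (len < 2 || p[0] != 0xC5) return 0;
    uint8_t b = p[1];
    out->r    = (uint8_t)(((b >> 7) & 1) ^ 1);     /* stored inverted */
    out->vvvv = (uint8_t)(((b >> 3) & 0xF) ^ 0xF); /* stored inverted */
    out->l    = (uint8_t)((b >> 2) & 1);
    out->pp   = (uint8_t)(b & 3);
    return 1;
}

/* Maps the pp field to the legacy prefix byte it stands for (0 = none). */
static uint8_t pp_to_prefix(uint8_t pp) {
    static const uint8_t table[4] = { 0x00, 0x66, 0xF3, 0xF2 };
    return table[pp & 3];
}
```

On C5 F0 58 C2 (vaddps xmm0, xmm1, xmm2), the second byte 0xF0 gives R = 0, vvvv = 1 (xmm1), L = 0 (128 bits) and pp = 0 (no prefix).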
Three Bytes :
7 0 7 0
+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
| 1 1 0 0 0 1 0 0 | |~R |~X |~B | m-mmmm |
+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+
7 0
+---+---+---+---+---+---+---+---+
|W/E| ~vvvv | L | pp |
+---+---+---+---+---+---+---+---+
The new m-mmmm field in the second byte encodes the leading opcode bytes used by SSE, with 3 possible values: 1 for 0x0F, 2 for 0x0F 0x38 and 3 for 0x0F 0x3A.
X and B are inverted versions of the same bits in the REX byte. W/E is for the size of integer instructions.
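The three-byte form follows the same idea with a few more fields. Again this is only a sketch: Vex3 and parse_vex3 are hypothetical names, and the inverted bits are flipped while parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of a three-byte VEX prefix. */
typedef struct {
    uint8_t r, x, b; /* un-inverted, same meaning as the REX bits */
    uint8_t map;     /* m-mmmm: 1 = 0F, 2 = 0F 38, 3 = 0F 3A */
    uint8_t w;       /* W/E bit */
    uint8_t vvvv;    /* un-inverted extra register number, 0..15 */
    uint8_t l;       /* 0 = 128 bits, 1 = 256 bits */
    uint8_t pp;      /* 0 = none, 1 = 0x66, 2 = 0xF3, 3 = 0xF2 */
} Vex3;

/* Returns 1 and fills *out if `p` starts with a three-byte VEX prefix. */
static int parse_vex3(const uint8_t *p, size_t len, Vex3 *out) {
    assert(p != NULL && out != NULL);
    if (len < 3 || p[0] != 0xC4) return 0;
    uint8_t b1 = p[1];
    uint8_t b2 = p[2];
    out->r    = (uint8_t)(((b1 >> 7) & 1) ^ 1);
    out->x    = (uint8_t)(((b1 >> 6) & 1) ^ 1);
    out->b    = (uint8_t)(((b1 >> 5) & 1) ^ 1);
    out->map  = (uint8_t)(b1 & 0x1F);
    out->w    = (uint8_t)((b2 >> 7) & 1);
    out->vvvv = (uint8_t)(((b2 >> 3) & 0xF) ^ 0xF);
    out->l    = (uint8_t)((b2 >> 2) & 1);
    out->pp   = (uint8_t)(b2 & 3);
    return 1;
}
```

On C4 E2 71 00 C2 (vpshufb xmm0, xmm1, xmm2), the second byte 0xE2 gives R = X = B = 0 and map = 2 (0x0F 0x38), and the third byte 0x71 gives W = 0, vvvv = 1 (xmm1), L = 0 and pp = 1 (0x66).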
In the instruction manual, we often see instructions like :
The difference here lies in the VEX.128 or VEX.256, which is just to tell if the L bit is set or not.
Final Thoughts
This is all the trickery that you must know to decode x64 instructions. The next part is just going through all the instructions and doing the work. This was just a quick overview, as it is not very hard to find information on this subject, but some finicky things, like the correspondence between some Intel syntax and the actual bits to check, were not always straightforward to find.
Combining that with the C++ parser I have, here is a little demo of what it can achieve.
This allows me to inspect a function directly in the engine when optimizing, without having to step into that function in the debugger just to see its disassembly.
Thank you for reading,
Guillaume