Originally published on January 22nd, 2008 on OpenRCE
Here are some optimizations that I have seen MSVC apply to structure references. I wish I could give you the real names for these optimizations, but I can't find them in any of my compilers textbooks. I have a feeling that they're buried away somewhere inside of Randy Allen and Ken Kennedy's incredibly dense tome, "Optimizing Compilers for Modern Architectures". If anybody knows the real names for these transformations, please speak up.
#1: Let's say we are accessing multiple entries in a structure that's larger than 80h. Now as stated in the previous entry, each access to the members situated at >= 0x80 is going to require a dword in the instruction encoding if we generate the "naive" code. If we instead do:
lea esi, [esi+middle_of_structure_somewhere] ; ... mov eax, [esi-(middle_of_structure_somewhere - member_offset1)] mov ebx, [esi+(member_offset2 - middle_of_structure_somewhere)]
We can access more of the structure with the one-byte instruction encoding, if those subtracted quantities are bytes. The compiler chooses middle_of_structure_somewhere specifically to maximize the number of one-byte references. This is the same idea behind the "frame pointer delta" stack-frame optimization.
#2: Let's say we have a loop that accesses two arrays of structures inside of another structure, one array beginning at +1234h, the other beginning at +2234h. If we emit the "naive" code:
; ecx = loop induction variable imul ebx, ecx, sizeof(structure1) imul edx, ecx, sizeof(structure2) ; ... mov eax, [esi+1234h+ebx+offset_of_member1] mov edi, [esi+2234h+edx+offset_of_member2]
Then obviously both of these structure displacements are going to require a separate dword in the instruction encoding for 1234h+offset_of_member1 and 2234h+offset_of_member2. If we instead do:
lea esi, [esi+1234h] ; ... ; ecx = loop induction variable imul ebx, ecx, sizeof(structure1) imul edx, ecx, sizeof(structure2) ; ... mov eax, [esi+ebx+offset_of_member1] mov edi, [esi+1000h+edx+offset_of_member2]
Then if offset_of_member1 is a byte, it's only going to require a byte in the instruction encoding, thus saving three bytes per reference to the first structure (we can combine the previous optimization to place esi such that the number of one-byte references is maximized). Alternatively, if more members in the second structure are accessed than those in the first, we'll see:
lea esi, [esi+2234h] ; ... ; ecx = loop induction variable imul ebx, ecx, sizeof(structure1) imul edx, ecx, sizeof(structure2) ; ... mov eax, [esi+ebx+offset_of_member1-1000h] mov edi, [esi+edx+offset_of_member2]
Once again, the first optimization can also be applied here to choose the optimal placement for esi that maximizes the number of single-byte references. The multiplications given in the second optimization can also be optimized away into additions.