Originally published on February 13th, 2008 on OpenRCE.
In this post, we shall investigate some strange-looking code generated for an array indexing operation.
.text:10002D49     mov     eax, [esp+arg_0]
.text:10002D4D     lea     ecx, [eax-9C40h]
.text:10002D53     cmp     ecx, 50h
.text:10002D56     ja      short loc_10002D60
.text:10002D58     mov     eax, dword ptr ds:(loc_1000EF5B+1)[eax*8]
.text:10002D5F     retn
.text:10002D60
.text:10002D60 loc_10002D60:
.text:10002D60     lea     edx, [eax-0A029h]
.text:10002D66     cmp     edx, 9
.text:10002D69     ja      short loc_10002D73
.text:10002D6B     mov     eax, dword ptr ds:loc_1000D344[eax*8]
.text:10002D72     retn
We don't find any arrays at the locations referenced on lines -D58 and -D6B; in fact, we find code, which is unusual:
; First target
.text:1000EF57     movzx   eax, word ptr [esi+18h]
.text:1000EF5B loc_1000EF5B:                    ; DATA XREF: 10002D58
.text:1000EF5B     add     dword_10065280, eax
.text:1000EF61     xor     eax, eax
.text:1000EF63     pop     esi
.text:1000EF64     mov     esp, ebp
.text:1000EF66     pop     ebp

; Second target
.text:1000D342     mov     esp, ebp
.text:1000D344 loc_1000D344:                    ; DATA XREF: 10002D6B
.text:1000D344     pop     ebp
On closer inspection, the trick is that the arrays are not indexed starting at zero. Each load is reached only when the preceding unsigned range check passes, which constrains eax as follows:
.text:10002D58     mov     eax, dword ptr ds:(loc_1000EF5B+1)[eax*8]  ; <- 0x9C40 <= eax <= 0x9C90
.text:10002D6B     mov     eax, dword ptr ds:loc_1000D344[eax*8]      ; <- 0xA029 <= eax <= 0xA032
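The lea/cmp/ja sequence is a single unsigned comparison that tests both bounds at once. The following is a minimal sketch of that check in Python, not a reconstruction of the original source; the names `in_range_unsigned` and `accepted_bounds` are hypothetical, and 32-bit register wraparound is emulated with a mask.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit register wraparound


def in_range_unsigned(eax, low, span):
    """Emulate `lea reg, [eax-low]` / `cmp reg, span` / `ja fail`.

    Returns True when execution falls through to the table load.
    """
    reg = (eax - low) & MASK32  # lea: the subtraction wraps modulo 2**32
    return reg <= span          # ja is taken only when reg > span (unsigned)


def accepted_bounds(low, span):
    """Inclusive range of eax values that pass the check."""
    return low, low + span


if __name__ == "__main__":
    print([hex(v) for v in accepted_bounds(0x9C40, 0x50)])
    print([hex(v) for v in accepted_bounds(0xA029, 0x9)])
```

Values below the lower bound wrap around to very large unsigned numbers after the subtraction, so they fail the same comparison that rejects values above the upper bound. Because ja branches only when the result is strictly above the constant, both bounds are inclusive.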
So the first array actually begins at 0x1000EF5B+1+0x9C40*8 == 0x1005D15C, and the second array begins at 0x1000D344+0xA029*8 == 0x1005D48C. What happened here is that the constant part of the index has been folded into the displacement, so that the pointer expression fits x86's displacement-plus-scaled-index addressing form:
[1005D15Ch + (eax - 9C40h) * 8]
=> [1005D15Ch - 4E200h + eax*8]
=> [1000EF5Ch + eax*8]
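The folding can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the stride of 8 is taken from the eax*8 scale in the listing, and all arithmetic is reduced modulo 2**32 to mimic 32-bit address computation.

```python
MASK32 = 0xFFFFFFFF  # 32-bit address arithmetic


def real_base(displacement, first_index, stride):
    """Recover where the array actually starts from the folded displacement."""
    return (displacement + first_index * stride) & MASK32


def folded_displacement(array_base, first_index, stride):
    """Displacement left over once the constant index offset is folded in."""
    return (array_base - first_index * stride) & MASK32


def address_unfolded(array_base, first_index, stride, eax):
    """Address as the source-level expression would compute it."""
    return (array_base + (eax - first_index) * stride) & MASK32


def address_folded(displacement, stride, eax):
    """Address as the emitted instruction computes it."""
    return (displacement + eax * stride) & MASK32


if __name__ == "__main__":
    print(hex(real_base(0x1000EF5B + 1, 0x9C40, 8)))
    print(hex(real_base(0x1000D344, 0xA029, 8)))
    print(hex(folded_displacement(0x1005D15C, 0x9C40, 8)))
```

When reversing code like this, adding the lower bound of the range check times the scale to the displacement in the instruction recovers the true start of the table, which is where a data cross-reference should be placed.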
This is pretty uncommon; I've only seen it a handful of times in my reversing endeavors over the years.