Hex-Rays, GetProcAddress, and Malware Analysis

This entry is about how to make the best use of IDA and Hex-Rays with regards to a common scenario in malware analysis, namely, dynamic lookup of APIs via GetProcAddress (and/or import resolution via hash). I have been tempted to write this blog entry several times; in fact, I uploaded the original code for this entry exactly one year ago today. The problem that the script solves is simple: given the name of an API function, retrieve the proper type signature from IDA's type libraries. This makes it easier for the analyst to apply the proper types to the decompilation, which massively aid in readability and presentability. No more manually looking up and copying/pasting API type definitions, or ignoring the problem due to its tedious solution; just get the information directly from the IDA SDK. Here is a link to the script. EDIT LATER: I decided to write a small GUI for the functionality. You can find the Hex-Rays GUI plugin here.

Background

Hex-Rays v7.4 introduced special handling for GetProcAddress. We can see the difference -- several of them, actually -- in the following two screenshots. The first comes from Hex-Rays 7.1:

HR71.png

The second comes from Hex-Rays 7.6:

HR76.png

Several new features are evident in the screenshots -- more aggressive variable mapping eliminating the first two lines, and automatic variable renaming changing the names of variables -- but the one this entry focuses on has to do with the type assigned to the return value of GetProcAddress. Hex-Rays v7.4+ draw upon IDA's type libraries to automatically resolve the name of the procedure to its proper function pointer type signature, and set the return type of GetProcAddress to that type.

This change is evident in the screenshots above: for 7.1, the variable is named v7, its type is the generic FARPROC, and the final line shows a nasty cast on the function call. For 7.6, the variable is named IsWow64Process, its type is BOOL (__stdcall *)(HANDLE, PBOOL) (the proper type signature for the IsWow64Process API), and the final line shows no cast. Beyond the casts, we can also see that applying the type signature also changes the types of other local variables: v5 in the first has the generic type int, whereas v5 has the proper type BOOL in the second.

These screenshots clearly demonstrate that IDA is capable of resolving an API name to its proper type signature, the desirable effects of applying the proper type signature on readability, and the secondary effects of setting the types of other variables involved in calling those APIs.

Relevance to Malware Analysis

Hex-Rays' built-in functionality won't work directly when malware looks up API names by hash, or uses encrypted strings for the API names: the decompiler must see a fixed string being passed to GetProcAddress to do its magic. Although the malware analysis community seems very comfortable in dealing with imports via hash and encrypted strings, they seem less comfortable with applying proper type signatures to the resultant variables and structure members. Only one publication I'm aware of bothers to tackle this, and it relies upon manual effort to retrieve the type definitions and create typedefs for them. This is unfortunate, as applying said types dramatically cleans up the decompilation output, but this is understandable, as the manual effort involved is rather cumbersome.

As a result, most publications that encounter this problem feature screenshots like this one. Note all of the casts on the function pointer invocations, and the so-called "partial types" _QWORD, etc.:

(I chose not to link the analysis from which the above screenshot was lifted, because my goal here is positive assistance to the malware analysis community, and not to draw negative attention to anyone's work in particular. This pattern is extremely frequent throughout presentations of malware analysis; it is immaterial who authored the screenshot above, and I had other examples to choose from.)

The Solution

I did not know how to resolve an API name to its type signature, so I simply reverse engineered how Hex-Rays implements the functionality mentioned at the top of this entry. The result is a function PrintTypeSignature(apiName) you can use in your scripts and day-to-day work that does what its name implies: retrieves and prints the type signature for an API specified by name.

The script includes a demo function Demo() that resolves a number of API type signatures and prints them to the console. It begins by declaring a list of strings:

PyDemo.png

The output of the script is the type signatures, ready to be copied and pasted into the variable type window and/or a structure declaration.

GUI

I decided to add a small GUI to the functionality. After you run this plugin, you will have a new entry on your Hex-Rays right-click menu that appears when your cursor is over a pointer-sized local variable, as follows:

GUIMenu.png

Once you click on that, it will ask you to enter an API name:

GUIAsk.png

If there are no errors, the script will change the type of the variable to the function prototype for the API.

GUIResult.png

A Final Note

Architecturally, there is a discrepancy between how the Hex-Rays microcode machinery handles type information for direct calls versus indirect ones. To summarize, you may still see casts in the output after applying the proper type signature to a variable or structure member. If this happens, right-click on the indirect call and press Force call type to force the proper type information to be applied at the call site. However, only do this once you have set the proper type information for the function pointer variable or structure member.

Mostly I published this because I want to see more types applied, and fewer casts, in the malware analysis publications that I read. Please, use my script!

Addendum

After publishing this article, I was made aware of prior work that is strongly related. That work focuses on the same problem, albeit in the disassembly listing rather than in Hex-Rays (and hence does not discuss some of the considerations from my entry above). Its ultimate solution is very similar to mine; it includes two out of three API calls from the one I came up with in this entry.