Originally published on December 25th, 2009 on OpenRCE
Here is the source code for a compiler that I wrote in Objective Caml this semester, for a subset of the C language. It requires a standalone C->IR translator which is not included in this release, as the school owns the copyright on that particular piece of code. Hence one cannot immediately use this compiler to compile C programs without writing a C front end; anyway, an existing compiler such as MSVC or GCC would be a better choice.
The portion of the code that I wrote (everything except bitv.ml and .mli) totals roughly 3200 lines of code. It includes two optimizations based on classical data-flow analysis, constant propagation and dead-statement elimination. It also supports translation into and out of static single assignment form, as well as two optimizations based on SSA: constant propagation and loop-invariant code motion. The code for the graph-coloring register allocator is not included. As a back-end, the compiler produces C code that can be compiled by any other C compiler.
The code is a pretty good example of how to structure medium-sized programs in functional programming languages. I tried to adopt a pure functional style throughout most of the codebase. Sometimes this failed: cfg_ir.ml is unnecessarily ugly, and should have been re-written in an imperative style with mutability; also, I made the mistake of using a mutable array to describe Phi values in static single assignment whereas pure functional style would have dictated a list. But those are my only complaints with the code; overall, I'm pretty pleased with how it all turned out.
This code is substantially more sophisticated than the compiler that I wrote to break VMProtect, so if you can read and understand this release, you should be in good shape for breaking virtualization obfuscators.