Synesthesia is an idea that I published at EkoParty last year (slides and video are available) regarding automated shellcode generation under encoding restrictions. The presentation walked through an extended tutorial on program synthesis, and showed how the idea would be implemented using those techniques. I promised to release code; this blog entry is the notification of such release, and some explanation of what the code is, what it is not, and what I hope it shall be in future releases. Here is the GitHub repository.
Limitations of current release
As envisioned in that presentation, the ideal implementation of Synesthesia is a stand-alone compiler with three modes: 1) generate shellcode under input restrictions given a specification for its behavior; 2) re-compile existing shellcode under input restrictions; and 3) encode and generate decoder loops for existing, non-encoded shellcode binary blobs. No matter which mode, in the ideal fully-automated implementation, the user should be able to write programs that dictate their requirements on the shellcode, invoke the Synesthesia compiler, and recieve machine code as output. As with any compiler, Synthesthsia should be a black box: to use it, the user should not have to be an expert user of SMT solvers or possess advanced education in theoretical computer science and mathematics. At present, the current implementation falls short of the goal of being an actual compiler: the process is not fully (or even largely) automated.
The current implementation of Synesthesia is a set of .ys files, scripts written in the YICES 2.x SMT solver presentation language. Each one defines an SMT query corresponding to an example given in the presentation. To obtain a result, the user must pass the .ys files into yices 2.x with the --mode=ef command-line option, and then manually interpret the results. To solve a different problem from the ones given in the presentation, the user must manually create a .ys file (perhaps using the existing ones as a template) and pass it to Yices for solving.
Lately, I have been working on (and enjoying) bringing Synesthesia closer to the ideal vision. The in-progress version is a legitimate, stand-alone compiler with its own programming language, where the machine language decoding routines are written in that language. However, it is in early development at the moment and is not yet suitable for release.
Example code
One of the most interesting files in the repository is the x86 version. It implements several of the examples from the presentation. However, understanding it may be difficult before reading the implementations for the earlier tutorial examples given in the presentation, and those examples also contain sophisticated techniques regarding loops. They are described subsequently.
The first part of the presentation walks through a simple example of synthesizing C programs. The .ys implementation can be found here. Hopefully, with the comments in the code and its short length, this should be easy to understand if you read the presentation.
Next, the presentation extends the ideas first to synthesizing assembly language programs, and then to synthesizing machine language programs. For demonstration, it uses two imaginary languages, the "Simple" assembly language and its "SimpleMC" machine code. The first example involves synthesizing the "increment" operator in Simple assembly language. That example is found here.
Shortly after in the presentation, I demonstrate how to obtain the longest or shortest program satisfying the constraints.
Much of the remaining material is dedicated to synthesizing decoder loops. The first examples involve simple loops. The next two examples synthesize complex decoders that take two bytes of input to produce one byte of output. The first example restricts the input to printable bytes; the second restricts to alphanumeric bytes.
That's all for now; I hope somebody finds it useful.