Thoughts on Software Translation

Over the last few months I have been working on a variety of research problems, including the automated translation of C/C++ into Rust. There are many reasons why I think this is a desirable goal, but they are out of scope for this particular post. While I was working on this effort, DARPA announced a program called TRACTOR, which aims to TRanslate All C TO Rust. I considered submitting a proposal to DARPA for this program, but ultimately decided against it for reasons that will become clear shortly.

To facilitate the comparison and evaluation of different implementations, TRACTOR operates as a batch process. Your translator program will be invoked on some C code, and it is expected to generate Rust code (and tests!) which implement "the same" thing as the C program. There is a series of increasingly difficult and complex C programs — some announced and some secret — that will be used to assess the capability of the translator.

However, the core problem lies in what we mean by "the same". It is not a bug-for-bug or instruction-for-instruction compatible transformation, but rather a loosely defined sense in which the Rust program does "the same" thing as the C program. Indeed, the entire purpose of TRACTOR is to make C programs more secure by replacing them with memory-safe Rust programs, so the output needs to avoid the use of unsafe as much as possible. Due to various differences between C and Rust (including the rules around pointer aliasing), the translator has to change the meaning of the program in ways that are valid for some programs but not for all of them. Deciding exactly which programs a given change is valid for is undecidable in general, by Rice's Theorem, so we are forced to use approximate methods.
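To make the aliasing point concrete, consider a hypothetical C helper, void add_into(int *dst, const int *src) { *dst += *src; }, which is my own illustration and not taken from any TRACTOR test program. In C it is legal to call add_into(&x, &x). The sketch below shows what a direct translation to safe Rust might look like, assuming i32 is the right integer type and that wrapping on overflow is acceptable (both are assumptions). The aliased call no longer compiles, so the translator has to rewrite the call site.

```rust
/// Direct translation of the hypothetical C helper
/// `void add_into(int *dst, const int *src) { *dst += *src; }`.
/// Using `i32` and `wrapping_add` are assumptions about the original intent.
fn add_into(dst: &mut i32, src: &i32) {
    *dst = dst.wrapping_add(*src);
}

/// What the C call `add_into(&x, &x)` has to become in safe Rust.
/// Writing `add_into(&mut x, &x)` is rejected by the borrow checker,
/// because `x` cannot be borrowed mutably and immutably at the same time,
/// so the value is copied out first.
fn double_in_place(x: &mut i32) {
    let copy = *x;
    add_into(x, &copy);
}
```

Copying the value first preserves the behavior here only because the helper is tiny. For larger functions, proving that such a rewrite is valid is exactly the kind of question that needs approximate analysis.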

I believe that a tool which can do something like this is possible with clever integration of new technology like LLMs and more traditional static and dynamic analysis approaches. Indeed, this is where the majority of my work has been over the last few months. However, it is also my belief that a translator which operates as a batch program like this is doomed to failure for programs of any significant size. To understand why I believe this, consider the following simple C program:

int add(int a, int b) {
  return a + b;
}

This is probably as trivial as you can imagine a program being in C, but already we run into a problem. The int type from C doesn't have a great analog in Rust. Sure, we could use std::ffi::c_int, but unless this is a foreign function interface, that is really just avoiding the problem. Ultimately, for idiomatic Rust we need to choose a fixed-width type such as i16, i32, or i64 (C only guarantees that int is at least 16 bits wide). Generally it's probably safe to just guess i32, but that won't be true everywhere.

A batch-oriented translator will be forced to make a choice here and stick with it. That choice might be important to the meaning of the original program (perhaps integer overflow is expected) or it might be unimportant (perhaps the values are always small). Without sufficient context it is not possible to know. This context might be something that can be inferred from how the function is used elsewhere in the program, or it may be something that the programmer assumed to be the case (e.g. "I'll only ever compile this on x86_64, so it's 32 bits"). It's tempting to say that it's the original programmer's fault for writing the code ambiguously, but they will not see it that way when your translator gives them "bad" output. They'll just not use the translator at all and declare it unworkable.
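As a rough illustration of how far apart the reasonable guesses are, here are a few candidate Rust translations of add. The function names are mine and purely illustrative, and none of these is "the" correct answer. Each one encodes a different assumption about what the original programmer meant.

```rust
use std::ffi::c_int;

/// FFI-flavored translation: keeps C's platform-dependent `int`.
/// Wrapping is an assumption; signed overflow is undefined behavior in C.
fn add_c_int(a: c_int, b: c_int) -> c_int {
    a.wrapping_add(b)
}

/// Guess 1: `int` is 32 bits and overflow silently wraps around.
fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Guess 2: `int` is 32 bits and overflow is an error the caller must handle.
fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Guess 3: inputs fit in 32 bits, but the result is widened to 64 bits
/// so that the sum itself can never overflow.
fn add_wide(a: i32, b: i32) -> i64 {
    i64::from(a) + i64::from(b)
}
```

For small inputs all four agree. At the boundary they do not: adding 1 to i32::MAX yields i32::MIN for the wrapping version, None for the checked version, and 2147483648 for the widened version. A batch translator has to pick one of these without being able to ask which was intended.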

There are many similar problems throughout the entire translation process, and many decisions need to be made by a translator in order to correctly perform a translation. As the program size increases, the probability of a "mistake" -- reasonable or otherwise -- increases dramatically. Thus, expecting a translator to make all the right choices for every program is unrealistic. If you only get one shot in a batch process, errors are almost inevitable, except in the simplest cases. This is where the idea of an interactive translation approach comes into play.

A translator that will actually gain adoption and widespread use therefore needs to make a good guess and then give the operator a trivially simple way to adjust the output. One might even imagine using an LLM to allow the operator to explain the issue they see with the translated program in plain English.
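To sketch what "a good guess plus an easy adjustment" might look like, here is a toy model. Everything in it (the Hints type, its methods, and emit_signature) is hypothetical and invented for this post, not part of any existing tool. The translator defaults to i32 for C's int, and the operator can override that guess per function, after which the output is regenerated.

```rust
use std::collections::HashMap;

/// Hypothetical operator-supplied overrides: C function name -> Rust integer type.
struct Hints {
    int_type: HashMap<String, String>,
}

impl Hints {
    fn new() -> Self {
        Hints { int_type: HashMap::new() }
    }

    /// Record the operator's correction for one function.
    fn set_int_type(&mut self, function: &str, rust_type: &str) {
        self.int_type.insert(function.to_string(), rust_type.to_string());
    }

    /// The default guess is `i32` unless the operator has said otherwise.
    fn int_type_for(&self, function: &str) -> &str {
        self.int_type.get(function).map(String::as_str).unwrap_or("i32")
    }
}

/// Emit a Rust signature for a C function of the shape `int f(int a, int b)`.
fn emit_signature(hints: &Hints, function: &str) -> String {
    let ty = hints.int_type_for(function);
    format!("fn {f}(a: {t}, b: {t}) -> {t}", f = function, t = ty)
}
```

With no hints, emit_signature(&hints, "add") produces fn add(a: i32, b: i32) -> i32. After hints.set_int_type("add", "i64"), the same call produces the i64 version. A real tool would need far richer hints than a type name, but the loop of guess, inspect, correct, and regenerate is the point.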

I understand why TRACTOR went the route they did: the DoD has millions of lines of C code which would ideally be translated entirely automatically rather than requiring a human operator in the loop. Additionally, it's a lot harder to judge the performance of a tool which requires an operator to refine the output or request edits. However, because of the batch-mode limitation described above, I doubt the tools developed directly by TRACTOR will gain widespread adoption. Still, I hope they will be effective enough to inspire interactive versions that reduce the effort needed to translate C to Rust in the future.

If you’re working on software translation or a similar project, I’d love to hear about your work.