Performant Binary Fuzzing without Source Code using Static Instrumentation
Advances in fuzz testing have made it possible to find security-critical faults in software systems quickly and comprehensively. Yet some of these techniques rely on access to source code, which is often unavailable in practice. In this paper, we explore techniques to replicate the depth and efficiency of source-available fuzzers via static binary instrumentation. Developing such instrumentation is difficult because compilation is a lossy process, and much of the source-level semantic information leveraged by these techniques is not available in binaries. We recover much of this information via heuristic control flow reconstruction, a shadow stack for function identification, and a novel technique for instrumenting comparison instructions. We evaluate our system, RWFUZZ, on the LAVA-M dataset, achieving the same effectiveness as a best-in-class source-available fuzzer with a 3.4× execution time overhead (lower than existing dynamic fuzzing approaches). In this way, we show that binary fuzzing techniques may approach the functional ability of source-available fuzzing.