
VrankenFuzz: a multi-sensor, multi-generator mutational fuzz testing engine

By Guido Vranken, May and July 2018
Website: https://guidovranken.com/

Introduction

Intended audience

This document is directed at computer security specialists, specifically those with an interest in leveraging fuzz testing to find bugs in software applications. This research and its implementation are especially suited for people who perform hands-on (active, not passive) vulnerability research. Most of the examples set out from a situation where the analyst has access to the source code.

Core concepts

Years of hands-on experience with fuzzers have led me to identify what, in my estimation and in relation to my needs, are several shortcomings in the existing fuzzing engines. To overcome these shortcomings, I have reasoned about the essence of fuzz testing and have broken it down into several general concepts. Each of these concepts has been implemented separately in VrankenFuzz. In addition to this, whereas a traditional fuzzer will only manipulate the target, VrankenFuzz allows the target to dynamically manipulate the fuzzer. The atomization of core fuzz testing primitives and this bidirectional influencing are what summarize VrankenFuzz and what make it unique (to the best of my knowledge).

One of these primitives is sensoring. Earlier, I released libfuzzer-gv [1], which implemented stack depth guided fuzzing, code intensity guided fuzzing, allocation size guided fuzzing, and custom guided fuzzing. VrankenFuzz builds on the realization that not just code coverage, but any quantifier can be used to guide fuzzing.

Another primitive is the Generator. Generators allow a target application to have not just one, but multiple inputs. An obvious use case is targets that simply require multiple input variables. But there are also less obvious reasons that merit the use of multiple Generators, even if a target (normally) takes just one input; why and how, I will explain below.
The third primitive is the Processor. The Processor defines how to interpret Sensor data. Different interpretations are useful in different circumstances.

[1] https://github.com/guidovranken/libfuzzer-gv
Use cases

Fuzzing anything. Fuzzing programs written in other languages that do not support traditional instrumentation. If you have some way of knowing which code paths are taken, you can probably fuzz it with VrankenFuzz.

Differential fuzzing. Each of multiple implementations of the same specification can have its own code coverage tracked separately by leveraging VrankenFuzz's multi-sensor facilities.

Black-box fuzzing, even remotely. A logical categorization of a black-box application that is in line with its internal state allows it to be fuzzed. Create a sensor for a remote web application's execution time (e.g. round-trip time minus average round-trip time) and the fuzzer will tend to generate inputs that lead to an ever-greater execution time [2] (= denial-of-service). Create a sensor for the total number of unique error codes that an HTTP server has returned for a body of inputs, and you will end up with a corpus that likely constitutes a set of diverse internal states within the remote server.

Fuzzing "unfuzzable" code. Reaching code that is, by definition, not reachable with a traditional fuzzer, for example because it essentially requires cryptography to be defeated. An example can be found later in this document.

Fuzzing binaries. In conjunction with a processor emulator interface framework like Unicorn [3], it is possible to provide precise code coverage guided fuzzing of binaries (by representing the processor's program counter with a Sensor).

A supplement to manual audits. VrankenFuzz is very versatile, and once you internalize its logic, several approaches can be used to exert fine-grained control. For example, you can force it to focus on a certain function within the target, instead of hoping the fuzzer will hit that function repeatedly by happenstance.
You can force certain inputs to be discarded (not added to the corpus), for example inputs that take a long time to execute or that consume a lot of memory, thereby creating a corpus of fast-executing or low-memory inputs. Etcetera.

Interactive fuzzing

What and why

If your objective is to find bugs in an application, there are several tactics at your disposal, one of which is fuzz testing. But writing a fuzzer for a target and letting it run in the background may

[2] This assumes a certain inherent relationship between input data and execution time that can be explored gradually.
[3] https://www.unicorn-engine.org/
not always be as efficient as you'd expect. Fuzzers are an excellent tool for very quickly covering code of a certain type of complexity. A fuzzer's reach can rapidly extend across a maze of functions and branches that appear byzantine to the human mind, for which the overarching cohesion is lost in syntactic or semantic complexity. The human mind is better at reasoning about other types of code, which conversely are "blockers" for the fuzzer. The mind can recognize certain constructs in the target code as being unresolvable through sheer trial-and-error testing (which is essentially what the fuzzer does), such as verifying a MAC that was computed over the (dynamic) input data. In the quest for program bugs, the fuzzer and the human mind can complement one another in an endeavor I call interactive fuzzing.

When performing interactive fuzzing, the task of the analyst is twofold:

- See to the efficacy of the fuzzer: is it still reaching new code? Why is it not reaching new code? Is code coverage of the highest importance in this target, or can other variables play a meaningful role as well?
- Push the fuzzer in the right direction: comment out code (if available) in the target where it makes sense. Construct an input manually in a hex editor. Drive it towards other types of bugs than memory violations, like excessive memory consumption, deep recursions, or very slow executions (denial-of-service). Add assertions (e.g. if ( ... ) abort()) where that makes sense. When using VrankenFuzz: can I implement Sensors and Generators at strategic places?

Circumventing blockers

Cryptographic blockers

A noteworthy example of blockers can be found in applications that employ cryptography.
If we imagine a network protocol that authenticates incoming data by computing a Message Authentication Code (MAC) [4], execution will never move past the data authentication phase, because a fuzzer simply does not possess the ability to crack the MAC, and the crucial comparison between expected and computed value never resolves to true. In the case of MAC or hash verification, dictionaries are certainly not a viable strategy: no manageable repository of static dictionary values can ever match the MAC or hash that the application computes over the dynamically generated input messages. So what I would do in this case is comment out the original comparison between computed

[4] https://en.wikipedia.org/wiki/Message_authentication_code
MAC and expected MAC, and replace it with a choice based on a pseudo-random boolean that decides whether to proceed or exit. Essentially the following:

/* Original */
bool process_message(const uint8_t* msg_data, size_t msg_size) {
    const auto expected = expectedMAC();
    const auto computed = computeMAC(msg_data, msg_size);
    if ( expected != computed ) {
        return false;
    } else {
        return true;
    }
}

/* Version used in audit */
bool process_message(const uint8_t* msg_data, size_t msg_size) {
    const auto expected = expectedMAC();
    const auto computed = computeMAC(msg_data, msg_size);
    if ( (rand() % 2) == 0 ) {
        return false;
    } else {
        return true;
    }
}

But editing the target source code to suit your needs, isn't that defying the purpose of the audit, because your objective is to find inputs that crash the application in its original form? No, in this case it is not, because with this modification we mimic what could also happen in an unaltered production environment: each input message can either contain a valid MAC or an invalid MAC. So by altering the source code, we have not deviated from the similarity to a production environment; we have made the fuzzing environment more like it.

But there is one problem with the modified process_message(): it succeeds or fails based on rand(). And rand() is fickle: unless repeatedly seeded with the same value, its PRNG state is retained across iterations without regard for the input at hand, and this breaks the determinism of the fuzzer. Concretely, if we have input I, rand() % 2 may produce 0 if we run I one time, and produce 1 if we run I again. This leaves us with a situation in which it is more difficult to reproduce crashes, and a sub-optimal functioning of the fuzzing engine's algorithms (because the engine cannot determine whether it was a mutation that led to new code coverage, or the output of rand(), and it will decide that it was the former, leading to unwarranted
addition of inputs to the corpus). It would be helpful if we could just pull some additional data from the fuzzer engine in the middle of the target. This is what Generators are for in VrankenFuzz.

On trickle-down logic

Consider an application that implements a network protocol. It expects a well-formed message to be encapsulated within several layers of coding schemes. One layer re-assembles several separate network packets into a single, meaningful message. The next layer takes this message, which it expects to be base64-encoded, and decodes it from base64 into binary. The third layer uses an ASN.1 decoder to decode the binary message into several distinct data types. The fourth layer validates these variables with some custom logic.

enum { PROTOCOL_HANDSHAKE = 0x9999 };
std::vector