When I was a child, I liked to break things to see how they were built. When I was older, I didn’t grow out of this habit. In fact, I joined a company with like-minded individuals. Now we don’t break things just for the sake of breaking them; we break them to uncover software security weaknesses and fix them.
Deconstructing things allows you to understand not only how they are built but also how they can be built better. By breaking software using solutions we’ve developed, we’re able to understand how it’s built, uncover vulnerabilities that can be abused, and fix those vulnerabilities to build more robust, secure software.
There are several methods for finding vulnerabilities and issues in Bluetooth-enabled devices, fuzz testing being one of them.
Fuzz testing is a method of feeding applications automatically generated, unexpected inputs. It answers the question “What happens if I purposely input invalid values into an application?” efficiently and, depending on the sophistication of your fuzzer, effectively.
Here’s an example. Let’s say an application has an input field that expects a first name. Fuzz testing lets testers answer the question, “What happens if I feed it multiple variations of 10,000 or more ‘a’ characters?” Most security researchers, and perhaps hackers, will aim to overflow an unbounded buffer or trigger other anomalous behavior. Let’s add another level of complexity to this example: What if we were to modify the invalid input by adding a C-style format specifier, such as “%s”, to trick the application into granting us improper memory access? Can you think of some other unexpected input?
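The first-name example can be sketched in a few lines of Python. This is only an illustration: handle_first_name is a hypothetical stand-in for the application under test, and the input list is a small hand-picked sample, far from what a real fuzzer generates.

```python
def generate_name_inputs():
    """Yield unexpected values for a field that expects a first name."""
    # Overlong inputs: variations of 10,000 or more 'a' characters.
    for length in (10_000, 10_001, 65_535, 65_536, 100_000):
        yield "a" * length
    # C-style format specifiers, alone and appended to an overlong input.
    for spec in ("%s", "%n", "%x"):
        yield spec * 16
        yield "a" * 10_000 + spec
    # Other unexpected inputs: an empty string and a NUL character.
    yield ""
    yield "\x00"


def handle_first_name(name):
    """Hypothetical target: accepts 1 to 64 alphabetic characters."""
    if not name or len(name) > 64 or not name.isalpha():
        raise ValueError("invalid first name")
    return name.title()


def fuzz(target, inputs):
    """Feed every input to the target and record how it reacted."""
    results = []
    for value in inputs:
        try:
            target(value)
            results.append(("accepted", len(value)))
        except ValueError:
            # A clean rejection is the behavior we hope to see.
            results.append(("rejected", len(value)))
        except Exception as exc:
            # Anything else is an anomaly worth investigating.
            results.append(("anomaly: " + type(exc).__name__, len(value)))
    return results


if __name__ == "__main__":
    for status, length in fuzz(handle_first_name, generate_name_inputs()):
        print(f"{status:10} input length {length}")
```

A robust target rejects every one of these inputs cleanly. A crash, a hang, or an unexpected exception type is the kind of anomaly a fuzzer is looking for.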
Some applications can handle these types of unexpected inputs just fine. However, others may crash and even allow attackers to execute commands for fun—or for profit at the expense of end users. Unauthorized access and absent fail-safes are particularly concerning in systems and software related to safety-critical fields, such as automotive.
Because practically all modern car kits are Bluetooth-enabled, fuzz testing is highly relevant in the automotive industry. In this post, I’ll explain how we built a solution to fuzz Bluetooth-enabled devices, the challenges we ran into, and how we solved them.
Hackers enjoy solving technical challenges. The joy of finding an elegant solution to a technical problem is very similar to the joy of solving a difficult puzzle. However, some people like to break things to be mischievous, and some even make a living finding and exploiting vulnerabilities for criminal purposes.
Because cars are ubiquitous and expensive, they attract criminals of all types, even from a cyber security perspective. Add the fact that modern cars incorporate more software with each new model, which increases the attack surface, and cars begin to look like ubiquitous, expensive, and easily hackable targets.
It’s critical to ensure that remotely exploitable vulnerabilities do not lead to pricey, brand-damaging mass recalls. In the automotive industry, proactive security engineering yields a high return on investment because there are only a few technology providers for car kits. Thus, any issue found early saves providers from dealing with hundreds of thousands of vulnerable cars.
To fuzz test Bluetooth, we needed to answer two questions:
Defensics provided the answer to the first question.
Defensics can fuzz Bluetooth protocols by modeling them, meaning that Defensics learns the “rules” that govern valid communication via the Bluetooth protocols. Effective fuzzers have a deep understanding of the protocol specifications; they know how to alter inputs just enough that the target still accepts them for processing instead of rejecting them outright. As a saying often attributed to Pablo Picasso goes, “Learn the rules like a professional so you can break them like an artist.” While there are other ways to fuzz a protocol, we’ll focus here on model-based fuzz testing.
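To make the idea of model-based fuzzing concrete, here is a minimal sketch. It does not show how Defensics works internally. The three-field header, its field names, and the choice of anomalies are all assumptions made for illustration. The point is that the model knows what a valid message looks like, so each test case can break exactly one rule while everything else stays valid.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    fmt: str    # struct format code: "B" is one byte, "H" is two bytes
    valid: int  # a value the (hypothetical) specification allows


# Hypothetical message model: a three-field header followed by a payload.
MODEL = (
    Field("type", "B", 0x01),
    Field("flags", "B", 0x00),
    Field("length", "H", 4),
)
PAYLOAD = b"ping"


def anomalies(field):
    """Boundary values for one field, excluding its valid value."""
    max_value = (1 << (8 * struct.calcsize("<" + field.fmt))) - 1
    candidates = [0, field.valid - 1, field.valid + 1, max_value - 1, max_value]
    unique = []
    for value in candidates:
        if 0 <= value <= max_value and value != field.valid and value not in unique:
            unique.append(value)
    return unique


def encode(values):
    """Serialize the header fields (little-endian) and append the payload."""
    header = b"".join(struct.pack("<" + f.fmt, values[f.name]) for f in MODEL)
    return header + PAYLOAD


def test_cases():
    """Yield the valid message, then messages with exactly one broken field."""
    valid = {f.name: f.valid for f in MODEL}
    yield "valid", encode(valid)
    for field in MODEL:
        for value in anomalies(field):
            yield f"{field.name}={value:#x}", encode({**valid, field.name: value})


if __name__ == "__main__":
    for label, data in test_cases():
        print(f"{label:16} {data.hex()}")
```

Because only one field is anomalous at a time, the target is more likely to parse the message far enough to reach the code that handles the broken field.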
To address the second question, we had to analyze the system under test (SUT). A viable target device must meet software and hardware requirements to receive fuzzed inputs from fuzzers such as Defensics. Specifically, we needed a network stack, also known as a protocol stack, which is an implementation of protocols that enables devices to communicate with one another. Stacks are usually separated into layers, with each layer having its own responsibility. Stacks ensure the interoperability between different protocols.
So how did we get a Bluetooth stack suitable for fuzz testing? We considered two options: building our own stack or buying an existing one.
The biggest benefit of making our own stack was that we could build it with fuzz testing in mind. On the other hand, buying an existing stack would save us development time, giving us a head start against unethical hackers, but we would need to modify the stack to suit our fuzz testing needs. This process would include ensuring that the stack did not crash when we sent unexpected data through it.
After careful consideration, we chose to purchase an existing stack, allowing us to concentrate our development efforts on security, especially fuzz testing.
One goal of protocol stacks is to make application logic development easier. A framework that facilitates communication between protocols allows developers to focus on application logic without getting bogged down by the underlying technical foundation. A stack not only provides developers with access to each protocol layer but also performs validity checks (e.g., size and format) that ensure messages are properly formatted for software processing. These checks and services are excellent for most use cases, but how about for people like us who want to send broken messages to target devices deliberately?
Figure 1: A diagram of a simplified Bluetooth stack with RFCOMM replaced with our own implementation
Let’s study the simplified Bluetooth stack in Figure 1 and use RFCOMM as an example. RFCOMM provides emulated RS-232 ports and works as a transport layer to several higher-layer protocols, such as HFP, SPP, and OBEX. The stack we purchased had RFCOMM implemented. So what modifications are required to fuzz the RFCOMM implementation of the target device? Surprisingly, removing the validity checks in the stack was very easy. It required just a few modifications.
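The sketch below illustrates what switching off validity checks can look like. The code of the purchased stack is not shown in this post, so the Layer class, its enforce_checks flag, and the 127-byte limit are hypothetical.

```python
class ValidationError(ValueError):
    """Raised when an outgoing message breaks a size or format rule."""


MAX_PAYLOAD = 127  # hypothetical size limit for this layer


class Layer:
    """Hypothetical protocol layer with optional outgoing validity checks."""

    def __init__(self, enforce_checks=True):
        self.enforce_checks = enforce_checks
        self.sent = []  # stands in for the lower layer or the radio

    def send(self, payload, declared_length=None):
        if declared_length is None:
            declared_length = len(payload)
        if self.enforce_checks:
            # Checks a normal application relies on.
            if len(payload) > MAX_PAYLOAD:
                raise ValidationError("payload too long")
            if declared_length != len(payload):
                raise ValidationError("length field does not match payload")
        # One-byte length field followed by the payload.
        frame = bytes([declared_length & 0xFF]) + payload
        self.sent.append(frame)
        return frame


if __name__ == "__main__":
    fuzzing_layer = Layer(enforce_checks=False)
    # The length field claims 200 bytes, but only 3 follow.
    print(fuzzing_layer.send(b"abc", declared_length=200).hex())
```

With the checks enabled, the mismatched length never leaves the stack. With the checks disabled, the broken frame reaches the target, which is exactly what a fuzzer needs.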
Modifying an entire layer, however, is a different story. Creating an RFCOMM connection requires a negotiation sequence to be sent between two devices. What if we want to fuzz the negotiation messages by feeding invalid parameters to the test target? The difficulty here is that we cannot use the RFCOMM implementation within the stack, because it speaks valid RFCOMM. The solution we came to was to skip the RFCOMM implementation completely.
How did we do this? As we can see in Figure 1, RFCOMM messages are sent over L2CAP, meaning they’re sent as the payload of L2CAP messages. We wrote our own implementation of RFCOMM, which communicates directly with the stack’s L2CAP implementation, allowing us to bypass the stack’s RFCOMM implementation completely. Now we have a way to send whatever we want to the RFCOMM implementation of the target through the stack’s L2CAP implementation. The end result of this work was the Defensics RFCOMM test suite.
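As a rough illustration of this layering (and not the Defensics implementation), the sketch below builds an RFCOMM SABM frame by hand and wraps it in a basic L2CAP frame. The function names and the channel ID 0x0040 are assumptions; in a real session the L2CAP channel ID is negotiated when the channel is opened. Because we build the frame ourselves, the length field and the frame check sequence (FCS) can be overridden with invalid values.

```python
import struct

SABM = 0x2F        # RFCOMM control value that opens a channel
POLL_FINAL = 0x10  # P/F bit in the control field


def rfcomm_fcs(data):
    """RFCOMM frame check sequence: CRC-8 with reflected polynomial 0xE0
    and initial value 0xFF, ones-complemented at the end."""
    crc = 0xFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xE0 if crc & 1 else crc >> 1
    return 0xFF - crc


def rfcomm_sabm(dlci, length_field=0x01, fcs=None):
    """Build a SABM frame. Override length_field or fcs to send an invalid one.

    The default length field 0x01 encodes a payload length of zero
    with the EA bit set.
    """
    address = (dlci << 2) | 0x02 | 0x01  # DLCI, C/R bit set, EA bit set
    control = SABM | POLL_FINAL
    header = bytes([address & 0xFF, control, length_field & 0xFF])
    if fcs is None:
        fcs = rfcomm_fcs(header)  # for SABM, the FCS covers all three bytes
    return header + bytes([fcs & 0xFF])


def l2cap_frame(cid, payload):
    """Basic L2CAP frame: 2-byte length, 2-byte channel ID, then the payload."""
    return struct.pack("<HH", len(payload), cid) + payload


if __name__ == "__main__":
    cid = 0x0040  # hypothetical channel ID of the RFCOMM channel on the target
    print("valid   ", l2cap_frame(cid, rfcomm_sabm(0)).hex())
    print("bad len ", l2cap_frame(cid, rfcomm_sabm(0, length_field=0xFF)).hex())
    print("bad fcs ", l2cap_frame(cid, rfcomm_sabm(0, fcs=0x00)).hex())
```

The L2CAP layer treats the RFCOMM frame as an opaque payload, which is why the stack’s L2CAP implementation carries a broken RFCOMM frame without complaint.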
We took a similar approach for several other Bluetooth layers and services.
Now we had a solution that allowed us to fuzz different Bluetooth devices. The Defensics team and our customers found and fixed important issues such as CVE-2017-2420 using the solution. We also implemented fuzzers for Bluetooth Low Energy, but that’s a story for another time.
As we continued to work with the Defensics Bluetooth test suite, we noticed we had interoperability problems with some car kits. These problems resulted from two main issues:
Why do car kits work this way? It’s because they’re expecting to connect to smartphones.
To resolve our interoperability problems with car kits, we needed to act like a smartphone and simulate the services it provides to car kits. With the help of several excellent Bluetooth sniffers, we researched how popular smartphones interact with car kits. We realized that we needed to be able to do the following:
To simulate these services, we captured Bluetooth traffic between phones and car kits. Because our purpose was to find vulnerabilities and verify the robustness of devices rather than test their conformance, we implemented only the services that allowed us to establish a connection and execute our fuzz tests.
We also moved a significant amount of our application logic from native C code to the Java side. We love C but found that the switch made debugging much easier and gave us the flexibility and modularity we needed. Additionally, our platform now runs on top of Linux. Linux offers superb Bluetooth development tools, such as Wireshark, straight out of the box and frees us from Windows drivers.
What started as a project to build a fuzzer for Bluetooth-enabled devices resulted in an end product that does just that and more! Our Bluetooth fuzzer is not only unique but is also something we’re proud of. It’s a fuzzer that uses Bluetooth as a transport to test car kits with few to no changes to the device under test.
It’s hard to say which part of this project has been more enjoyable: having the end product or taking the journey to get there. We had a tremendous amount of fun learning new things and finding elegant solutions to difficult, technical problems. I can say that this has been one of the most satisfying projects I’ve worked on. It’s also been a reminder of why I’ll continue to break and deconstruct things—to show people not only how they can secure their products but also how they can improve the way their products are built.
The end result? We found something unique that can be used to test car kits without changes (or with only small changes) using Bluetooth as a transport.