In a previous blog post I discussed the difference between application security and software supply chain security in the context of an abstract DevOps process. Recently, a work project provided me with the opportunity to more fully illustrate this difference in concrete terms.
I needed a library to parse OpenSSH keys to support our free secret scanning service and decided to approach the experience from an application security rather than a software supply chain security perspective.
Here at Arnica we help identify and validate secrets in our customers’ repositories. We can’t do all of this on our own; fortunately, there are plenty of open-source packages that make it possible.
Looking for new tools, I stumbled across Scott Wang’s OpenSSH Key parser library, https://github.com/scottcwang/openssh_key_parser, which interested me because it’s written in the One True Language, and gives the caller access to the kind of important cryptographic metadata that lets us at Arnica contextualize findings for our users. For example, using this library I could tell you that a key I found in your repo is not passphrase protected.
Two things concerned me about openssh_key_parser, though. First, despite having decent download stats, it’s only on version 0.0.5, so a bit immature. Second, its author has helpfully provided this disclaimer:
Which is disheartening. Nobody wants to trust their crypto keys to software with security bugs, but security reviews don’t just fall out of the sky.
Two years ago Dan Geer asked us who pays the piper for maintaining the open source software that we all use and take for granted.
We’ve started getting better as an industry about exercising due care. Folks at the OpenSSF’s Alpha-Omega Project work to bring security resources to critical open-source projects, and they correctly prioritize widely deployed software. But every prioritization scheme involves trade-offs.
It’s less expensive to fix security defects earlier in the software development process; this is the impetus for the “shift-left” movement. Similarly, it’s less expensive to fix security defects while software has a smaller deploy base, and for the same reason: more deployments make for more varied deployments, which make for more uncertainty about how any change—bug fixes included—will affect them. This is especially true of software that supports other software, such as operating systems, APIs, and libraries.
Popular projects with wide deploy bases benefit from the help of many security researchers. But an emerging project without such resources will accumulate defects. By the time it is too big to fail, it may be too buggy to succeed.
So, in short, I decided to spend some time doing an in-depth security review of openssh_key_parser.
This exercise also provides a meditation on the difference between software supply chain security and application security.
Application security is largely about taking responsibility for the security of the software you build, from design to deployment. Software supply chain security encompasses the different security aspects of your dependencies, including operational concerns like packaging, delivering, and patching. Here I’m going to ignore all that stuff and just treat my upstream package as software to be secured.
I like to begin this kind of security review by considering what types of defects are relevant to, and likely in, the project under review. This helps determine what methods I use, because different defect discovery methods have different strengths and weaknesses.
Security defects come from three different places: the design, the implementation language, and the problem domain.
Because openssh_key_parser is a single software component, there isn’t a whole lot of design to pick apart. Given a key, it parses it. It’s a small piece in someone else’s design. If there are any design problems, they likely arise from failure to consider the variety of contexts it could be used in.
It is written in Python 3, which is dynamically typed, memory managed, and supports arbitrarily large integers as its default numeric type. These features tend to clear a lot of bug categories out. Python’s error-handling system, though, makes it ripe for unhandled exceptions. And the Python interpreter and some Python libraries are written in C, which makes them susceptible to buffer overflows and the like.
The library parses OpenSSH keys, which are basically Base-64 encoded C-style structs with Pascal-like strings: an unsigned integer giving the number of bytes that comprise the string, followed by those bytes. So, the string “bcrypt” would be encoded as \0 \0 \0 \6 b c r y p t. Any system that parses these data structures has to be resilient to a mismatch between the size declared and the size of the file, as well as mismatches when two different sizes are used to represent the same data. Such a mismatch was the cause of the Heartbleed buffer overread bug that plagued OpenSSL. Also, because the library handles cryptographic keys, a variety of crypto-relevant defects are possible: mishandling of keys and passphrases, and side-channel disclosure of keys. Others—insecure random number generation, cryptographic protocol selection—are not relevant to a key parser.
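To make the encoding concrete, here’s a minimal sketch of reading one such length-prefixed string with Python’s struct module. This is my own illustration, not the library’s code:

```python
import struct

def read_pascal_string(data: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Read one length-prefixed string: a 4-byte big-endian unsigned
    integer, followed by that many bytes. Returns (value, next_offset)."""
    (length,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    value = data[start:start + length]
    # The declared length and the bytes actually present can disagree;
    # a robust parser must check for exactly this mismatch.
    if len(value) != length:
        raise ValueError(f"declared {length} bytes, got {len(value)}")
    return value, start + length

value, _ = read_pascal_string(b"\x00\x00\x00\x06bcrypt")
assert value == b"bcrypt"
```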
Considering these in advance makes it easy to decide how to go looking for bugs. Different defect detection methodologies are better or worse at finding different kinds of bugs. For example, static analysis tools have a hard time finding errors of omission, like authentication bypasses. Penetration testing is bad at finding weak use of cryptographic primitives unless some other bug reveals their use to the tester. For example, a penetration tester might be unaware that the system they are testing hashes passwords with MD5, unless they first find a SQL injection that exposes the back-end data.
For the set of relevant bugs, I decided to use a combination of static analysis and fuzzing to guide an in-depth manual code review.
Depending on the complexity of methods used, static analysis can find low-hanging fruit or deeply nested bugs.
I used Pysa, Meta’s Python static analysis tool, which uses taint propagation: a relatively complex method that involves tracing data paths through code. The data paths involved in openssh_key_parser are pretty short and simple: it parses keys. There are no web components, SQL databases, or the like.
Running Pysa is a breeze; getting it set up is a bit of a hassle, but that’s okay. The thing about static analysis is that it’s free bugs, except for the effort you put into getting the tool working with your code.
In this particular case, Pysa didn’t find anything.
Fuzzing is good at finding unexpected conditions and weaknesses in input validation. It does this by repeatedly sending random input to software.
Being productive with a fuzzer is a bit more involved than with a static analysis tool. Fuzzers are of limited use if they just squirt random nonsense at your software: any sane input validation can handle clearly garbage input.
For example, OpenSSH keys, at a high level, consist of three parts: a begin tag:
-----BEGIN OPENSSH PRIVATE KEY-----
a Base-64 encoded stanza with key material and metadata, and an end tag:
-----END OPENSSH PRIVATE KEY-----
Without those beginning and ending tags, openssh_key_parser barely even looks at its input. And the chance of a random character generator producing those tags is infinitesimal, so the fuzzer needs guidance on how to produce useful garbage: garbage that passes the smell test but fails the taste test.
I used Google’s Atheris, an adaptation of libFuzzer suitable for fuzzing Python libraries. LibFuzzer is the tool OpenSSH itself uses to fuzz its own file formats and protocols.
Many fuzzers require their users to write grammars for the input streams they are to produce. Atheris doesn’t make you do that. Instead, it is coverage guided: you instrument your library source code to let Atheris peek at it during runtime, and Atheris discovers what kinds of inputs exercise as much of the codebase as possible.
In order to get Atheris working meaningfully with openssh_key_parser I had to write some custom code.
I think of fuzzing in terms of manipulating layers of abstraction. First, I wanted to just send the library random stuff to see what happens. The results were uninteresting: the library raised ValueError exceptions for any random gobbledygook I sent it. This is documented behavior.
Next I wrapped my gobbledygook with valid header and footer lines, the ones that -----LOOK LIKE THIS-----. This also produced ValueErrors.
Finally, I Base-64 encoded my gobbledygook and sandwiched it between valid header and footer lines. That’s when I started receiving EOFErrors, an undocumented condition.
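Those three layers of wrapping can be sketched like this (the helper names are mine, not from my actual harness, which feeds each layer’s output through Atheris):

```python
import base64
import os

HEADER = b"-----BEGIN OPENSSH PRIVATE KEY-----\n"
FOOTER = b"-----END OPENSSH PRIVATE KEY-----\n"

def layer_1(junk: bytes) -> bytes:
    # Raw gobbledygook: rejected immediately with a documented ValueError.
    return junk

def layer_2(junk: bytes) -> bytes:
    # Valid tags, but the stanza isn't valid Base-64: still a ValueError.
    return HEADER + junk + b"\n" + FOOTER

def layer_3(junk: bytes) -> bytes:
    # Valid tags around a valid Base-64 stanza of junk: the layer that
    # shook loose the undocumented EOFErrors.
    return HEADER + base64.b64encode(junk) + b"\n" + FOOTER

sample = layer_3(os.urandom(64))
```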
Unhandled exceptions in security code can result in serious security problems.
Why didn’t static analysis find this? Because my static analysis tool isn’t programmed to read the library documentation and ascertain which exceptions are valid. Different defect discovery methodologies are better or worse at finding different kinds of defects.
I could’ve gone deeper with Atheris, producing random inputs that play loosey-goosey with those Pascal-like strings, for instance, but writing the custom mutators to do that would be more time-consuming than just writing my own key parser. Also, OpenSSH doesn’t even get to this level of depth with its own fuzzers.
As I mentioned before, openssh_key_parser is small: only about 8,000 lines of code. I wanted to see how it went about validating those Pascal string lengths. Instead of writing custom fuzz code for a week, I thought it best to follow the guidance of one of my mentors: “Use the source, Luke!”
openssh_key_parser is richly commented, and while it uses delegation and indirection much more than my rough-and-tumble, git-r-done scripting style does, I found it quite easy to bounce through the control flow with a decent IDE.
Here’s how it is implemented:
The PascalStyleByteStream class has a read_fixed_byte_stream method which reads an arbitrary number of bytes from its underlying key stream. If the number of bytes actually read doesn’t match the intended number—for instance, if an attacker manipulated the declared length to be larger than the whole file—then that whole file would be returned by self.read, but len(read_bytes) would be smaller.
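A simplified re-creation of the pattern makes the hazard easier to see. The class name mirrors the library, but the body below is my own sketch, not the library’s code:

```python
import io

class PascalStyleByteStream:
    """Simplified re-creation of the pattern described above."""

    def __init__(self, data: bytes):
        self.stream = io.BytesIO(data)

    def read_fixed_bytes(self, num_bytes: int) -> bytes:
        read_bytes = self.stream.read(num_bytes)
        if len(read_bytes) != num_bytes:
            # The entire remainder of the key file -- secret bytes
            # included -- becomes the exception message.
            raise EOFError(read_bytes)
        return read_bytes

stream = PascalStyleByteStream(b"\x00\x00\x00\x06bcrypt-and-secret-key-bytes")
stream.read_fixed_bytes(4)           # the 4-byte length prefix
try:
    stream.read_fixed_bytes(0xF6)    # an attacker-inflated length
except EOFError as leak:
    # Everything after the prefix rides up the call stack.
    assert b"secret-key-bytes" in leak.args[0]
```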
How would an attacker manipulate the num_bytes read from the key? Is it even possible? I didn’t find an attack surface in this library that could be manipulated to induce such an error, but this is meant to be a small bit of code in a larger project. Perhaps the key is transmitted using an unauthenticated stream cipher susceptible to a bit-flip attack. It simply must be possible because if num_bytes always agreed with the number of read bytes, this error condition would never need to be checked for.
Okay, so it’s possible. What’s the worst that could happen?
When these two values mismatch, the code raises an EOFError; this behavior is documented for the method but undocumented upstream from it. The message this method sets for that EOFError is the entirety of the bytes read.
This is a typical trick that programmers use to debug their code. It’s basically the software telling the programmer “I have no idea what this input is, what can you make out of it?”
But there’s a problem with using this technique in code that handles cryptographic keys: the rest of the key file, including all the bytes that make up the actual encryption key, is raised up the call stack. Depending on how the library is called and how its exceptions are propagated, those keys can end up in places they’re not supposed to be, like log files.
An attacker who can stomp on a key they can’t read, but who can read exception logs, can recover the keying material.
At least, that’s what the code looks like it’s doing. To prove this out, I first generated an OpenSSH key from the command line:
Next, I used a text editor to strip out the header and footer, Base-64 decoded the rest, and opened the result in a hex editor.
The sixteenth byte is the first byte of a Pascal-style string. I flip its first nibble to all ones by changing it from a zero to an upper-case F. This will produce noticeable differences in the Base-64 encoded version of this key. In a real-life attack, the attacker would probably have to manipulate the Base-64 directly, which might involve some tedious math. I can prove the concept much more cleanly and quickly with a hex editor.
From there, I packed the binary back up and put the header and footer back on, then loaded the key up in Python.
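The byte-flipping step looks like this in Python instead of a hex editor. The blob here is a made-up stand-in with the right magic number, not a real key:

```python
import base64

# Stand-in for a decoded key stanza: the 15-byte magic "openssh-key-v1\0",
# then the first Pascal-style length field (a made-up blob, not a real key).
blob = bytearray(b"openssh-key-v1\x00" + b"\x00\x00\x00\x04none" + b"rest-of-key")

# Byte 16 (index 15) is the first byte of the length. Setting its high
# nibble to all ones inflates the declared length wildly.
blob[15] |= 0xF0
assert blob[15] == 0xF0

# Re-encode; the resulting Base-64 differs visibly from the original.
tampered = base64.b64encode(bytes(blob))
```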
Parsing this key produces the following EOFError:
Which clearly contains all the binary from the header, to the “Test exception log key” comment I used when creating the key, to seven bytes of padding. It would be trivial to reconstruct the full key from here.
What’s the risk of all this?
Hackers exploit vulnerabilities to gain power and information that they then use to get more power and information. As mentioned above, the only attacker who could benefit from this exploit is one who can read exception logs and can change part of a key that they can’t already read. The openssh_key_parser library doesn’t have functionality that can be abused to induce such an error itself, but this is meant to be a small bit of code in a larger project.
It all depends on the rest of the system. One way it could happen is if the key is transmitted over an unauthenticated stream cipher susceptible to the kind of bit-flipping attacks that broke WEP WiFi protection. Another way it could happen is if the container next door mounts a malicious ext4 filesystem that lets it write arbitrary bytes on your volume. I know I’ve used blind command injection attacks against real world systems that would let me pull this off.
To have a better understanding of the likelihood of exploiting this vulnerability, we’d have to see many diverse deployments of the parser. And then once it’s everywhere, the attack gets to be a calamity, like the attack against widely-deployed WEP or ext4. If we want to assess risk and secure our systems before they are calamities, we must learn to work with hypotheticals.
Which brings us to the impact.
An attacker who can stomp on a key they can’t read, but who can read exception logs, can recover the keying material used to construct an OpenSSH private key, granting them access to all the systems that trust that key. This is very similar to the Heartbleed buffer overread bug in OpenSSL; it’s worth noting that the team at Codenomicon found Heartbleed by fuzzing. Note that a buffer overread is different from the sorts of buffer overflows that Python typically does a good job of protecting against.
The easiest way to pull off the attack is to modify the first field length in the key, which comes right after a “magic number” that is the same for every key file and always Base-64 encodes the same way: “b3BlbnNzaC1rZXktdjEA”. An attacker can exploit this vulnerability if they can execute this regex substitution against the key file or all the files in a file system:
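A hypothetical version of such a substitution in Python (the `stomp` helper and its regex are my illustration): because the magic always encodes to the same 20 Base-64 characters, the very next character carries the top six bits of that first length byte, and replacing it with `/` (value 63) sets them all.

```python
import base64
import re

MAGIC_B64 = b"b3BlbnNzaC1rZXktdjEA"  # Base-64 of the 15-byte magic

def stomp(key_file: bytes) -> bytes:
    # Replace the Base-64 character immediately after the constant magic.
    # '/' decodes to 63, so the top six bits of the first length byte
    # become all ones, inflating the declared length.
    return re.sub(MAGIC_B64 + rb".", MAGIC_B64 + b"/", key_file, count=1)

# Demonstration on a made-up stanza, not a real key:
stanza = base64.b64encode(b"openssh-key-v1\x00" + b"\x00\x00\x00\x04none")
decoded = base64.b64decode(stomp(stanza))
assert decoded[:15] == b"openssh-key-v1\x00"   # magic is untouched
assert decoded[15] & 0xFC == 0xFC              # length byte's top bits set
```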
It’s one of those things where if it isn’t impossible, it’s trivial. Similarly, reading the keying material out of the exception log is either impossible or trivial. Because the attack complexity is low once you know the vulnerability is there and that you can stomp the key, and because the target asset is an OpenSSH key that lets the attacker change the scope of their attack, CVSS would rate this as a high-severity finding, which I feel is a bit unfair. If you can’t stomp the key or can’t read the exception, the risk is zero, after all.
Before writing a pull request, I like to reread Simon Tatham’s excellent guidance on how to file a bug report.
I organized my pull request into three commits. The first fixed the potential key leak. This vulnerability has the highest impact. The second fixed the unhandled exceptions: more likely but less serious. The third contributed my fuzzing scaffolding to the project.
One of the nice things about open source is you don’t have to wait for the patch if you know how to fix the problem. You also have the opportunity to contribute fixes to the community; the pull request activity on this topic shows the power of open source in action.
This XKCD comic has been getting a lot of mileage over the past year or so. Searches for its title spiked after the July 2021 npm attack, again after the September 2021 TravisCI attack, again after the faker.js attack, and on and on.
To mix aphorisms, it’s funny because it’s true, but the things that make us laugh also make us cry. We’re facing a crisis. For most of us, building software is too expensive not to depend on other people’s code.
I spent a week treating upstream code with the same care I would treat my own. I found some decent bugs and made a few kLoC a little safer. But giving every dependency the application security treatment is an expensive, time-consuming process. Here at Arnica, we’re working on better ways to secure the software supply chain.