What appeared on first blush to be a massive repository compromise turned out to be a little automation mixed with a little knowledge of some esoteric git commands. If certain other actions had followed, like creating backdoored pull requests, things could’ve gotten much worse.
Software engineer Stephen Lacy found 35,000 surprises a week before Patch Tuesday. The code of many thousands of repos had had code inserted, which sent environment variables to a Russian virtual private server and ran code from that server. Strangely, these identically written backdoors appeared to have been committed by many developers going back for years.
Lacy notified GitHub who quickly zapped the affected repos and reported the good news that the affected repos were all just clones of real repositories and that no real accounts were compromised.
Shortly thereafter, a new, anonymous twitter account replied to Lacy’s thread taking credit for the cloned repos, indicating that rather than trying to hack the planet, they were a security researcher in pursuit of a bug bounty, and that a report from them is forthcoming.
Patch Tuesday has come and gone and there’s been no follow up from the security researcher. That leaves us with the puzzle: how did the researcher (hereinafter: pl0x) commit backdoors that GitHub attributed to other accounts?
GitHub taking swift action on the repos is good operational security because it prevents anyone from mistakenly downloading and executing the backdoor, but it’s bad for security research because it makes it difficult to analyze the threat. Fortunately, GitHub left a backdoored clone of the open-source development platform nanobox up long enough for the wayback machine to archive, possibly because it has the word “malware” right in its title, so it’s rather unlikely that anyone will think it is a legitimate copy of nanobox.
This repo is an almost exact clone of nanobox. Except for the last commit, the dates, hashes, author names, and contents of all the commits are identical.
GitHub has a useful feature that allows anyone to produce a “fork” of anyone else’s repo, but that feature wasn’t used to produce these clones. You can tell because a forked repo will have a link to the original repo. This is a useful security feature because it lets anyone browsing repos know that they might not be looking at the genuine article.
How were these clones created if not from being forked? One way to do it is to create a second remote to a local clone. This is a useful feature for producing mirrors. Say you want to have the same code on GitHub as on bitbucket. You could do this:
But there’s no rule saying you can’t have your remotes all pointing to the same service.
Why would you do this? One reason is to prevent GitHub from letting other users know it’s a clone, but there might be another reason. Let’s look at that last commit.
The below table documents the similarities and differences between the last commit of the original Nanobox repo and the cloned repo:
GitHub links the clone commit to the same author’s page. Clearly, Tyler Flint didn’t commit to the real nanobox and at the exact same time produce a backdoored clone two years ago. Something else is going on.
First, git makes it trivially easy to make commits as someone else. Just change your username and email.
Using this one weird trick that your DevSecOps team doesn’t want you to know about will let you produce new commits as anyone, but Pl0x didn’t do that. He backdoored an existing commit. Modifying any arbitrary commit is tedious yet possible, but changing only the last commit is a one-liner:
Here’s an example:
Notice that git keeps the original commit date, even though I just made modifications to it. It also uses the original commits username and email.
Also, consider that since I’ve amended the commit on a local clone, I can add second remote and push to GitHub. GitHub’s web UI doesn’t expose the commit amend feature.
Make a local clone, add a remote that you control, amend the last commit with a backdoor, push. It might look like the following shell script:
I ran this against our open-source repo testing tool GitGoat. You can see the results on GitHub, here.
Notice that GitHub links to Nir’s page. GitHub picks the name and email of the last commit and trusts it, acting as a confused deputy.
What’s the takeaway from all this?
First and foremost, security researchers should all know by now that infecting packages with code that steals environment variables is not an innocent way to look for bug bounties. In containerized deployments, environment variables tend to contain secrets. In fact, this attack vector was used in the HauteLook attack I blogged about previously. But alternatively, anonymously stating that you were just pursuing a bug bounty sounds like plausible deniability for a hacker who got caught in the first phase of a massive reposquatting attack, depending on how much innocence you like to presume. If anyone had mistaken one of the cloned repos for a real repo, this would’ve been a nasty breach for them.
Besides reposquatting, this sort of clone could be used to create backdoored PRs for the original projects, although a fork instead of a clone would work just as well.
We at Arnica have anticipated this sort of attack and have been working on a feature in arnica.io to identify and report when the code pusher differs from the code author. If you’d like to beta this with us, please reach out. Sign-up is free.
Second, git is a more flexible VCS than a lot of folks realize. Like the Matrix, some of its rules can be bent, some can be broken.
Finally, the author of a commit might not be the person who pushes a commit. Signing and verifying your commits with GPG is a good practice. Almost no one does it and to prevent this sort of attack, consistency is crucial. We at Arnica will have an upcoming announcement related to this problem.