I wanted to learn more about secret detection. Maybe my notes will help somebody else. In no way, shape, or form do I claim to have come up with the ideas in this article by myself. If I forgot a citation, please let me know.
First, let's go over the basics, then we can try them out in the WrongSecrets OWASP project.
Secret Detection 101
Putting my mathematician hat away for a moment, I can now use circular definitions.
A Secret Is Information That ... Should Stay Secret
You also shouldn't make your secrets public. Obviously, if attackers have your secrets, they can access your system. That's bad, and that's what you want to avoid.
Typical examples of secrets are passwords, private keys, tokens, DB connection strings, webhook URLs, and many more.
Secrets Are In The Data
We have data and we want to extract secrets from it.
Either we are on the offensive - then we want to separate the passwords and keys from the landfill. Or we are on the defensive - then we want to find out whether we are leaking secrets so we can fix the leak. Either way, the main goal of secret detection is to... detect secrets.
To feed the SEO - you can try to point AI at your data in the hope of identifying secrets. It might even work. Consider, though, that you would be sharing your secrets with a third party that was able to make sense of the internet. I wouldn't recommend this approach. Here's what to do instead:
Secrets Are Contextual
The sad part is that there's no regex or other hard rule that would identify all secrets. What counts as a secret and what is just data depends on the context in which the data is used.
Examples:
A password can resemble a SHA256 hash, but sometimes it's just a hash noting the version of a currently used library in some go.sum file.
An innocent line such as pw = "TODO: Integrate with Vault tomorrow" is actually a secret, because the developer went on sick leave and now it's the password to the production DB.
A private key is not a secret when you are writing a parser for private keys and you have test cases with... private keys that point to no system or account.
A hash of a secret is still a secret - you can try cracking it via some online service (e.g. crackstation) or on your GPU (see my attempt at Password Policy Attacking Costs).
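To illustrate why a leaked hash is as good as a leaked password when the password is weak, here is a minimal Python sketch of a dictionary attack on an unsalted SHA-256 hash. The password, the wordlist, and the function names are made up for this example; real cracking tools use far larger wordlists and GPU acceleration.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def crack(leaked_hash: str, wordlist: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose hash matches, or None."""
    for candidate in wordlist:
        if sha256_hex(candidate) == leaked_hash:
            return candidate
    return None


# Hypothetical leak: the unsalted hash of a weak password.
leaked = sha256_hex("hunter2")
# Hypothetical wordlist, tiny on purpose.
wordlist = ["password", "123456", "hunter2", "letmein"]
print(crack(leaked, wordlist))  # prints "hunter2"
```

A long, randomly generated secret would not show up in any wordlist, which is why the hash of a weak password is the real problem here.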
You will face false positives. Please keep that in mind and don't trust the tools blindly. Case in point: I created a “plugin” for a tool to mark Okta tokens and I’ve received a fancy email from GitGuardian about it (my opinion on that is in the Git Guardian note at the end).
What the tools can do for you to identify simple context:
Look around suspicious strings such as password, pw, pass, private_key.
Check the formats of known tokens via regex (e.g. -----BEGIN PRIVATE KEY-----).
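The two checks above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular tool works: the keyword list comes from the examples above, the PEM pattern follows the header shown above, and the AWS access key ID pattern is a commonly used one that I am adding as an assumption. The names KEYWORD_RE, FORMAT_RES and scan are hypothetical.

```python
import re

# Keyword followed by an assignment. The lookbehind avoids hits inside
# longer words such as "compass" while still allowing "db_password".
KEYWORD_RE = re.compile(
    r"(?<![a-z])(password|private_key|pass|pw)\s*[:=]", re.IGNORECASE
)

# Known token formats (assumed patterns, extend as needed).
FORMAT_RES = {
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan(text: str) -> list:
    """Return (line number, rule name) pairs for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if KEYWORD_RE.search(line):
            findings.append((lineno, "keyword"))
        for name, pattern in FORMAT_RES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = "\n".join([
    'pw = "TODO: Integrate with Vault tomorrow"',
    'compass = "north"',
    "-----BEGIN PRIVATE KEY-----",
])
print(scan(sample))  # [(1, 'keyword'), (3, 'pem_private_key')]
```

Note that the first hit is exactly the kind of finding where only a human can tell whether the value is a real password or a leftover comment.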
Secrets Should Be Random
Another approach relies on randomness, measured as entropy. High-entropy strings in text are good candidates for secrets, because well-generated keys and tokens look random. The formula can be found as Theorem 2 in the seminal paper A Mathematical Theory of Communication (C. E. Shannon, 1948).
I’ve got plenty of notes on this topic as well, but they won’t fit in this overview, so I’ve written them up in a separate post.
Definitely go read the paper as well! It's a piece of history, basically the cornerstone of information theory, which modern cryptography builds on.
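To make the entropy idea concrete, here is a minimal Python sketch that computes H = -sum(p_i * log2(p_i)) over the characters of a string and flags long, high-entropy tokens. The threshold of 4.0 bits per character, the minimum length of 20, and the token alphabet are assumptions picked for this example; real tools tune such values per character set.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    # log2(n / c) equals -log2(c / n), so every term is non-negative.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def high_entropy_tokens(text: str, threshold: float = 4.0, min_len: int = 20) -> list:
    """Return tokens that are long enough and look random enough."""
    tokens = re.findall(r"[A-Za-z0-9+/=_-]{%d,}" % min_len, text)
    return [t for t in tokens if shannon_entropy(t) > threshold]


print(shannon_entropy("aaaa"))  # 0.0, a single repeated symbol carries no information
print(shannon_entropy("abcd"))  # 2.0, four equally likely symbols
```

Keep in mind that a string of length n can reach at most log2(n) bits per character, so short strings can never look very random by this measure. That is why the sketch ignores short tokens.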
Let's Try It Out With OWASP WrongSecrets
I found this great challenge to test various secret detection techniques and tricks at https://www.wrongsecrets.com/. Made by the great folks at OWASP.
It includes a comprehensive set of k8s techniques as well! To access those, get minikube and Vault running; the project README provides the instructions.
Highly recommended. Go, try it out! Much more fun than listening to me talking about it, get that first-hand experience. And if you get stuck, they have hints.
CTF Highlights
I enjoyed the k8s section. I think that poking around Vault and configmaps is a good skill to have in the toolset. I also discovered the kubeseal tool and I'm definitely keeping that one in mind.
The ETH blockchain shenanigans were completely new to me - that was the place where I looked up the hints. If nothing else, I now know not to put anything sensitive on the blockchain.
Defensive Tools
I've got software, it has secrets, help! These tools are tailored for the defensive operations:
Yelp's detect-secrets (yelp-secrets for short) is a mature project designed to inspect git repositories. The killer feature is watching for differences against a baseline.
Dockle is a linter for Docker images that can help you catch secrets in your image.
GitGuardian - SaaS-based solution that barged into my GitHub account. Free for teams of up to 25 developers.
Canary Tokens - similar to honeypots, you’ll get a message when someone uses a secret generated with this service.
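The baseline idea mentioned for yelp-secrets can be illustrated with a small Python sketch: remember the findings you have already reviewed, and report only what is new. This is my own simplified model, not the tool's actual code or file format; the fingerprint and new_findings names and the file names are hypothetical.

```python
import hashlib


def fingerprint(filename: str, secret: str) -> tuple:
    """Identify a finding without keeping the secret itself in plain text."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return (filename, digest)


def new_findings(current, baseline) -> list:
    """Return findings present in the current scan but absent from the baseline."""
    return sorted(set(current) - set(baseline))


# Hypothetical baseline: a test fixture that was reviewed and accepted.
baseline = [fingerprint("tests/fixtures/key.pem", "not-a-real-key")]
# Hypothetical current scan: the old finding plus a freshly added password.
current = baseline + [fingerprint("settings.py", "hunter2")]

for filename, _ in new_findings(current, baseline):
    print("new potential secret in", filename)  # only settings.py is reported
```

The benefit is that known false positives stop generating noise, so a new finding actually gets attention.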
Using yelp-secrets
The recommended way of using this tool is via pre-commit hooks - it’s a check that happens before you commit your changes in git. Install pre-commit with:
$ cd your-repository/
$ pip install pre-commit
$ pre-commit install
Now include the configuration in the .pre-commit-config.yaml file:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: package.lock.json
And when you attempt to commit a secret that’s not in the baseline, the hook fails and the commit is blocked.
Offensive Tools
In WrongSecrets, we are on the offensive. There are some specialist tools we can use:
Trufflehog - made for working with git repositories and a couple of other sources such as S3 and Jenkins.
DeepFence SecretScanner - made for Docker images.
Dive - made for peeling the layers of Docker images, not exactly tailored for secret detection.
I reluctantly include a couple of tools for binary reversing - I think that activity is a whole different beast.
Ghidra - Reverse engineering toolset made by the NSA and later open-sourced. No further comments.
Radare2 - Another general reverse engineering toolset. AFAIK not made by NSA.
IlSpyCMD - Reversing C#, which is somewhat easier than reversing a generic binary.
JADX - Reversing Java binaries.
Just Look At It
There were a couple of tasks in the CTF that were easier for me by just using my eyes. Don't forget, you have a built-in neural network somewhat trained on detecting secrets in your very skull! Sadly, it doesn’t scale well.
I've consistently used this approach when figuring out the secrets in binaries. No fancy reversing (because I am bad at it), just your ol' trusty strings tool to extract all legible strings of a certain minimum length.
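If you are curious what strings does under the hood, the core idea fits in a few lines of Python: find runs of printable ASCII bytes that are at least a given length. This is a simplified sketch, not a replacement for the real tool; the extract_strings name and the sample bytes are made up for the example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical binary blob with a secret hiding between non-printable bytes.
blob = b"\x00\x01ELF\x00password=hunter2\x00\xff\xfeab\x00"
print(extract_strings(blob))  # ['password=hunter2']
```

To run it against a real file, read the file in binary mode and pass the bytes in, e.g. extract_strings(open(path, "rb").read()), where path is whatever binary you are inspecting.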
Whenever you're dealing with a website, dev-tools (F12) are a godsend. Built into every modern browser, they let you:
Access the JavaScript console to run arbitrary functions,
Check the HTML comments,
Inspect cookies and various storage,
Check network calls,
And many more.
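When a page is too big to eyeball, one of the checks above, reading the HTML comments, is easy to script. Here is a small Python sketch using only the standard library. The CommentCollector and html_comments names and the sample page are made up for the example.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment fed to the parser."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())


def html_comments(html: str) -> list:
    """Return all HTML comments found in the given markup."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments


# Hypothetical page with a forgotten note from a developer.
page = "<p>Welcome</p><!-- TODO remove test login admin:admin --><div></div>"
print(html_comments(page))  # ['TODO remove test login admin:admin']
```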
And last but not least - never underestimate people’s ability to post secret data in the most ridiculous of places (like reddit).
Go, Detect Your Secrets!
I hope this overview gives you the tools and the confidence to go looking for your secrets. Hopefully you'll find, erase, and rotate them before the attackers do. Take care!
Git Guardian
To be honest, the tool looks really nice and has all the bells and whistles.
I was rather surprised to receive this email as I didn’t sign up for it - they just proactively scan all of GitHub and write to you if they find something.
They managed to email me approximately 5 minutes after I pushed! Let this be a reminder that the attackers are doing the same and you shouldn’t commit your secrets to GitHub.
That’s why I think that the pre-commit hook is the place for secret detection rather than the pipeline job - by the time it hits a branch in a repository, it’s too late.