I wanted to learn more about secret detection. Maybe my notes will help somebody else. In no way, shape, or form do I claim to have come up with any great idea in this article by myself. If I forgot a citation, please let me know.
First, let's go over the basics; then we can try them out in the OWASP WrongSecrets project.
Putting my mathematician hat away for a moment, I can now use circular definitions.
You also shouldn't make your secrets public. Obviously, if an attacker has your secrets, they can access your system. That's bad, and that's what you want to avoid.
Typical examples of secrets are passwords, private keys, tokens, DB connection strings, webhooks, and many more.
We have data and we want to extract secrets from them.
Either we are on the offensive, in which case we want to separate the passwords and keys from the landfill, or we are on the defensive, finding out whether we are leaking any secrets so we can fix the leak. Either way, the main goal of secret detection is to... detect secrets.
To feed the SEO: you can try pointing AI at your data in the hope of identifying secrets. It might even work. But consider that you are sharing your secrets with a third party that was able to make sense of the internet. I wouldn't recommend this approach. Here's what to do instead.
The sad part is that there's no regex or other hard rule that would identify all secrets. What counts as a secret and what counts as plain data depends on the context in which the data is used.
Examples:
The go.sum file is full of random-looking strings, yet they aren't secrets: they are checksums of your module dependencies.
file.pw = "TODO: Integrate with Vault tomorrow" is actually a secret, because the developer went on sick leave and now it's the password to the production DB.
You will face false positives. Please keep that in mind and don't trust the tools blindly. Case in point: I created a “plugin” for a tool to mark Okta tokens and received a fancy email from GitGuardian (my opinion in footnote 1).
What the tools can do for you is identify simple context: suspicious keywords (password, pw, pass, private_key) and well-known formats (e.g. -----BEGIN PRIVATE KEY-----).
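A minimal Python sketch of the keyword approach: flag lines where one of the keywords above is followed by an assignment, or where a PEM private key header appears. The regex and the function name are illustrative assumptions, not the rules of any particular tool. Note that it would also flag a harmless line such as pw = None, which is a false positive in miniature.

```python
import re

# Illustrative rules only (an assumption, not a real tool's rule set):
# a suspicious variable name followed by an assignment, or a PEM key header.
KEYWORD_RE = re.compile(r"\b(password|pw|pass|private_key)\b\s*[:=]", re.IGNORECASE)
PEM_HEADER = "-----BEGIN PRIVATE KEY-----"


def find_keyword_hits(text: str) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for lines that look like they hold a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = KEYWORD_RE.search(line)
        if match:
            hits.append((lineno, f"keyword '{match.group(1).lower()}'"))
        elif PEM_HEADER in line:
            hits.append((lineno, "PEM private key header"))
    return hits


if __name__ == "__main__":
    sample = (
        'file.pw = "TODO: Integrate with Vault tomorrow"\n'
        "user = alice\n"
        "-----BEGIN PRIVATE KEY-----\n"
    )
    for lineno, reason in find_keyword_hits(sample):
        print(lineno, reason)
```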
Another approach has to do with randomness, i.e. entropy. High-entropy strings in text are likely candidates for secrets. The formula can be found as Theorem 2 in the seminal paper A Mathematical Theory of Communication (C. E. Shannon, 1948).
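For reference, here is Shannon's entropy with K = 1 and base-2 logarithms, which gives bits. Applying it to a single string by taking p_i as the relative frequency of each distinct character is an assumption about how secret scanners typically use it, not part of the theorem itself. Two worked examples:

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i
\qquad
H(\texttt{aaaa}) = -1 \cdot \log_2 1 = 0
\qquad
H(\texttt{abcd}) = -4 \cdot \tfrac{1}{4} \log_2 \tfrac{1}{4} = 2 \text{ bits per character}
```

A string that repeats one character carries no surprise at all, while four equally likely characters need two bits each.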
I’ve got plenty of notes on this topic as well, but they won’t fit in this overview, so I’ve written them up in a separate post.
Definitely go read the paper as well! It's a piece of history, basically the cornerstone of information theory.
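To make the entropy heuristic concrete, here is a minimal Python sketch. It estimates per-character entropy from a token's own character frequencies and flags long tokens above a threshold. The function names, the 4.0-bit threshold, and the 20-character minimum are illustrative assumptions, not values taken from any specific tool.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Per-character Shannon entropy in bits, using the string's own character frequencies."""
    n = len(s)
    if n == 0:
        return 0.0
    # sum of p * log2(1/p), which equals -sum of p * log2(p)
    return sum((count / n) * math.log2(n / count) for count in Counter(s).values())


def high_entropy_tokens(text: str, threshold: float = 4.0, min_len: int = 20) -> list[str]:
    """Return whitespace-separated tokens that are long and random-looking (illustrative thresholds)."""
    return [
        token
        for token in text.split()
        if len(token) >= min_len and shannon_entropy(token) >= threshold
    ]


if __name__ == "__main__":
    print(shannon_entropy("aaaa"))  # 0.0
    print(shannon_entropy("abcd"))  # 2.0
    print(high_entropy_tokens("pw = abcdefghijklmnopqrstuvwxyz0123456789"))
```

A token's entropy can never exceed log2 of its number of distinct characters, so short tokens can't score high; that's one reason for the minimum length. Checksums such as those in go.sum would likely pass this filter too, which is exactly how entropy-based detection produces false positives.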
I found a great challenge for testing various secret detection techniques and tricks at https://www.wrongsecrets.com/, made by the great folks at OWASP.
It includes a comprehensive set of k8s techniques as well! To access those, get minikube and Vault running; the project README provides the instructions.
Highly recommended. Go try it out! It's much more fun than listening to me talk about it, so get that first-hand experience. And if you get stuck, there are hints.
I enjoyed the k8s section; I think poking around Vault and ConfigMaps is a good skill to have in your toolset. I also discovered the kubeseal tool and am definitely keeping that one in mind.
The ETH blockchain shenanigans were completely new to me, and that's where I looked up the hints. If nothing else, I now know not to put anything sensitive on the blockchain.
I've got software, it has secrets, help! These tools are tailored for defensive operations.
The recommended way of using detect-secrets is via pre-commit hooks: a check that runs before you commit your changes in git. Install pre-commit with:
$ cd your-repository/
$ pip install pre-commit
$ pre-commit install
Now include the configuration in the .pre-commit-config.yaml file:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: package-lock.json
And when you attempt to commit a secret that’s not in the baseline, the hook fails and the commit is blocked.
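The baseline file is what lets the hook tell already-reviewed findings apart from new ones. Below is a toy Python sketch of that idea. It assumes findings are fingerprinted with SHA-1 and that the baseline is just a set of fingerprints. The function names and this format are hypothetical and are not detect-secrets' actual baseline format or API. The part that does carry over to the real thing is that a pre-commit hook blocks the commit by exiting with a non-zero status.

```python
import hashlib
import sys


def fingerprint(secret: str) -> str:
    """Hash a finding so the (hypothetical) baseline never stores the plaintext secret."""
    return hashlib.sha1(secret.encode("utf-8")).hexdigest()


def new_findings(findings: list[str], baseline: set[str]) -> list[str]:
    """Keep only findings whose fingerprint was not already reviewed into the baseline."""
    return [f for f in findings if fingerprint(f) not in baseline]


def hook_exit_code(findings: list[str], baseline: set[str]) -> int:
    """Mimic a pre-commit hook: a non-zero exit status blocks the commit."""
    unexpected = new_findings(findings, baseline)
    for finding in unexpected:
        # Print only a prefix so the hook output doesn't leak the whole secret.
        print(f"Potential secret not in baseline: {finding[:3]}...", file=sys.stderr)
    return 1 if unexpected else 0


if __name__ == "__main__":
    reviewed = {fingerprint("known-test-token")}
    print(hook_exit_code(["known-test-token"], reviewed))             # 0
    print(hook_exit_code(["known-test-token", "hunter2"], reviewed))  # 1
```

Storing hashes instead of plaintext means the baseline itself can be committed without becoming a leak of its own.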
In WrongSecrets, we are on the offensive, and there are some specialist tools we can use.
I reluctantly include a couple of tools for binary reversing; I think that activity is a whole different beast.
A couple of tasks in the CTF were easier for me to solve just by using my eyes. Don't forget, you have a neural network, somewhat trained on detecting secrets, built into your very skull! Sadly, it doesn’t scale well.
I've consistently used this approach when figuring out the secrets in binaries. No fancy reversing (because I am bad at it), just good ol' trusty strings, a tool that prints every sequence of printable characters of a certain minimum length.
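For the curious, here is a rough Python approximation of what strings does: find runs of printable ASCII characters that are at least min_len bytes long. GNU strings defaults to a minimum length of 4; the function name and the exact character class here are my assumptions, and the real tool has more options (e.g. other character encodings).

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (tab, or space through '~') at least min_len bytes long."""
    # Build a bytes regex such as rb"[\t\x20-\x7e]{4,}" for the requested minimum length.
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    blob = b"\x00\x01ELF\x00\x00my-secret-password\x00ab\x00"
    print(extract_strings(blob))
```

To scan an actual binary, read it in bytes mode and pass the contents in, optionally raising min_len (say, to 8) to cut down the noise.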
Whenever you're dealing with a website, the dev tools (F12) built into every modern browser are a godsend.
And last but not least, never underestimate people’s ability to post secret data in the most ridiculous places (like Reddit).
I hope this overview gives you the tools and the confidence to go looking for your secrets. Hopefully you'll find, erase, and rotate them before the attackers do. Take care!
GitGuardian
To be honest, the tool looks really nice and has all the bells and whistles.
I was rather surprised to receive this email, as I didn’t sign up for it; they proactively scan all of GitHub and email you if they find something.
They managed to email me approximately 5 minutes after I pushed! Let this be a reminder that attackers are doing the same, and that you shouldn’t commit your secrets to GitHub.
That’s why I think the pre-commit hook, rather than a pipeline job, is the place for secret detection: by the time a secret hits a branch in a repository, it’s too late.