Debugging "tls: failed to verify certificate: x509: certificate signed by unknown authority"
And a few frustrated opinions.
TL;DR: I was trying to connect a Forgejo runner to my self-hosted instance. The runner is a Go binary, and registering it involves a TLS verification step. That failed with:
ERRO Cannot ping the Forgejo instance server error="unavailable: tls: failed to verify certificate: x509: certificate signed by unknown authority"
The binary wasn’t able to verify the full certificate chain, because my server was sending only its own certificate. So I configured the server to send the full chain instead of a single certificate (yes, that’s the best practice). Fixed!
If that fixes your issue, I am happy to help. I’ve got some more opinions on the issue, so I’ll try and voice them in this article.
But first, let me walk you through the debugging process.
Self-Hosted Forgejo: Lightweight Software Repository
One day I stumbled upon the Forgejo project. I wanted to have a self-hosted code repository, and I deemed GitLab overkill for my needs. This project promised a lightweight alternative, so I decided to give it a go.
Setup was easy enough. I have a VPS where I host a couple of apps, and in front of those is an Nginx reverse proxy. Add a user for the app, configure the subdomain, configure certificates, and we’re happy.
It seems that the app is indeed lightweight:
I also wanted the runners - finally I would have an environment where I can play with various AppSec tools. The runner needs to be run on another machine. I used this as an excuse to play with WireGuard and hooked up my tiny PC at home to the VPS. You need Docker or something similar on the runner machine.
Debugging Runner Registration
Now, let’s connect those two, shall we? And there’s the error from the intro:
ERRO Cannot ping the Forgejo instance server error="unavailable: tls: failed to verify certificate: x509: certificate signed by unknown authority"
Step 1: Is The Certificate Valid?
Right, so… how do we figure out what’s up? Seems there’s a trust issue, so the first thing I did was to check if I have a valid certificate on my server. I have - browsers had no issues in verifying:
Step 2: Does The Client Trust The CA?
Ok, so maybe my client doesn’t know that I trust the Let’s Encrypt certificate authority. The system has standard mechanisms for this, so let’s manually import those root CA certs.
It’s always a good day when you have to work with OpenSSL. For some reason, you download the certificates in PEM format and then convert them to DER using
openssl x509 -outform der -in your-cert.pem -out your-cert.crt
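Copy those certs into the “/usr/local/share/ca-certificates” folder and then commit the updates by running “update-ca-certificates”. As far as I can tell, that tool only picks up files with a .crt extension and expects them to be PEM-encoded, so renaming the PEM file would likely have been enough. Anyway, moving on… still the same error.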
Step 3: Can I Ignore The Problem?
In a lot of cases, there are ways of telling the client that it shouldn’t care about the validity of certificates. It’s usually a parameter of the client somewhere. Another case where this is appreciated is when you’re behind a “corporate proxy” with an internal CA that your client doesn’t recognize, for example because it doesn’t read the system certificate store mentioned in step 2.
Anyway, I’ve checked the docs, I’ve checked the “--help” for both the command and the subcommand (did you know that these might be different?), no dice.
Step 4: What Am I Working With?
When searching for the error message I’ve noticed that it seems to be related to the Go language. I’ve checked my client and indeed it is written in Go - sources.
A lot of the results pointed in the wrong direction or to things I’d already tried - e.g. this post explaining how I should include the trusted certificates in the system store. Or perhaps this post explaining how this is definitely not a Go problem.
Step 5: Banging My Head Against The Wall
Make no mistake, 80% of my time with this problem was spent in this step. By sheer luck and determination I’ve stumbled upon this entry discussing the importance of precise naming in certificates. Checked, it’s ok. But there was another comment below:
What I was also going to try (and apparently worked for this guy) was to get the service that I was trying to communicate with to modify their nginx conf so that they set
ssl_client_certificate
to not point to just a client cert, but a combined cert (one that includes the entire CA chain).
I’ve tried to access the referenced post and it didn’t work for me. Fortunately, it was archived at archive.org.
An additional development: I've discovered the root cause.
It would appear that the golang http stack isn't compatible with nginx's
ssl_trusted_certificate directive.
When used to indicate which client certificate to trust, curl will succeed
and the go example will fail.
Contrast with the gist I linked above, which uses: ssl_client_certificate
/example/client.crt
These two directives vary in subtle ways:
http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_trusted_certificate
http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_client_certificate
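I’ve switched Nginx from the single certificate to the full-chain one (the ssl_certificate file now holds the server certificate followed by the intermediates) and voilà, fixed.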
Go Standard Library Should Be Fixed
Now, this is not great - since Firefox has no issues in working with the old configurations it’s not a server misconfiguration. The issue is client side. I’ve modified the server so I could move forward, but this is not right.
I’ve got two candidates for the error - the net/http package and the crypto/x509 package. Based on this issue, the crypto/x509 package should have an API for intermediate certificates. This should be the example.
So I dug deeper and I’ve found that this error message is associated with “UnknownAuthorityError”. What’s funny is that there’s literally one instance in the whole codebase where this error is created.
I think the error is somewhere in the TLS handshake part. Where exactly is too much to ask of my Go capabilities. I’d expect a capability to query for intermediate certificates. But I think I have enough for a good bug report.
A 10 Year Old Bug
Originally, I was prepared to deliver a flaming speech, bashing the irresponsible developers and dragging their names through the mud. What happened to standards? What happened to quality? Is this how the youngsters code? This wouldn’t happen in my days. And get off my lawn!
Then I tried to find out if this bug was already reported. I was greeted with over 9000 open issues on GitHub. Moreover, over 55k issues were closed. Needless to say, I didn’t find this exact issue on the list. A good reminder that we’re all human and hate gets us nowhere.
Suddenly it hit me. The article at archive.org is ten years old. I don’t think my configuration is extra special - just a normal Nginx with a Let’s Encrypt certificate. How did this issue survive for so long? Am I the only one with this problem? My frustration was instantly replaced with confusion. How did this happen?
Am I So Out of Touch?
I was hoping to lift my spirits by mocking the AI debugging capabilities, so I started prompting ChatGPT for help. And wouldn’t you know it - in five queries:
3. Verify the Server’s Certificate Chain
Since Firefox is working but the Go client is not, this can sometimes indicate an incomplete certificate chain. Even if Firefox accepts the certificate, the Go client might not be able to verify all the intermediate certificates.
Solution:
Use a tool like SSL Labs to check if the Forgejo server is presenting a full certificate chain. If it's incomplete, you may need to configure the server to provide all intermediate certificates.
Now I am properly devastated. ChatGPT did in 5 minutes what took me 2 hours. Although it is possible that it would take many more queries if I had no idea what any of this means. Maybe the lesson here is that LLMs are a great force multiplier if you have the basics down?
By this point I can hear the sysadmins laughing. So, I was wrong - the best practice is to include all intermediates in the chain. Let’s check the SSL Labs benchmark for certs:
Here’s a more thorough treatise on the case - basically the differences between X.509 (RFC 5280) and SSL/TLS (RFC 5246). The view that clients should do the work is not ruled out by RFC 5280, and the view that servers should provide full chains is in RFC 5246 appendix F 1.1.1. This is also a reminder that these RFCs are awesome and should be read whenever needed.
One Of Those Days
All is well that ends well. In this case, the runner is running:
I’ve learned something new. I’ve also exercised my debugging abilities. Yes, I could have spent the time differently, and now I get why a lot of people are worried about AI. It’s one of those experiences that will stick with me for some time to come. And hey, maybe my post will help another poor soul staring at a cryptic certificate error. Take care!