Miloslav Homer




Deploying JA4


I want to identify/block/rate-limit bots coming to my server. I want to avoid vendor lock-in and use open-source solutions. I also want to avoid PCAP inspection (Suricata, Zeek...) as those tools are threatened by Encrypted Client Hello (and are fiddly/expensive to set up and maintain).

JA4 fits these requirements. In the previous part, I’ve covered JA3/JA4 generally:

  • Both are TLS fingerprinting methods analyzing ClientHello parameters (cipher suites, extensions, ALPN...).
  • JA3 is now obsolete since Chrome randomizes the order of its extensions, so please use JA4, which is now well supported.
  • JA4+ (the full suite of methods) has a mixed licensing model, beware.

In this part, let’s deploy this tech. There’s a somewhat-maintained list of tools supporting JA4 to choose from. Keeping the requirements above, I’ve found only these options:

If you really want to dive deep, I’ve implemented JA4 calculation in Python using scapy to parse PCAPs captured by tcpdump as a PoC. It’s not production ready at all, but I've learned a ton about the technology. Parsing PCAPs is the hard part, calculating JA4 from the extracted data is much easier.

Still interested? Let’s get started with the gauntlet of standards and acronyms.

Pre-Requisites, Technologies, RFCs and Shorthands

I’m about to drown you in terminology as we need to juggle a lot of the standards to get this thing rolling. So here’s a brief explainer:

  • TLS (Transport Layer Security) RFC8446: the encryption layer for HTTP and many other protocols. There are many extensions to this.
  • SSL (Secure Sockets Layer): the deprecated predecessor of TLS. Sometimes you still see references to it or the SSL/TLS shorthand for HTTPS.
  • Client Hello: First message from client to server to establish TLS.
  • SNI (Server Name Indication) RFC6066: the client includes this extension in the client hello to tell the server which hostname it wants to reach.
  • ALPN (Application Layer Protocol Negotiation) RFC7301: client hello extension, it’s a hint to the server explaining what protocol to use after TLS is established.
  • GREASE (Generate Random Extensions And Sustain Extensibility) RFC8701: enables clients to troll machines with selected nonsense values at random places they have to handle correctly and not crash.
  • ECH (Encrypted Client Hello) RFC9849: new standard to encrypt that client hello to hide SNI and ALPN and others from prying eyes. My article on this.
  • PCAP (Packet Capture): format for storing packet data, de-facto standard established by libpcap implementation in the 80s. No formal RFC yet.

These points are really just a reference to keep as you navigate these shorthands and standards. And now let’s deploy some JA4.

The Outsourced Way

You can reach out to one of the many big service providers to do this for you. Here’s a list, pick your favorite.

Obviously, that’s not something I’d like to do for this blog, the whole purpose of which is to do something independent. Focusing on tech, I want to avoid vendor lock-in as much as possible.

For example Cloudflare gives you this only in the Enterprise plan, the best plan there is! While you can expect some price increases, if you already have these plans, it’s a no-brainer. Even upgrading the plan might be cheaper than switching providers and/or rolling your own.

The Self-Hosted Way

From the same list we can pick a few open source solutions that can support JA4.

A lot of these solutions capture pcaps to analyze the traffic (Suricata, Zeek, Wireshark, Arkime...). Beware if you’d like to use Encrypted Client Hello as the ClientHello properties you’d need will be... encrypted.

I’d rather deploy it on a reverse-proxy. There is an “official” nginx module from the JA4 creators, but from their descriptions it’s a work in progress. Since they’ve recently hidden their DB, I’m not sure what the future of this extension is.

There is also rama proxy, which looks amazing (new, rust, feature set) but has LLMisms all over the place. I’m not hating on the project, but I’m not using it either.

Speaking of rust-based proxies, pingora seems to be on its way to support JA4, so fingers crossed.

And Envoy seems to be already there, but it's not on the list. Go figure. If you have a favorite proxy/server, check their docs first, maybe they actually support it.

Eliminating the above is how I’ve picked the HAProxy JA4 plugin.

JA4 in HAProxy

I need haproxy, version 3.1 at least, which is not the default in Debian 13, so head here for instructions on how to include the repo. I also need a sample app, created with uvx fastapi-new hw. Of course, I need to do the standard stuff [1].

Now let’s get the plugin:

cd /etc/haproxy
mkdir lua; cd lua
wget https://raw.githubusercontent.com/O-X-L/haproxy-ja4-fingerprint/refs/heads/latest/ja4.lua

We'll also need to set this up in the config file:

global
    ...
    # JA4 related stuff
    tune.ssl.capture-buffer-size 2048
    lua-load /etc/haproxy/lua/ja4.lua
    
frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/haproxy-test.local.pem
    default_backend hello_world
    http-request lua.fingerprint_ja4
    http-request capture var(txn.fingerprint_ja4) len 36
    http-request capture req.hdr(user-agent) len 150

backend hello_world
    server app1 127.0.0.1:8000 check

First we need to set up these checks and TLS termination on the frontend, as well as route the traffic to the correct backend.

We need to tell haproxy to capture SSL/TLS info and load the plugin. Small note on the tune.ssl.capture-buffer-size 2048 - the extension author recommends setting this to 192, but as we’re trying to fit basically the entire ClientHello in here, we should account for post-quantum key exchange like Kyber, whose public key in the key_share extension can be up to 1568 bytes long!

Then we need to tell the front end to compute JA4 and capture it along with the User-Agent to logs.

Et voilà!

May 29 20:03:42 haproxy-test haproxy[3319]: [NOTICE]   (3319) : Loading success.
May 29 20:03:42 haproxy-test systemd[1]: Started haproxy.service - HAProxy Load Balancer.
May 29 20:06:18 haproxy-test haproxy[3322]: 192.168.100.1:46182 [29/May/2026:20:06:18.238] https-in~ hello_world/app1 0/0/0/1/1 404 153 - - ---- 1/1/0/0/0 0/0 {t13d1715h2_5b57614c22b0_7121afd63204|Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0} "GET https://haproxy-test.local/test HTTP/2.0"

You can see in the curly braces that we have both the JA4 and the User-Agent captured for further processing.
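Pulling the two captured values back out of such a line could look like the sketch below. It assumes the log format shown above, where the captures land in a single {...|...} block in the order they were declared. The name parse_captures is mine and not part of HAProxy or the plugin.

```python
import re
from typing import Optional

# Captured request fields are logged as {first|second}, in declaration order.
CAPTURE_RE = re.compile(r"\{([^|{}]*)\|([^{}]*)\}")


def parse_captures(log_line: str) -> Optional[tuple[str, str]]:
    """Return (ja4, user_agent) from one HAProxy log line, or None if absent."""
    match = CAPTURE_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# The request line from the log excerpt above, split for readability.
line = (
    "192.168.100.1:46182 [29/May/2026:20:06:18.238] https-in~ hello_world/app1 "
    "0/0/0/1/1 404 153 - - ---- 1/1/0/0/0 0/0 "
    "{t13d1715h2_5b57614c22b0_7121afd63204|Mozilla/5.0 (X11; Linux x86_64; rv:128.0) "
    "Gecko/20100101 Firefox/128.0} "
    '"GET https://haproxy-test.local/test HTTP/2.0"'
)
print(parse_captures(line))
```

A User-Agent that itself contains curly braces would confuse this pattern, so treat it as a starting point only.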

From here, you can do whatever. Capture the logs for later checks, forward them to some IDS/IPS, set headers for the application... I'm sure your favourite LLM will generate a script to capture what you need - if not, here's my cronjob script that needs to run every hour.

But oh no, Lua is interpreted, it’s so slow, single-threaded, oh no, this will kill performance... Well, I ran a simple test:

  • Client: ffuf (40 threads by default) with this wordlist.
  • Server: 2 vCPU, 2GB RAM VM.
  • Network: No network issues - localhost connections.
  • Once with JA4 collection disabled and once with JA4 collection enabled.

$ ffuf -u https://haproxy-test.local/FUZZ -w fav-wordlist.txt -mt ">100"
...
:: Progress: [90821/90821] :: Job [1/1] :: 3278 req/sec :: Duration: [0:00:40] :: Errors: 0 ::

$ ffuf -u https://haproxy-test.local/FUZZ -w fav-wordlist.txt -mt ">100"
...
:: Progress: [90821/90821] :: Job [1/1] :: 3225 req/sec :: Duration: [0:00:42] :: Errors: 0 ::

While it’s far from a rigorous approach, I think the difference is quite small (about 2% in throughput, 5% in total duration) and your bottleneck will be the network speed.
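For the record, here is the arithmetic on the two ffuf summaries as a tiny sketch. It assumes the first run is the baseline with JA4 collection disabled; the helper name relative_change is only for illustration.

```python
def relative_change(baseline: float, measured: float) -> float:
    """Change of `measured` relative to `baseline`, in percent."""
    return (measured - baseline) / baseline * 100


# Numbers copied from the two ffuf runs above.
throughput = relative_change(3278, 3225)  # requests per second
duration = relative_change(40, 42)        # seconds

print(f"throughput: {throughput:+.1f}%")
print(f"duration:   {duration:+.1f}%")
```

With a run this short, the duration is rounded to whole seconds, so the throughput figure is the less noisy of the two.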

The Fragile PoC Way

Guess which one I’ve picked. Doing everything by hand is a great way of exploring the limits of the technology. I’ve learned a ton, got my hands dirty and now I know what I am missing.

Since I didn’t want to commit to restructuring my site, I opted for collecting PCAPs, which is a standard format for capturing raw packets.

Obligatory warning: this is not intended nor built for production usage. It’s fragile, riddled with assumptions, doesn’t scale and has bugs. And don’t get me started on security - the author runs this as root!

Collecting PCAPs

One quick’n’dirty way can be implemented like so:

#!/bin/bash

IF=eth0
IP=$(ip address show dev $IF | grep inet | grep -o 'inet [0-9]*\.[0-9]*\.[0-9]*\.[0-9]*' | sed 's/inet //')

fname=$(date +%s).pcap
tcpdump -ni $IF "dst $IP and port 443 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)" -U -w /root/pcaps/$fname

If you’d like, wrap it in a systemd service and we can start collecting PCAPs right away.

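The tcpdump filter keeps only packets whose TCP payload starts with 0x16, the TLS handshake record type. Combined with the dst filter, that is mostly client hello packets (other client handshake messages can still slip through with TLS 1.2; additionally checking that the handshake type byte at payload offset 5 equals 0x01 would narrow it down further). As an added benefit, we can collect for a long time without exhausting the disk space.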
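The offset arithmetic in that filter is easier to read outside of tcpdump syntax. Below is a rough Python mirror of the same check on a raw TCP segment; the function name is_tls_handshake and the hand-built test bytes are mine, purely for illustration.

```python
def is_tls_handshake(tcp_segment: bytes) -> bool:
    """Mirror of the tcpdump expression tcp[((tcp[12] & 0xf0) >> 2)] = 0x16."""
    if len(tcp_segment) < 13:
        return False
    # The upper 4 bits of byte 12 hold the TCP header length in 32-bit words.
    # (x & 0xf0) >> 4 gives words, times 4 gives bytes, hence the combined >> 2.
    header_len = (tcp_segment[12] & 0xF0) >> 2
    if len(tcp_segment) <= header_len:
        return False  # no payload at all, e.g. a bare ACK
    # 0x16 (22) is the TLS record content type "handshake".
    return tcp_segment[header_len] == 0x16


# Hand-built segment: 20-byte header, so byte 12 is 0x50 (data offset 5).
header = bytes(12) + bytes([0x50]) + bytes(7)
print(is_tls_handshake(header + bytes([0x16, 0x03, 0x01])))  # handshake record
print(is_tls_handshake(header + bytes([0x17, 0x03, 0x03])))  # application data
```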

This approach cannot capture User-Agents. For those you’d need to do correlation against nginx access logs using timestamps and IP addresses. It’s unpleasant, but for a PoC I don’t need to do better.
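A minimal sketch of such a correlation, assuming the Client Hellos and the access log lines have already been parsed into records. The class names, the 5-second window and the sample values are all made up for illustration, and the matching is deliberately naive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Hello:  # one Client Hello seen in a PCAP
    ts: datetime
    ip: str
    ja4: str


@dataclass
class AccessEntry:  # one parsed access log line
    ts: datetime
    ip: str
    user_agent: str


def correlate(hellos, entries, window=timedelta(seconds=5)):
    """Pair each Hello with the first log entry from the same IP within `window`."""
    by_ip = {}
    for entry in sorted(entries, key=lambda e: e.ts):
        by_ip.setdefault(entry.ip, []).append(entry)

    pairs = []
    for hello in hellos:
        for entry in by_ip.get(hello.ip, []):
            # The request can only arrive after the handshake has started.
            if hello.ts <= entry.ts <= hello.ts + window:
                pairs.append((hello.ja4, entry.user_agent))
                break
    return pairs


t0 = datetime(2026, 5, 29, 20, 6, 18)
hellos = [Hello(t0, "192.168.100.1", "t13d1715h2_5b57614c22b0_7121afd63204")]
entries = [
    AccessEntry(t0 + timedelta(seconds=1), "192.168.100.1", "Firefox/128.0"),
    AccessEntry(t0 + timedelta(seconds=1), "192.168.100.2", "curl/8.0"),
]
print(correlate(hellos, entries))
```

Several clients behind one NAT address, or many requests reusing one connection, will produce wrong or missing pairs, which is part of why this is unpleasant.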

Another downside of this approach is that you can’t process Encrypted Client Hello packets. This is how I fell into the ECH rabbit hole and came back with an article. Fortunately, I’m not using ECH, so I can get away with this.

Parsing PCAPs

Calculating JA4 hashes is not that hard if you follow the spec. Here’s roughly what you need, in the python dataclass form:

from dataclasses import dataclass, field

@dataclass
class JA4Packet:
    protocol: ConnProtocol
    tls_version: TLSVersion
    sni: SNI
    # default_factory gives every instance its own fresh list / ALPN object
    ciphers: list[CipherSuite] = field(default_factory=list)
    extensions: list[Extension] = field(default_factory=list)
    signature_algorithms: list[SignatureAlgorithm] = field(default_factory=list)
    alpn: ALPN = field(default_factory=lambda: ALPN(""))
    ech: bool = False

What’s problematic is extracting, validating and then getting that data into your algorithm.

I could write a series of articles on PCAP parsing if I went through all of the RFCs to implement this from scratch. I’ve opted to use scapy instead. I’d also recommend finding and preparing a list of extension IDs for easier reference and code clarity (or at least some bits of it):

from scapy.all import *

load_layer("tls")  # scapy does not dissect TLS unless this layer is loaded

# known extension ids list is your friend
# https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml
ext_ids = {
    "server_name": 0,
    "supported_groups": 10,
    "ec_point_formats": 11,
    "signature_algorithms": 13,
    "application_layer_protocol_negotiation": 16,  # ALPN
    "supported_versions": 43,
    "ech_outer_extensions": 64768,
    "encrypted_client_hello": 65037,
}

TLS Version Extraction

As an example to illustrate the technical complexity and historical baggage, let’s consider the issue of TLS version extraction. Sounds simple, right? Wrong:

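def get_tls_version(packet):
    hello = packet["TLS"].msg[0]
    # legacy version field of the Client Hello, used as the fallback
    tlv = hello.version
    tlv_e = [x for x in (hello.ext or []) if x.type == ext_ids["supported_versions"]]
    if len(tlv_e) > 0:
        # drop GREASE values (0x0a0a, 0x1a1a, ...), otherwise max() would pick one
        versions = [v for v in tlv_e[0].versions if (v & 0x0F0F) != 0x0A0A]
        if len(versions) > 0:
            tlv = max(versions)
    return tlv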

First you search for the TLS version specified in the packet. Today, you’ll find three conflicting values:

  • TLS version of the TLS layer: TLS 1.0 (0x0301)
  • TLS version of the Client Hello (legacy TLS version): TLS 1.2 (0x0303)
  • TLS versions supported by the client in the “supported_versions” extension (since RFC8446)

[Figure: The various TLS versions as observed in Wireshark]

The actual version is then the one negotiated with the server. That means I’m taking a shortcut here by assuming the server will always accept the highest available valid TLS version (omit the GREASE values, of course [2]).
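Detached from scapy, the same shortcut plus the two-character version code used by JA4 can be sketched like this. The mapping follows my reading of the public JA4 spec; the names pick_tls_version and is_grease are hypothetical and not taken from the PoC.

```python
# Version codes used in the first JA4 segment, as I read the JA4 spec.
JA4_TLS_VERSIONS = {
    0x0304: "13",
    0x0303: "12",
    0x0302: "11",
    0x0301: "10",
    0x0300: "s3",
}


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x?A?A with both bytes equal."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def pick_tls_version(legacy_version: int, supported_versions: list[int]) -> str:
    """Highest non-GREASE supported version, else the legacy Client Hello field."""
    candidates = [v for v in supported_versions if not is_grease(v)]
    version = max(candidates) if candidates else legacy_version
    return JA4_TLS_VERSIONS.get(version, "00")  # "00" marks an unknown version


print(pick_tls_version(0x0303, [0xFAFA, 0x0304, 0x0303]))  # modern browser
print(pick_tls_version(0x0303, []))                        # no supported_versions
```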

Wiring Client Hello Properties

We need to get data from point A to point B. Let’s just roll up the sleeves and be done with it:

def packet_to_ja4packet(packet) -> JA4Packet:
    tlv = get_tls_version(packet)
    cps = packet["TLS"].msg[0].ciphers
    sni_l = [x for x in packet["TLS"].msg[0].ext if x.type == ext_ids["server_name"]]
    sni = SNI.sni() if len(sni_l) > 0 else SNI.no_sni()
    exs = [x.type for x in packet["TLS"].msg[0].ext]
    sig_sch_l = [
        x for x in packet["TLS"].msg[0].ext if x.type == ext_ids["signature_algorithms"]
    ]
    sig_algs = sig_sch_l[0].sig_algs if len(sig_sch_l)>0 else []
    alpn_l = [
        x
        for x in packet["TLS"].msg[0].ext
        if x.type == ext_ids["application_layer_protocol_negotiation"]
    ]
    alpn = alpn_l[0].protocols[0].protocol.decode("utf-8") if len(alpn_l)>0 and len(alpn_l[0].protocols) > 0 else "" # assuming it's only one ALPN extension and the first is used
    ech_out = [x for x in packet["TLS"].msg[0].ext if x.type == ext_ids["ech_outer_extensions"]]
    ech_in = [x for x in packet["TLS"].msg[0].ext if x.type == ext_ids["encrypted_client_hello"]]
    return JA4Packet(
        ConnProtocol.TCP(),  # this is an assumption
        TLSVersion.from_packet(tlv),
        sni,
        [CipherSuite.from_packet(x) for x in cps],
        [Extension.from_packet(x) for x in exs],
        [SignatureAlgorithm.from_packet(x) for x in sig_algs],
        ALPN.from_packet(alpn),
        len(ech_out)>0 or len(ech_in)>0,
    )

As you can see, this is mostly wiring - this value goes here, that value goes there.

There are some assumptions and shortcuts: only the first ALPN value is used, connections are assumed to be TCP (I don’t implement QUIC or DTLS), extensions are assumed not to repeat (forbidden by RFC8446, but still), and there’s an ECH flag for good measure.

Validating Parsed Values

I need a validation layer filtering out the GREASE values and correctly formatting the rest of the values to be what JA4 expects. In these two classes I capture most of the validating logic:

class Int:
    def __init__(self, val):
        self.val = val

    @classmethod
    def from_packet(cls, val):
        if type(val) == int and val >= 0 and val <= 0xFF:
            return cls(val)
        elif type(val) != int:
            raise ValueError(f"Invalid type of {val}, need int")
        else:
            raise ValueError(f"{val} not between 0 and {0xFF}")

    def to_ja3(self):
        return self.val  # we need int

    def to_ja4(self):
        return self.val  # we need int

class DoubleInt:
    def __init__(self, val):
        self.val = val

    @classmethod
    def from_packet(cls, val):
        if type(val) == int and val >= 0 and val <= 0xFFFF:
            return cls(val)
        elif type(val) != int:
            raise ValueError(f"Invalid type of {val}, need int")
        else:
            raise ValueError(f"{val} not between 0 and {0xFFFF}")

    def to_ja3(self):
        return self.val  # we need int

    def to_ja4(self):
        return "{:04x}".format(self.val)  # we need 4 chars of hex

    # https://datatracker.ietf.org/doc/html/draft-davidben-tls-grease-00
    GREASE_LIST = [
        0x0A0A, 0x1A1A, 0x2A2A, 0x3A3A,
        0x4A4A, 0x5A5A, 0x6A6A, 0x7A7A,
        0x8A8A, 0x9A9A, 0xAAAA, 0xBABA,
        0xCACA, 0xDADA, 0xEAEA, 0xFAFA,
    ]

    def is_grease(self):
        return self.val in self.GREASE_LIST

Most of the values will fall into these two types. I've decided to just assign one of these two types to those parts to avoid GREASE code duplication. Strictly speaking, for ciphersuites we should also validate against a list, but this structure lets me extend it later if needed. So here's a mapping + links:

# https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-4
CipherSuite = DoubleInt

# https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml#tls-extensiontype-values-1
Extension = DoubleInt

# https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-8
EllipticCurve = DoubleInt

# https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-9
EllipticCurvePointFormat = Int

# https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-signaturescheme
SignatureAlgorithm = DoubleInt
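As a cross-check of the hand-written GREASE_LIST, the sixteen values can also be generated from their pattern. This is a small sketch only; GREASE_VALUES and strip_grease are illustrative names, not part of the PoC.

```python
# Each GREASE value repeats one byte of the form 0x?A: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = frozenset(((n << 4) | 0x0A) * 0x0101 for n in range(16))


def strip_grease(values: list[int]) -> list[int]:
    """Drop GREASE entries while keeping the original order."""
    return [v for v in values if v not in GREASE_VALUES]


ciphers = [0x0A0A, 0x1301, 0x1302, 0xFAFA]
print([f"{c:04x}" for c in strip_grease(ciphers)])
```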

Another tricky bit is the ALPN mapping, where there is a custom function to reduce ALPN to 2 characters as well as validating it against a known list:

class ALPN:
    # https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml#alpn-protocol-ids

    known_alpns = [
        "",
        "http/0.9",
        "http/1.0",
# this goes on for a while ...
        "netperfmeter/control",
        "netperfmeter/data",
    ]

    def __init__(self, value):
        if type(value) == str:
            self.value = value
        else:
            raise ValueError("ALPN needs to be a str")  # maybe?

    @classmethod
    def from_packet(cls, value):
        return cls(value)

    def to_ja4(self):
        def alnum(c):
            # ASCII letters and digits only (0-9, A-Z, a-z)
            return c.isascii() and c.isalnum()

        if self.value == "":
            return "00"
        elif not alnum(self.value[0]) or not alnum(self.value[-1]):
            return (
                "{:02x}".format(ord(self.value[0]))[0]
                + "{:02x}".format(ord(self.value[-1]))[-1]
            )
        else:
            return self.value[0] + self.value[-1]
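To make the reduction concrete, here is the same rule as a standalone function working on raw bytes, which sidesteps any decoding questions. It is a sketch based on my reading of the JA4 spec; alpn_to_ja4 is a hypothetical name and the byte strings are illustrative inputs.

```python
def is_ascii_alnum(byte: int) -> bool:
    """True for 0-9, A-Z and a-z."""
    return chr(byte).isascii() and chr(byte).isalnum()


def alpn_to_ja4(alpn: bytes) -> str:
    """Reduce the first ALPN value to the two characters JA4 expects."""
    if not alpn:
        return "00"  # no ALPN extension or an empty value
    first, last = alpn[0], alpn[-1]
    if is_ascii_alnum(first) and is_ascii_alnum(last):
        return chr(first) + chr(last)
    # Otherwise use the first and last character of the hex representation.
    hexed = alpn.hex()
    return hexed[0] + hexed[-1]


for value in (b"h2", b"http/1.1", b"", b"\xab\xcd", b"\x30\xab"):
    print(value, "->", alpn_to_ja4(value))
```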

There's also SNI, which is either "i" (IP address, no SNI) or "d" (domain, SNI). That one is simple to implement, but I need to mention it for completeness.
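With all the parts validated, the final assembly is short. The sketch below follows my reading of the public JA4 spec and takes plain values instead of the classes above, so it is not the PoC's actual code; the function names and the sample inputs are assumptions for illustration.

```python
import hashlib


def truncated_hash(text: str) -> str:
    """First 12 hex characters of SHA-256, or twelve zeros for empty input."""
    if not text:
        return "000000000000"
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4(protocol, version, sni, ciphers, extensions, sig_algs, alpn):
    """Assemble a JA4 string from already validated, GREASE-free inputs.

    protocol: "t" (TCP) or "q" (QUIC); version: e.g. "13"; sni: "d" or "i";
    ciphers/extensions/sig_algs: lists of ints in wire order; alpn: e.g. "h2".
    """
    hex4 = "{:04x}".format
    # Section a: counts are capped at 99; the extension count includes SNI and ALPN.
    counts = f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}"
    a = f"{protocol}{version}{sni}{counts}{alpn}"
    # Section b: cipher suites, sorted.
    b = truncated_hash(",".join(sorted(hex4(c) for c in ciphers)))
    # Section c: sorted extensions without SNI (0x0000) and ALPN (0x0010),
    # followed by the signature algorithms in their original order.
    exts = ",".join(sorted(hex4(e) for e in extensions if e not in (0x0000, 0x0010)))
    sigs = ",".join(hex4(s) for s in sig_algs)
    c = truncated_hash(f"{exts}_{sigs}" if sigs else exts)
    return f"{a}_{b}_{c}"


fingerprint = ja4("t", "13", "d", [0x1301, 0xC02B],
                  [0x0000, 0x0010, 0x000D, 0x002B], [0x0403], "h2")
print(fingerprint)
```

Sorting the ciphers and extensions is what makes JA4 robust against the extension shuffling that broke JA3, while the signature algorithms keep their original order.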

Bot Analysis - Next Time

This setup worked. For prod usage, I'd recommend using HAProxy or a cloud provider, but even collecting PCAPs might help your use-cases.

In the next part, I’ll finally look at some traffic coming my way and we’ll use the gathered data to identify some bots and trends.


1.

If this was way too fast, I’m really looking only for a single static endpoint to test this, so FastAPI sounds great. It’s python, so I’m at home. I can bootstrap it with uv and fastapi-new for a single endpoint on a dev server.

As I’m on a test machine, self signed certs are ok, so I can run e.g. openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/haproxy/certs/haproxy-test.local.key -out /etc/haproxy/certs/haproxy-test.local.crt -subj "/CN=haproxy-test.local".

Get the pem using cat haproxy-test.local.crt haproxy-test.local.key > haproxy-test.local.pem and don’t forget to chmod 600 haproxy-test.local.pem.

Wrap it up in a simple systemd service and go. To use the haproxy-test.local domain, I can just modify /etc/hosts.

You know not to use self-signed certs in prod, right?

For added style points, you can run this on a QEMU virtual machine, like I did.


2.

Defined in RFC8701, GREASE values are there to troll the machines. They need to be parsed correctly, but otherwise they should be ignored. This should motivate solution vendors to properly implement RFCs, otherwise they’ll encounter many stupid errors.

