Kernloom Throughput / XDP Performance / Benchmark

Real-world sizing

Rule of thumb: Kernloom is usually link-limited, not CPU-limited.

For many edge deployments, 1–2 dedicated CPU cores are enough to protect a 10GbE uplink; for 25GbE plan for 4–6 cores; for 100GbE plan for 12–20 cores (worst-case small packets).

Assumptions: XDP native/driver mode, per-IP token bucket + telemetry, RSS/queues configured, IRQ/core pinning.

| Target link | 64 B line rate (worst case) | Realistic Kernloom cores | Realistic Kernloom capacity (Mpps) |
| --- | --- | --- | --- |
| 10GbE | ~14.88 Mpps | 2 cores | ~10–30 Mpps |
| 25GbE | ~37.2 Mpps | 4–6 cores | ~20–90 Mpps |
| 100GbE | ~148.8 Mpps | 12–20 cores | ~60–300 Mpps |

Note: Many production workloads use larger packets and/or have mixed traffic, so CPU needs can be lower than the worst-case 64B flood sizing above.
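The 64 B line-rate column can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the standard 20 B of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap), and the helper names `line_rate_mpps` and `cores_needed` are hypothetical, not part of Kernloom. The core estimate assumes linear scaling with core count and uses the 5–15 Mpps per-core range from the throughput reference table.

```python
import math

# Per-frame overhead on the wire that is not part of the frame itself:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_mpps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum frame rate (Mpps) a link can carry at the given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame / 1e6


def cores_needed(target_mpps: float, per_core_mpps: float) -> int:
    """Cores required, assuming throughput scales linearly with core count."""
    return math.ceil(target_mpps / per_core_mpps)


for gbps in (10, 25, 100):
    rate = line_rate_mpps(gbps)
    # 15 and 5 Mpps are the optimistic and pessimistic per-core estimates.
    best, worst = cores_needed(rate, 15), cores_needed(rate, 5)
    print(f"{gbps}GbE: {rate:.2f} Mpps at 64 B, {best}-{worst} cores")
```

The computed core ranges are wider than the recommendations in the table because they span the full per-core estimate; the table's figures fall inside them.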


Throughput reference table (estimates)

| Stack / Mode | CPU reference | Throughput (Mpps) | Speedup vs L7 WAF (TLS+WAF) |
| --- | --- | --- | --- |
| Kernloom (XDP, per-IP token bucket + telemetry) | 1 core | 5–15 | 25×–300× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 4 cores | 20–60 | 100×–1,200× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 8 cores | 40–120 | 200×–2,400× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 16 cores | 80–240 | 400×–4,800× |
| L7 WAF baseline (TLS termination + WAF) | 2 vCPU | 0.05–0.20 | baseline |

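Assumption used to express the L7 baseline in packets/s: ~10 packets per HTTP transaction (request + response). Under that assumption, 0.05–0.20 Mpps corresponds to roughly 5,000–20,000 transactions/s.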
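The speedup column follows directly from the two throughput ranges. As a minimal sketch (the function names are hypothetical, and the 10 packets per transaction figure is the assumption stated above), the low end divides the slowest XDP estimate by the fastest baseline, and the high end does the opposite:

```python
PACKETS_PER_TRANSACTION = 10  # stated assumption: request + response


def transactions_to_mpps(tps: float,
                         packets_per_transaction: int = PACKETS_PER_TRANSACTION) -> float:
    """Convert an L7 transaction rate (per second) into Mpps."""
    return tps * packets_per_transaction / 1e6


def speedup_range(xdp_mpps: tuple, baseline_mpps: tuple) -> tuple:
    """Return (lowest, highest) speedup between two (min, max) Mpps ranges."""
    lowest = xdp_mpps[0] / baseline_mpps[1]   # slowest XDP vs fastest baseline
    highest = xdp_mpps[1] / baseline_mpps[0]  # fastest XDP vs slowest baseline
    return lowest, highest


# Baseline range expressed in Mpps from transactions per second.
baseline = (transactions_to_mpps(5_000), transactions_to_mpps(20_000))

for cores, xdp in ((1, (5, 15)), (4, (20, 60)), (8, (40, 120)), (16, (80, 240))):
    lo, hi = speedup_range(xdp, baseline)
    print(f"{cores} core(s): {lo:.0f}x-{hi:.0f}x")
```

Because both inputs are estimates, the resulting ratios are order-of-magnitude indicators rather than measured results.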

Note: “vCPU” performance depends on hypervisor scheduling, CPU model, NUMA placement, and steal time. For fair comparisons, pin workloads to dedicated physical cores where possible.


Packet size reference (bytes/s and bit-rate)

| Rate | Small (64 B) | Medium (512 B) | Jumbo (9000 B) |
| --- | --- | --- | --- |
| 5 Mpps | 320,000,000 B/s (2.56 Gbit/s) | 2,560,000,000 B/s (20.48 Gbit/s) | 45,000,000,000 B/s (360 Gbit/s) |
| 10 Mpps | 640,000,000 B/s (5.12 Gbit/s) | 5,120,000,000 B/s (40.96 Gbit/s) | 90,000,000,000 B/s (720 Gbit/s) |
| 15 Mpps | 960,000,000 B/s (7.68 Gbit/s) | 7,680,000,000 B/s (61.44 Gbit/s) | 135,000,000,000 B/s (1080 Gbit/s) |

Larger packets don’t change Kernloom’s decision logic directly (it typically inspects only the first bytes), but larger frames increase DMA and memory bandwidth pressure. Depending on NIC/driver settings and map pressure, this can raise cache misses and the overall RAM footprint.

Important: the tables above use payload sizes (64/512/9000 B) for readability. “On-the-wire” throughput is higher because of Ethernet/IP/TCP header overhead plus 20 B of preamble and inter-frame gap per frame. The relative overhead is largest for small packets and negligible for jumbo frames.
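The byte and bit rates in the packet size table are simply packet rate times packet size. The sketch below reproduces them and shows the effect of per-packet overhead; `throughput` and its `overhead_bytes` parameter are hypothetical names, and the 20 B value covers only preamble and inter-frame gap (protocol headers would add more if the sizes are pure payloads).

```python
def throughput(mpps: float, packet_bytes: int, overhead_bytes: int = 0) -> tuple:
    """Return (bytes per second, Gbit/s) for a packet rate and packet size.

    overhead_bytes is added to every packet; 0 reproduces the table values.
    """
    bytes_per_s = mpps * 1e6 * (packet_bytes + overhead_bytes)
    gbit_per_s = bytes_per_s * 8 / 1e9
    return bytes_per_s, gbit_per_s


for size in (64, 512, 9000):
    _, table_gbit = throughput(10, size)
    _, wire_gbit = throughput(10, size, overhead_bytes=20)
    extra = (wire_gbit / table_gbit - 1) * 100
    print(f"10 Mpps @ {size} B: {table_gbit:.2f} Gbit/s, "
          f"{wire_gbit:.2f} Gbit/s on the wire (+{extra:.1f}%)")
```

At 64 B the 20 B of wire overhead adds about 31%; at 9000 B it adds roughly 0.2%.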


Theoretical upper bound at 16 cores (CPU-limited pps assumption)

| Packet size | 16 cores @ 80 Mpps | 16 cores @ 240 Mpps |
| --- | --- | --- |
| 64 B | 5.12e9 B/s (5.12 GB/s) | 1.536e10 B/s (15.36 GB/s) |
| 512 B | 4.096e10 B/s (40.96 GB/s) | 1.2288e11 B/s (122.88 GB/s) |
| 9000 B (jumbo) | 7.20e11 B/s (720 GB/s) | 2.16e12 B/s (2160 GB/s) |

Reality check: for medium/jumbo packets you will almost always be bandwidth-limited (link speed, PCIe, memory bandwidth) long before you reach these byte/s figures. High Mpps values mainly matter for small-packet floods and scan traffic.
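One way to see where the bottleneck moves is to take the smaller of the CPU-limited and link-limited packet rates. This is a simplified sketch: `effective_mpps` is a hypothetical helper, it models only link speed (not PCIe or memory bandwidth), and it assumes 20 B of preamble and inter-frame gap per frame.

```python
def effective_mpps(cpu_mpps: float, link_gbps: float, frame_bytes: int,
                   wire_overhead: int = 20) -> tuple:
    """Return (achievable Mpps, limiting factor) for a CPU budget and a link."""
    # Highest packet rate the link itself can carry at this frame size.
    link_mpps = link_gbps * 1e9 / ((frame_bytes + wire_overhead) * 8) / 1e6
    if cpu_mpps <= link_mpps:
        return cpu_mpps, "cpu"
    return link_mpps, "link"


# 16 cores at the conservative 80 Mpps estimate, on a 100GbE link.
for size in (64, 512, 9000):
    rate, limit = effective_mpps(80, 100, size)
    print(f"{size} B: {rate:.2f} Mpps ({limit}-limited)")
```

Under these assumptions only the 64 B case is CPU-limited; at 512 B and 9000 B the link saturates at roughly 23.5 Mpps and 1.4 Mpps respectively, far below the CPU-limited figures in the table.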


See also

Architecture: The XDP data path, pinned maps, and enforcement model
Shield reference: Map capacity limits and what fills at scale
Getting started: Install and start protecting your first node