Kernloom Throughput / XDP Performance / Benchmark
Real-world sizing
Rule of thumb: Kernloom is usually link-limited, not CPU-limited.
For many edge deployments, 1–2 dedicated CPU cores are enough to protect a 10GbE uplink. For 25GbE, plan for 4–6 cores; for 100GbE, plan for 12–20 cores. These figures are sized for worst-case small-packet traffic.
Assumptions: XDP native/driver mode, per-IP token bucket + telemetry, RSS/queues configured, IRQ/core pinning.
| Target link | 64B line-rate (worst-case) | Realistic Kernloom cores | Realistic Kernloom capacity (Mpps) |
|---|---|---|---|
| 10GbE | ~14.88 Mpps | 2 cores | ~10–30 Mpps |
| 25GbE | ~37.2 Mpps | 4–6 cores | ~20–90 Mpps |
| 100GbE | ~148.8 Mpps | 12–20 cores | ~60–300 Mpps |
Note: Many production workloads use larger packets and/or have mixed traffic, so CPU needs can be lower than the worst-case 64B flood sizing above.
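As a rough cross-check of the sizing table, the sketch below recomputes the 64 B line rates and the capacity ranges. It assumes the standard 20 B of per-frame Ethernet wire overhead (preamble, start-of-frame delimiter, and inter-frame gap) and a per-core range of 5–15 Mpps taken from the 1-core row of the throughput reference table. The function names are illustrative and not part of Kernloom.

```python
# Assumption: 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

# Assumption: per-core estimate from the 1-core row of the throughput table.
PER_CORE_MPPS = (5, 15)


def line_rate_mpps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum frame rate (Mpps) a link can carry at a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


def capacity_range_mpps(min_cores: int, max_cores: int) -> tuple[int, int]:
    """Estimated capacity: fewest cores at the low rate, most cores at the high rate."""
    low, high = PER_CORE_MPPS
    return (min_cores * low, max_cores * high)


# (link speed in Gbit/s, core range from the sizing table)
for gbps, (lo_cores, hi_cores) in [(10, (2, 2)), (25, (4, 6)), (100, (12, 20))]:
    cap_lo, cap_hi = capacity_range_mpps(lo_cores, hi_cores)
    print(f"{gbps}GbE: line rate {line_rate_mpps(gbps):.2f} Mpps, "
          f"estimated capacity {cap_lo}-{cap_hi} Mpps")
```

The capacity column is a simple product of core count and per-core rate, so it assumes near-linear scaling across cores; treat it as an estimate rather than a measured result.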
Throughput reference table (estimates)
| Stack / Mode | CPU Reference | Throughput (Mpps) | Speedup vs L7 WAF (TLS+WAF) |
|---|---|---|---|
| Kernloom (XDP, per-IP token bucket + telemetry) | 1 core | 5–15 | 25×–300× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 4 cores | 20–60 | 100×–1,200× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 8 cores | 40–120 | 200×–2,400× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 16 cores | 80–240 | 400×–4,800× |
| L7 WAF baseline (TLS termination + WAF) | 2 vCPU | 0.05–0.20 | 1× |
Assumption to express L7 in packets/s: ~10 packets per HTTP transaction (request + response).
Note: “vCPU” performance depends on hypervisor scheduling, CPU model, NUMA placement, and steal time. For fair comparisons, pin workloads to dedicated physical cores where possible.
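The speedup column can be reproduced by dividing the Kernloom range by the L7 WAF baseline range: the lowest speedup pairs the slowest Kernloom figure with the fastest WAF figure, and the highest pairs the opposite ends. The sketch below also converts the baseline into HTTP transactions per second using the ~10 packets per transaction assumption. All names are illustrative.

```python
# Assumption from the text: ~10 packets per HTTP transaction (request + response).
PACKETS_PER_TRANSACTION = 10

# L7 WAF baseline from the table, in Mpps (low, high).
WAF_BASELINE_MPPS = (0.05, 0.20)


def speedup_range(kernloom_mpps, waf_mpps=WAF_BASELINE_MPPS):
    """Return (min, max) speedup of a Kernloom range over the WAF baseline range."""
    k_low, k_high = kernloom_mpps
    w_low, w_high = waf_mpps
    return (k_low / w_high, k_high / w_low)


def transactions_per_second(mpps, packets_per_transaction=PACKETS_PER_TRANSACTION):
    """Convert a packet rate in Mpps to HTTP transactions per second."""
    return mpps * 1e6 / packets_per_transaction


for cores, mpps in [(1, (5, 15)), (4, (20, 60)), (8, (40, 120)), (16, (80, 240))]:
    low, high = speedup_range(mpps)
    print(f"{cores} core(s): {low:.0f}x to {high:.0f}x")

tps_low, tps_high = (transactions_per_second(m) for m in WAF_BASELINE_MPPS)
print(f"WAF baseline: {tps_low:,.0f} to {tps_high:,.0f} transactions/s")
```

Because both ranges are estimates, the resulting speedup span is wide; it compares packet-filtering throughput with full TLS and WAF processing, which do very different amounts of work per packet.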
Packet size reference (bytes/s and bit-rate)
| Rate | Small (64 B) | Medium (512 B) | Jumbo (9000 B) |
|---|---|---|---|
| 5 Mpps | 320,000,000 B/s (2.56 Gbit/s) | 2,560,000,000 B/s (20.48 Gbit/s) | 45,000,000,000 B/s (360 Gbit/s) |
| 10 Mpps | 640,000,000 B/s (5.12 Gbit/s) | 5,120,000,000 B/s (40.96 Gbit/s) | 90,000,000,000 B/s (720 Gbit/s) |
| 15 Mpps | 960,000,000 B/s (7.68 Gbit/s) | 7,680,000,000 B/s (61.44 Gbit/s) | 135,000,000,000 B/s (1080 Gbit/s) |
Larger packets don’t change Kernloom’s decision logic directly (it typically inspects only the first bytes), but larger frames increase DMA and memory bandwidth pressure. Depending on NIC/driver settings and map pressure, this can raise cache misses and the overall RAM footprint.
Important: the tables above use payload sizes (64/512/9000 B) for readability. On-the-wire throughput is higher because every frame also carries Ethernet/IP/TCP headers plus a preamble and inter-frame gap; this fixed per-frame overhead matters most for small packets.
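The packet size reference table is a direct multiplication: bytes per second is the packet rate times the nominal size, and the bit rate is that value times 8. A minimal sketch (illustrative function names; wire overhead deliberately left out, as in the table):

```python
def bytes_per_second(mpps: int, packet_bytes: int) -> int:
    """Nominal byte rate for a packet rate given in Mpps."""
    return mpps * 1_000_000 * packet_bytes


def gbit_per_second(mpps: int, packet_bytes: int) -> float:
    """Nominal bit rate in Gbit/s (decimal: 1 Gbit = 1e9 bits)."""
    return bytes_per_second(mpps, packet_bytes) * 8 / 1e9


for mpps in (5, 10, 15):
    cells = [
        f"{bytes_per_second(mpps, size):,} B/s ({gbit_per_second(mpps, size):g} Gbit/s)"
        for size in (64, 512, 9000)
    ]
    print(f"{mpps} Mpps: " + " | ".join(cells))
```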
Theoretical upper bound at 16 cores (CPU-limited pps assumption)
| Packet size | 16 cores @ 80 Mpps | 16 cores @ 240 Mpps |
|---|---|---|
| 64 B | 5.12e9 B/s (5.12 GB/s) | 1.536e10 B/s (15.36 GB/s) |
| 512 B | 4.096e10 B/s (40.96 GB/s) | 1.2288e11 B/s (122.88 GB/s) |
| 9000 B (jumbo) | 7.20e11 B/s (720 GB/s) | 2.16e12 B/s (2160 GB/s) |
Reality check: for medium/jumbo packets you will almost always be bandwidth-limited (link speed, PCIe, memory bandwidth) long before you reach these byte/s figures. High Mpps values mainly matter for small-packet floods and scan traffic.
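To make the reality check concrete, the sketch below takes the smaller of a CPU-limited packet rate and the rate the link can physically carry. It considers link speed only (not PCIe or memory bandwidth), treats the packet size as the Ethernet frame size, and assumes 20 B of per-frame wire overhead. The function names are illustrative.

```python
# Assumption: 20 B per frame for preamble, start-of-frame delimiter, and inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def link_limited_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Highest frame rate (Mpps) the link can carry at this frame size."""
    return link_gbps * 1e9 / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8) / 1e6


def effective_mpps(cpu_mpps: float, link_gbps: float, frame_bytes: int) -> float:
    """Achievable rate is bounded by whichever limit is reached first."""
    return min(cpu_mpps, link_limited_mpps(link_gbps, frame_bytes))


# 16 cores at the low estimate (80 Mpps) behind a 100GbE link.
for size in (64, 512, 9000):
    link = link_limited_mpps(100, size)
    rate = effective_mpps(80, 100, size)
    bound = "CPU-limited" if rate < link else "link-limited"
    print(f"{size} B: link allows {link:.2f} Mpps, effective {rate:.2f} Mpps ({bound})")
```

Under these assumptions only the 64 B case is CPU-limited; at 512 B and 9000 B the 100GbE link caps the rate far below 80 Mpps, which is why the byte/s figures in the table above are not reachable in practice.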
See also
| Page | What it covers |
|---|---|
| Architecture | The XDP data path, pinned maps, and enforcement model |
| Shield reference | Map capacity limits and what fills at scale |
| Getting started | Install and start protecting your first node |