Kernloom Throughput: XDP Performance Benchmark
Real-world sizing
Rule of thumb: Kernloom is usually link-limited, not CPU-limited.
For many edge deployments, 1–2 dedicated CPU cores are enough to protect a 10GbE uplink; for 25GbE plan for 4–6 cores; for 100GbE plan for 12–20 cores (worst-case small packets).
Assumptions: XDP native/driver mode, per-IP token bucket + telemetry, RSS/queues configured, IRQ/core pinning.
| Target link | 64B line-rate (worst-case) | Realistic Kernloom cores | Realistic Kernloom capacity (Mpps) |
|---|---|---|---|
| 10GbE | ~14.88 Mpps | 2 cores | ~10–30 Mpps |
| 25GbE | ~37.2 Mpps | 4–6 cores | ~20–90 Mpps |
| 100GbE | ~148.8 Mpps | 12–20 cores | ~60–300 Mpps |
Note: Many production workloads use larger packets and/or have mixed traffic, so CPU needs can be lower than the worst-case 64B flood sizing above.
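The per-IP token bucket named in the assumptions can be sketched in a few lines. Below is a hypothetical userspace model of the per-packet check, not Kernloom's actual eBPF code: in the real datapath the state would live in a per-IP BPF hash map (e.g. `BPF_MAP_TYPE_LRU_HASH` keyed by source IP) and the timestamp would come from `bpf_ktime_get_ns()`. The rate and burst values are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

#define RATE_PPS     1000ULL        /* allowed packets/s per source IP (assumed) */
#define BURST        200ULL         /* bucket depth in packets (assumed) */
#define NSEC_PER_SEC 1000000000ULL

struct bucket {
    uint64_t tokens;  /* tokens currently available, in packets */
    uint64_t last_ns; /* timestamp of the last refill */
};

/* Per-packet decision: returns true for XDP_PASS, false for XDP_DROP. */
static bool bucket_allow(struct bucket *b, uint64_t now_ns)
{
    /* Lazily refill based on time elapsed since the last refill. */
    uint64_t refill = (now_ns - b->last_ns) * RATE_PPS / NSEC_PER_SEC;

    if (refill > 0) {
        b->tokens = b->tokens + refill > BURST ? BURST : b->tokens + refill;
        b->last_ns = now_ns;
    }
    if (b->tokens == 0)
        return false; /* bucket empty: drop */
    b->tokens--;
    return true;
}
```

This lazy-refill form does only one multiply and one divide per packet and keeps no timers, which is what makes the per-core Mpps figures in the next table plausible.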
Throughput reference table (estimates)
| Stack / Mode | CPU Reference | Throughput (Mpps) | Speedup vs L7 WAF (TLS+WAF) |
|---|---|---|---|
| Kernloom (XDP, per-IP token bucket + telemetry) | 1 core | 5–15 | 25×–300× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 4 cores | 20–60 | 100×–1,200× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 8 cores | 40–120 | 200×–2,400× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 16 cores | 80–240 | 400×–4,800× |
| L7 WAF baseline (TLS termination + WAF) | 2 vCPU | 0.05–0.20 | 1× |
Assumption to express L7 in packets/s: ~10 packets per HTTP transaction (request + response).
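That conversion can be written out as a one-line helper; the 10 packets/transaction figure is the assumption stated above, and the function name is ours, purely for illustration.

```c
/* Express an L7 baseline quoted in HTTP transactions/s as a packet
 * rate, using the ~10 packets per transaction (request + response)
 * assumption above. */
#define PKTS_PER_TXN 10.0

static double txn_per_sec_to_mpps(double txn_per_sec)
{
    return txn_per_sec * PKTS_PER_TXN / 1e6;
}
/* txn_per_sec_to_mpps(5000.0)  -> 0.05 Mpps  (low end of the baseline row)
 * txn_per_sec_to_mpps(20000.0) -> 0.20 Mpps  (high end of the baseline row) */
```

So the 0.05–0.20 Mpps baseline row corresponds to roughly 5,000–20,000 HTTP transactions/s on 2 vCPU.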
Note: “vCPU” performance depends on hypervisor scheduling, CPU model, NUMA placement, and steal time. For fair comparisons, pin workloads to dedicated physical cores where possible.
Packet size reference (bytes/s and bit-rate)
Larger packets don’t change Kernloom’s decision logic (it typically inspects only the first bytes of each frame), but larger frames increase DMA and memory-bandwidth pressure. Depending on NIC/driver settings and map pressure, this can increase cache misses and the overall RAM footprint.
Important: the tables above use payload sizes (64/512/9000 B) for readability. “On-the-wire” throughput is slightly higher due to Ethernet/IP/TCP overhead and, for small packets, preamble + inter-frame gap.
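The small-packet overhead is worth making concrete, since it is where the line-rate figures in the sizing table come from: besides the frame itself, every Ethernet packet costs 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap = 20 B on the wire. A sketch (helper name is ours):

```c
/* Line-rate packet rate for a given link speed and Ethernet frame size,
 * accounting for the fixed 20 B of preamble + SFD + inter-frame gap
 * that each packet consumes on the wire. */
#define WIRE_OVERHEAD_BYTES 20.0

static double line_rate_mpps(double link_gbit_per_s, double frame_bytes)
{
    double bits_per_pkt = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0;
    return link_gbit_per_s * 1e9 / bits_per_pkt / 1e6;
}
/* line_rate_mpps(10.0, 64.0)  -> ~14.88  (10GbE worst-case row above)
 * line_rate_mpps(100.0, 64.0) -> ~148.81 (100GbE worst-case row above) */
```

This reproduces the ~14.88 Mpps and ~148.8 Mpps worst-case figures in the sizing table: 10 Gbit/s divided by 84 B × 8 bits per 64 B frame.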
Theoretical upper bound at 16 cores (CPU-limited pps assumption)
Reality check: for medium/jumbo packets you will almost always be bandwidth-limited (link speed, PCIe, memory bandwidth) long before you reach these byte/s figures. High Mpps values mainly matter for small-packet floods and scan traffic.
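The reality check can be quantified from the 16-core row of the throughput table (80–240 Mpps). Converting that CPU-limited pps bound into a bit rate for different packet sizes (helper name is ours):

```c
/* Bit rate implied by a CPU-limited packet rate at a given packet size.
 * At 64 B, 240 Mpps is ~123 Gbit/s; at 1500 B the same pps would imply
 * ~2.9 Tbit/s, so the link, PCIe, or memory bus saturates long before
 * the CPU does for medium/jumbo packets. */
static double gbit_per_s(double mpps, double pkt_bytes)
{
    return mpps * 1e6 * pkt_bytes * 8.0 / 1e9;
}
/* gbit_per_s(240.0, 64.0)   -> 122.88 Gbit/s
 * gbit_per_s(240.0, 1500.0) -> 2880.0 Gbit/s */
```

In other words, the 16-core CPU bound only binds for small-packet floods; at MTU-sized packets even a 100GbE link is the bottleneck well below 240 Mpps.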