Kernloom Throughput


Real-world sizing

Rule of thumb: Kernloom is usually link-limited, not CPU-limited.

For many edge deployments, 1–2 dedicated CPU cores are enough to protect a 10GbE uplink; for 25GbE plan for 4–6 cores; for 100GbE plan for 12–20 cores (worst-case small packets).

Assumptions: XDP native/driver mode, per-IP token bucket + telemetry, RSS/queues configured, IRQ/core pinning.
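The sizing rule above can be sketched numerically. A minimal Python estimate, assuming an illustrative sustained per-core XDP rate (`PER_CORE_MPPS` below is an assumption for the sketch, not a measured Kernloom figure):

```python
import math

# Assumed sustained per-core XDP processing rate (Mpps). Illustrative only;
# real numbers depend on CPU model, NIC, and program complexity.
PER_CORE_MPPS = 8.0

def line_rate_mpps_64b(link_gbps: float) -> float:
    """Worst-case 64 B packet rate: frame (64 B) + preamble (8 B) + inter-frame gap (12 B) = 84 B on the wire."""
    return link_gbps * 1e9 / (84 * 8) / 1e6

def cores_needed(link_gbps: float) -> int:
    """Cores required to absorb a worst-case 64 B flood at line rate."""
    return math.ceil(line_rate_mpps_64b(link_gbps) / PER_CORE_MPPS)

for gbps in (10, 25, 100):
    print(f"{gbps}GbE: ~{line_rate_mpps_64b(gbps):.2f} Mpps worst case, "
          f"~{cores_needed(gbps)} cores at {PER_CORE_MPPS} Mpps/core")
```

With an 8 Mpps/core assumption this reproduces the core counts in the table below (2, 5, and 19 cores); lower per-core assumptions give the upper ends of the ranges.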

| Target link | 64 B line rate (worst case) | Realistic Kernloom cores | Realistic Kernloom capacity (Mpps) |
|---|---:|---:|---:|
| 10GbE | ~14.88 Mpps | 2 cores | ~10–30 Mpps |
| 25GbE | ~37.2 Mpps | 4–6 cores | ~20–90 Mpps |
| 100GbE | ~148.8 Mpps | 12–20 cores | ~60–300 Mpps |

Note: Many production workloads use larger packets and/or have mixed traffic, so CPU needs can be lower than the worst-case 64B flood sizing above.


Throughput reference table (estimates)

| Stack / Mode | CPU reference | Throughput (Mpps) | Speedup vs L7 WAF (TLS + WAF) |
|---|---:|---:|---:|
| Kernloom (XDP, per-IP token bucket + telemetry) | 1 core | 5–15 | 25×–300× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 4 cores | 20–60 | 100×–1,200× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 8 cores | 40–120 | 200×–2,400× |
| Kernloom (XDP, per-IP token bucket + telemetry) | 16 cores | 80–240 | 400×–4,800× |
| L7 WAF baseline (TLS termination + WAF) | 2 vCPU | 0.05–0.20 | (baseline) |

Assumption used to express L7 throughput in packets/s: ~10 packets per HTTP transaction (request + response).

Note: “vCPU” performance depends on hypervisor scheduling, CPU model, NUMA placement, and steal time. For fair comparisons, pin workloads to dedicated physical cores where possible.
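The speedup column can be reproduced directly from the table's own Mpps ranges. This sketch takes the conservative bound as slowest-Kernloom vs fastest-L7 and the optimistic bound as the reverse:

```python
# L7 WAF baseline throughput range in Mpps, from the table
# (requests/s converted at ~10 packets per HTTP transaction).
l7_mpps = (0.05, 0.20)

# Kernloom throughput ranges (Mpps) per core count, from the table.
kernloom_rows = {1: (5, 15), 4: (20, 60), 8: (40, 120), 16: (80, 240)}

for cores, (lo, hi) in kernloom_rows.items():
    # Conservative bound: slowest Kernloom vs fastest L7; optimistic: the reverse.
    print(f"{cores:>2} cores: {round(lo / l7_mpps[1])}x to {round(hi / l7_mpps[0])}x")
```

The output matches the table's 25×–300× through 400×–4,800× figures.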


Packet size reference (bytes/s and bit-rate)

| Rate | Small (64 B) | Medium (512 B) | Jumbo (9000 B) |
|---:|---:|---:|---:|
| 5 Mpps | 320,000,000 B/s (2.56 Gbit/s) | 2,560,000,000 B/s (20.48 Gbit/s) | 45,000,000,000 B/s (360 Gbit/s) |
| 10 Mpps | 640,000,000 B/s (5.12 Gbit/s) | 5,120,000,000 B/s (40.96 Gbit/s) | 90,000,000,000 B/s (720 Gbit/s) |
| 15 Mpps | 960,000,000 B/s (7.68 Gbit/s) | 7,680,000,000 B/s (61.44 Gbit/s) | 135,000,000,000 B/s (1080 Gbit/s) |

Larger packets don’t change Kernloom’s decision logic directly (it typically inspects only the first bytes), but larger frames increase DMA and memory bandwidth pressure. Depending on NIC/driver settings and map pressure, this can raise cache misses and the overall RAM footprint.

Important: the tables above use payload sizes (64/512/9000 B) for readability. “On-the-wire” throughput is slightly higher due to Ethernet/IP/TCP overhead and, for small packets, preamble + inter-frame gap.
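The payload-vs-wire distinction can be made concrete with a small conversion helper (a sketch; the 8 B preamble and 12 B inter-frame gap are standard Ethernet per-frame overhead):

```python
# Per-frame wire overhead beyond the frame bytes themselves:
# 8 B preamble + 12 B inter-frame gap.
PREAMBLE_IFG = 8 + 12

def gbit_per_s(mpps: float, frame_bytes: int, on_wire: bool = False) -> float:
    """Convert a packet rate (Mpps) and frame size to Gbit/s, optionally counting wire overhead."""
    per_packet = frame_bytes + (PREAMBLE_IFG if on_wire else 0)
    return mpps * 1e6 * per_packet * 8 / 1e9

print(gbit_per_s(5, 64))                # 2.56  (matches the table)
print(gbit_per_s(5, 64, on_wire=True))  # 3.36  (on-wire, ~31% higher for 64 B frames)
```

For 9000 B jumbo frames the overhead is negligible (~0.2%), which is why it only matters for small-packet sizing.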


Theoretical upper bound at 16 cores (CPU-limited pps assumption)

| Packet size | 16 cores @ 80 Mpps | 16 cores @ 240 Mpps |
|---:|---:|---:|
| 64 B | 5.12e9 B/s (5.12 GB/s) | 1.536e10 B/s (15.36 GB/s) |
| 512 B | 4.096e10 B/s (40.96 GB/s) | 1.2288e11 B/s (122.88 GB/s) |
| 9000 B (jumbo) | 7.20e11 B/s (720 GB/s) | 2.16e12 B/s (2160 GB/s) |

Reality check: for medium/jumbo packets you will almost always be bandwidth-limited (link speed, PCIe, memory bandwidth) long before you reach these byte/s figures. High Mpps values mainly matter for small-packet floods and scan traffic.
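To see where the link itself becomes the cap, compare the link-limited packet rate per frame size against the CPU-limited figures above (an illustrative sketch):

```python
def link_limited_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum packet rate a link can carry for a given frame size,
    counting the 8 B preamble + 12 B inter-frame gap per frame."""
    wire_bytes = frame_bytes + 20
    return link_gbps * 1e9 / (wire_bytes * 8) / 1e6

# Even a 100GbE link caps out far below 240 Mpps for anything but tiny frames:
print(f"{link_limited_mpps(100, 64):.1f} Mpps")    # 148.8
print(f"{link_limited_mpps(100, 512):.1f} Mpps")   # 23.5
print(f"{link_limited_mpps(100, 9000):.2f} Mpps")  # 1.39
```

At 512 B frames, 16 cores' worth of CPU-limited pps exceeds what 100GbE can deliver by roughly 3–10×, so the link, not Kernloom, sets the ceiling.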