Kernloom Throughput and XDP Performance (Mpps vs L7 WAF)

Kernloom Throughput / XDP Performance / Benchmark

Real-world sizing

Rule of thumb: Kernloom is usually link-limited, not CPU-limited.

For many edge deployments, 1–2 dedicated CPU cores are enough to protect a 10GbE uplink; plan for 4–6 cores on 25GbE and 12–20 cores on 100GbE (sizing for worst-case small packets).

Assumptions: XDP native/driver mode, per-IP token bucket + telemetry, RSS/queues configured, IRQ/core pinning.

Target link	64B line-rate (worst-case)	Realistic Kernloom cores	Realistic Kernloom capacity (Mpps)
10GbE	~14.88 Mpps	2 cores	~10–30 Mpps
25GbE	~37.2 Mpps	4–6 cores	~20–90 Mpps
100GbE	~148.8 Mpps	12–20 cores	~60–300 Mpps

Note: Many production workloads use larger packets and/or have mixed traffic, so CPU needs can be lower than the worst-case 64B flood sizing above.
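The worst-case column follows from Ethernet framing: a minimum 64 B frame occupies 84 B on the wire once the 8 B preamble/start-of-frame delimiter and the 12 B inter-frame gap are added. The minimal sketch below reproduces both the line-rate and the capacity columns. It assumes linear scaling at 5–15 Mpps per dedicated core (the single-core estimate from the throughput reference table, not a measurement). `line_rate_mpps` and `capacity_mpps` are hypothetical helper names.

```python
"""Sketch: reproduce the worst-case sizing table from first principles.

Assumption: 5-15 Mpps per dedicated core (the single-core estimate from
the throughput reference table), scaling linearly with the core count.
"""

# Ethernet adds 8 B preamble/SFD + 12 B inter-frame gap to every frame.
WIRE_OVERHEAD_BYTES = 20
PER_CORE_MPPS = (5, 15)  # assumed low/high single-core estimate


def line_rate_mpps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum frames per second (in Mpps) a link can carry at this frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame / 1e6


def capacity_mpps(min_cores: int, max_cores: int) -> tuple[int, int]:
    """Capacity range: fewest cores at the low per-core rate, most cores at the high rate."""
    low, high = PER_CORE_MPPS
    return (min_cores * low, max_cores * high)


if __name__ == "__main__":
    for link, cores in [(10, (2, 2)), (25, (4, 6)), (100, (12, 20))]:
        lo_mpps, hi_mpps = capacity_mpps(*cores)
        print(f"{link}GbE: line rate {line_rate_mpps(link):.2f} Mpps, "
              f"capacity ~{lo_mpps}-{hi_mpps} Mpps")
```

Running it reproduces the ~14.88 / ~37.2 / ~148.8 Mpps line rates and the 10–30, 20–90, and 60–300 Mpps capacity ranges in the table.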

Throughput reference table (estimates)

Stack / Mode	CPU Reference	Throughput (Mpps)	Speedup vs L7 WAF (TLS+WAF)
Kernloom (XDP, per-IP token bucket + telemetry)	1 core	5–15	25×–300×
Kernloom (XDP, per-IP token bucket + telemetry)	4 cores	20–60	100×–1,200×
Kernloom (XDP, per-IP token bucket + telemetry)	8 cores	40–120	200×–2,400×
Kernloom (XDP, per-IP token bucket + telemetry)	16 cores	80–240	400×–4,800×
L7 WAF baseline (TLS termination + WAF)	2 vCPU	0.05–0.20	1×

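Assumption used to express the L7 WAF in packets/s: ~10 packets per HTTP transaction (request + response), so 0.05–0.20 Mpps corresponds to roughly 5,000–20,000 transactions/s. Each speedup range divides the low Kernloom estimate by the high WAF estimate, and the high Kernloom estimate by the low WAF estimate (for example, 5 / 0.20 = 25× and 15 / 0.05 = 300× for one core).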
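The packets-per-transaction conversion and the speedup column can be reproduced with a few lines of integer arithmetic. This is a sketch under the stated ~10 packets/transaction assumption. The helper names are illustrative, and packet rates are kept in integer packets/s so the ratios come out exact.

```python
"""Sketch: how the L7 packet rate and the speedup column are derived.

Assumption (from the text): ~10 packets per HTTP transaction.
Integer packets/s are used so the ratios come out exact.
"""

PACKETS_PER_TRANSACTION = 10  # request + response, per the stated assumption

# L7 WAF baseline (TLS termination + WAF, 2 vCPU): 0.05-0.20 Mpps
WAF_PPS = (50_000, 200_000)


def waf_transactions_per_sec(pps: int) -> int:
    """Convert the WAF packet rate back into HTTP transactions per second."""
    return pps // PACKETS_PER_TRANSACTION


def speedup_range(kernloom_pps: tuple[int, int],
                  waf_pps: tuple[int, int] = WAF_PPS) -> tuple[int, int]:
    """Conservative bound: slowest Kernloom figure vs. fastest WAF figure.
    Optimistic bound: fastest Kernloom figure vs. slowest WAF figure."""
    k_low, k_high = kernloom_pps
    w_low, w_high = waf_pps
    return (k_low // w_high, k_high // w_low)


if __name__ == "__main__":
    print("WAF transactions/s:", [waf_transactions_per_sec(p) for p in WAF_PPS])
    for cores, (lo_mpps, hi_mpps) in [(1, (5, 15)), (4, (20, 60)),
                                      (8, (40, 120)), (16, (80, 240))]:
        pps = (lo_mpps * 1_000_000, hi_mpps * 1_000_000)
        print(f"{cores} core(s): speedup {speedup_range(pps)}")
```

For one core (5–15 Mpps) this yields 25×–300×, and for 16 cores (80–240 Mpps) 400×–4,800×, matching the table.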

Note: “vCPU” performance depends on hypervisor scheduling, CPU model, NUMA placement, and steal time. For fair comparisons, pin workloads to dedicated physical cores where possible.

Packet size reference (bytes/s and bit-rate)

Rate	Small (64 B)	Medium (512 B)	Jumbo (9000 B)
5 Mpps	320,000,000 B/s (2.56 Gbit/s)	2,560,000,000 B/s (20.48 Gbit/s)	45,000,000,000 B/s (360 Gbit/s)
10 Mpps	640,000,000 B/s (5.12 Gbit/s)	5,120,000,000 B/s (40.96 Gbit/s)	90,000,000,000 B/s (720 Gbit/s)
15 Mpps	960,000,000 B/s (7.68 Gbit/s)	7,680,000,000 B/s (61.44 Gbit/s)	135,000,000,000 B/s (1080 Gbit/s)

Larger packets don’t change Kernloom’s decision logic directly (it typically inspects only the first bytes of each packet, i.e. the headers), but larger frames increase DMA and memory-bandwidth pressure. Depending on NIC/driver settings and map pressure, this can increase cache misses and the overall RAM footprint.

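Important: the tables above use nominal packet sizes (64/512/9000 B) for readability; 64 B is the minimum Ethernet frame, headers and FCS included. The on-the-wire bit rate is higher because every frame also occupies 20 B of preamble, start-of-frame delimiter, and inter-frame gap (and, where a size counts only the IP packet or payload, the Ethernet/IP/TCP headers come on top). For 64 B frames this is significant: 84 B on the wire, about 31% more, which is why 10GbE tops out at ~14.88 Mpps rather than ~19.5 Mpps. For jumbo frames it is negligible.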
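A minimal sketch of nominal versus on-the-wire bit rate follows. It assumes each nominal size is the full Ethernet frame, so only the 20 B of preamble, start-of-frame delimiter, and inter-frame gap is added. The helper names are hypothetical.

```python
"""Sketch: nominal vs. on-the-wire bit rate for a given packet rate.

Assumption: the nominal size is the whole Ethernet frame, so only the
20 B preamble/SFD + inter-frame gap is added on the wire.
"""

WIRE_OVERHEAD_BYTES = 20  # 8 B preamble/SFD + 12 B inter-frame gap


def nominal_gbps(pps: int, size_bytes: int) -> float:
    """Bit rate as computed in the tables: packets/s x nominal size x 8."""
    return pps * size_bytes * 8 / 1e9


def wire_gbps(pps: int, size_bytes: int) -> float:
    """Bit rate actually occupied on the link, including framing overhead."""
    return pps * (size_bytes + WIRE_OVERHEAD_BYTES) * 8 / 1e9


def overhead_percent(size_bytes: int) -> float:
    """Extra link capacity consumed relative to the nominal figure."""
    return WIRE_OVERHEAD_BYTES / size_bytes * 100


if __name__ == "__main__":
    for size in (64, 512, 9000):
        print(f"{size:>5} B @ 10 Mpps: "
              f"nominal {nominal_gbps(10_000_000, size):.2f} Gbit/s, "
              f"wire {wire_gbps(10_000_000, size):.2f} Gbit/s "
              f"(+{overhead_percent(size):.2f}%)")
```

At 10 Mpps, 64 B frames come out at 5.12 Gbit/s nominal but 6.72 Gbit/s on the wire, while the 9000 B case changes by well under 1%.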

Theoretical upper bound at 16 cores (CPU-limited pps assumption)

Packet size	16 cores @ 80 Mpps	16 cores @ 240 Mpps
64 B	5.12e9 B/s (5.12 GB/s)	1.536e10 B/s (15.36 GB/s)
512 B	4.096e10 B/s (40.96 GB/s)	1.2288e11 B/s (122.88 GB/s)
9000 B (jumbo)	7.20e11 B/s (720 GB/s)	2.16e12 B/s (2160 GB/s)

Reality check: for medium/jumbo packets you will almost always be bandwidth-limited (link speed, PCIe, memory bandwidth) long before you reach these byte/s figures. A 100GbE link carries at most 12.5 GB/s, which is less than even the 64 B figure at 240 Mpps. High Mpps values mainly matter for small-packet floods and scan traffic.
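The reality check can be made concrete by comparing the CPU-limited pps figure with what the link can physically carry at each frame size. The sketch below models only the link (PCIe and memory bandwidth are ignored) and reuses the 20 B per-frame wire overhead. The 100GbE link and the 80/240 Mpps ceilings come from the tables on this page, and the function names are hypothetical.

```python
"""Sketch: is a given setup CPU-limited or link-limited?

Assumptions: the CPU pps ceiling is one of this page's estimates
(e.g. 80-240 Mpps at 16 cores); the link is the only other limit
modelled here (PCIe and memory bandwidth are ignored).
"""

WIRE_OVERHEAD_BYTES = 20  # preamble/SFD + inter-frame gap per Ethernet frame


def link_limit_pps(link_gbps: float, frame_bytes: int) -> float:
    """Frames per second the link can carry at this frame size."""
    return link_gbps * 1e9 / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def binding_limit(cpu_pps: float, link_gbps: float,
                  frame_bytes: int) -> tuple[str, float]:
    """Return which limit is hit first and the resulting packet rate."""
    link_pps = link_limit_pps(link_gbps, frame_bytes)
    if cpu_pps < link_pps:
        return ("cpu", cpu_pps)
    return ("link", link_pps)


if __name__ == "__main__":
    for size in (64, 512, 9000):
        for cpu_mpps in (80, 240):
            kind, pps = binding_limit(cpu_mpps * 1e6, 100, size)
            print(f"100GbE, {size} B, CPU ceiling {cpu_mpps} Mpps -> "
                  f"{kind}-limited at {pps / 1e6:.1f} Mpps")
```

On a 100GbE link this reports CPU-limited only for 64 B frames at the 80 Mpps ceiling. At 240 Mpps, and for 512 B and jumbo frames at either ceiling, the link binds first (roughly 148.8, 23.5, and 1.4 Mpps respectively).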


See also

Architecture	The XDP data path, pinned maps, and enforcement model
Shield reference	Map capacity limits and what fills at scale
Getting started	Install and start protecting your first node
