
Kernloom IQ (kliq): Full reference

Kernloom IQ is the local Policy Decision Point (PDP). It reads telemetry from Shield, scores each source, and applies progressive enforcement through the active PEP adapter.

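This page covers:

  • Decision engine
  • Configuration
  • Graph Learner
  • New in v0.2.0: learning improvements
  • Autotune
  • Exemptions
  • CLI flag reference
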
Decision engine

Per-tick inputs

Every tick (--interval, default 1s), IQ reads per-source deltas from Shield’s telemetry maps:

  • PPS: packets per second
  • SYN/s: TCP SYN packets per second
  • scan/s: distinct destination ports contacted per second (port-scan signal)
  • DropRL/s: packets dropped by Shield’s rate limiter per second

Severity score

Each signal is normalised by a trigger threshold and capped:

nPPS  = min(PPS    / trig-pps,  sev-cap)
nSYN  = min(SYN/s  / trig-syn,  sev-cap)
nSCAN = min(scan/s / trig-scan, sev-cap)

severity = w-pps × nPPS + w-syn × nSYN + w-scan × nSCAN

Strikes → levels

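Each tick, severity is converted to strikes: the step thresholds (--sev-step1/2/3) determine how many strikes are added (--sev-delta1/2/3), and strikes may decay once severity falls below --sev-decay-below. Strike counts are then mapped to enforcement levels: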

Threshold flag  Level reached
--soft-at       RATE_SOFT
--hard-at       RATE_HARD
--block-at      BLOCK (subject to the block gate)
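The mapping can be sketched as follows. The exact accumulation rule is not spelled out above, so this sketch assumes that only the highest --sev-step reached adds its --sev-delta each tick, and that strikes decay by one per tick below --sev-decay-below. All names are hypothetical. main uses the default step/delta flags and the legacy ziti-router level thresholds (2/5/12).

```go
package main

import "fmt"

// Level is the enforcement level of one source.
type Level int

const (
	Observe Level = iota
	RateSoft
	RateHard
	Block
)

func (l Level) String() string {
	return [...]string{"OBSERVE", "RATE_SOFT", "RATE_HARD", "BLOCK"}[l]
}

// StrikeParams mirrors --sev-step1/2/3, --sev-delta1/2/3, --sev-decay-below
// and --soft-at/--hard-at/--block-at.
type StrikeParams struct {
	Steps                   [3]float64
	Deltas                  [3]int
	DecayBelow              float64
	SoftAt, HardAt, BlockAt int
}

// AddStrikes applies one tick. Assumptions: only the highest step reached
// adds its delta, and strikes decay by one per tick below DecayBelow.
func AddStrikes(strikes int, sev float64, p StrikeParams) int {
	for i := len(p.Steps) - 1; i >= 0; i-- {
		if sev >= p.Steps[i] {
			return strikes + p.Deltas[i]
		}
	}
	if sev < p.DecayBelow && strikes > 0 {
		return strikes - 1
	}
	return strikes
}

// LevelFor maps strikes to a level. When the block gate fails, a source past
// BlockAt falls through to RATE_HARD (assuming BlockAt > HardAt).
func LevelFor(strikes int, p StrikeParams, blockGateOK bool) Level {
	switch {
	case strikes >= p.BlockAt && blockGateOK:
		return Block
	case strikes >= p.HardAt:
		return RateHard
	case strikes >= p.SoftAt:
		return RateSoft
	default:
		return Observe
	}
}

func main() {
	p := StrikeParams{Steps: [3]float64{1, 2, 3}, Deltas: [3]int{1, 2, 3}, DecayBelow: 0.25, SoftAt: 2, HardAt: 5, BlockAt: 12}
	strikes := 0
	for _, sev := range []float64{3.2, 3.2, 3.2, 3.2, 0.1} {
		strikes = AddStrikes(strikes, sev, p)
		fmt.Println(strikes, LevelFor(strikes, p, true))
	}
}
```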

Hysteresis

IQ prevents rapid oscillation using:

  • --up-need / --down-need: consecutive high/low ticks required to change level
  • --min-hold-soft / --min-hold-hard: minimum time in SOFT/HARD before stepping down
  • --cooldown: minimum time between any two level changes

Non-compliance escalation

If a source is in RATE_HARD and continues producing DropRL/s (it is hitting the limiter and not backing off), IQ accelerates escalation to BLOCK. Controlled by --noncomp-at, --noncomp-drop, --noncomp-sev.
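A sketch of the non-compliance counter for a source in RATE_HARD. It assumes a tick counts only when both --noncomp-drop and --noncomp-sev are met, and that the counter resets as the flag reference describes for --noncomp-reset-below. Names and example values are hypothetical, and how IQ then accelerates to BLOCK is not shown.

```go
package main

import "fmt"

// NoncompParams mirrors --noncomp-at, --noncomp-drop, --noncomp-sev and
// --noncomp-reset-below.
type NoncompParams struct {
	At         int
	Drop       float64
	Sev        float64
	ResetBelow float64
}

// NoncompTick is called once per tick for a source in RATE_HARD. It returns
// the updated counter and whether escalation to BLOCK should be accelerated.
func NoncompTick(count int, dropRL, sev float64, p NoncompParams) (int, bool) {
	switch {
	case dropRL >= p.Drop && sev >= p.Sev:
		count++ // still hitting the limiter: not backing off
	case sev < p.ResetBelow && dropRL == 0:
		count = 0 // the source has backed off: start over
	}
	return count, count >= p.At
}

func main() {
	p := NoncompParams{At: 3, Drop: 50, Sev: 1.0, ResetBelow: 0.5} // hypothetical values
	count := 0
	for _, drop := range []float64{120, 90, 200} {
		var escalate bool
		count, escalate = NoncompTick(count, drop, 1.5, p)
		fmt.Println(count, escalate)
	}
}
```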

Block gate

Blocking an address that sits behind NAT is risky: one misbehaving client can get every legitimate user sharing that address blocked. Gate BLOCK with:

--block-min-sev 3.0     # only block if severity >= 3.0 ...
--block-min-dur 60s     # ... for at least 60 seconds

If the gate fails, IQ stays at HARD instead of BLOCK.


Configuration

IQ has two independent configuration axes in v0.2.0.

PDPConfig (--pdp-config): what to measure

A PDPConfig YAML file is the primary configuration mechanism. It controls signal engine thresholds, the autotune schedule, progressive enforcement parameters, and adapter parameters. It replaces the older --profile flag for most use cases.

--profile still works as a shorthand, but the PDPConfig approach is recommended: it is Forge-compatible and gives full control over the bootstrap schedule.

16 PDPConfig files ship in configs/pdp/ (8 bootstrap + 8 production):

Bootstrap profile          Production profile  Role
ziti-controller-bootstrap  ziti-controller     Public Ziti controller
ziti-router-bootstrap      ziti-router         Public Ziti router
web-server-bootstrap       web-server          Public web server
reverse-proxy-bootstrap    reverse-proxy       Reverse proxy
idp-bootstrap              idp                 Identity provider
database-bootstrap         database            Database server
api-server-bootstrap       api-server          Internal API
nas-bootstrap              nas                 NAS / storage

Public-facing profiles (ziti-controller, ziti-router, web-server, reverse-proxy) set graph.enabled: false. Graph learning is not useful there: clients are unknown internet IPs, and the flow map fills up immediately. Internal profiles (idp, database, api-server, nas) set graph.enabled: true.

Bootstrap profiles start with blocking disabled (block-at=999) and use max_down: 0.10 (10%/hour) so thresholds converge from the cold-start value within 48h. Switch to the production profile once you have observed stable operation.

Feature profile (--feature-profile): which subsystems are active

A feature profile controls which IQ subsystems run. It is auto-derived from the --graph flag when not set explicitly:

Profile         Requires                What runs
klshield-light  klshield only, no kliq  XDP + static deny/allow. No learning.
dos-light       klshield + kliq         Source heuristic + autotune. No graph, no SQLite.
iq-learning     klshield + kliq         dos-light + per-source EWMA baseline.
graph-learning  klshield + kliq         iq-learning + flow telemetry + graph + SQLite.
graph-enforce   klshield + kliq         graph-learning + XDP tuple enforcement.

Check which subsystems are active at runtime:

kliq runtime status graph-learning
kliq runtime status klshield-light   # → explains that no kliq is needed

Legacy profile values

For reference: the older --profile flag seeds these initial values (all adapted by autotune):

SoftRate and HardRate are in packets per second. BlockAt=999 means blocking is effectively disabled.

Profile                    TrigPPS  TrigSyn  TrigScan  SoftAt  HardAt  BlockAt  SoftRate  HardRate
ziti-router                8000     200      30        2       5       12       3000      800
ziti-controller            80       20       5         1       3       9        20        5
ziti-router-bootstrap      25000    600      120       3       8       999      6000      1500
ziti-controller-bootstrap  400      120      30        2       6       999      60        20
public-web                 1200     250      20        2       5       12       500       120
public-api                 2500     500      30        2       4       10       1000      300
idp                        350      180      10        1       3       8        50        10
internal-app               800      150      8         3       6       999      200       50
ssh-bastion                60       25       5         1       2       6        5         1

Recommended rollout:

1. Copy the right PDPConfig for your node to /opt/kernloom/attested/etc/pdp/node.yaml
2. Start with --dry-run=true and --whitelist-learn=true
3. Let IQ observe and autotune on real traffic for 7–14 days
4. Review state transitions; add whitelist entries for known-good sources
5. Enable enforcement: --dry-run=false

Graph Learner

The Graph Learner is an optional module that builds a baseline of observed communication paths and can enforce Zero Trust once the baseline is frozen.

Enable with --graph. Control the mode with --graph-mode.

Modes

Mode            What happens
learn           Record source→destination flows as graph edges. No enforcement on unknown paths.
frozen-observe  Baseline is frozen. Unknown edges inject extra FSM strikes and emit signals, but enforcement is still gradual. Good for catching false positives before going strict.
frozen-enforce  Unknown edges force the source immediately to BLOCK, bypassing normal strike accumulation. Strict Zero Trust posture.
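The three modes differ only in what an unknown edge does to its source. A sketch of that dispatch with hypothetical names; the number of strikes injected in frozen-observe is not documented here, so it is passed in as a parameter.

```go
package main

import "fmt"

// GraphMode is a --graph-mode value.
type GraphMode string

const (
	ModeLearn         GraphMode = "learn"
	ModeFrozenObserve GraphMode = "frozen-observe"
	ModeFrozenEnforce GraphMode = "frozen-enforce"
)

// UnknownEdgeEffect is what one source→destination edge that is not in the
// baseline does to its source.
type UnknownEdgeEffect struct {
	RecordEdge   bool // learn: store the edge as a candidate
	ExtraStrikes int  // frozen-observe: strikes injected into the source's FSM
	ForceBlock   bool // frozen-enforce: skip strike accumulation, go to BLOCK
}

// OnUnknownEdge maps a mode to its effect. observeStrikes is hypothetical.
func OnUnknownEdge(mode GraphMode, observeStrikes int) UnknownEdgeEffect {
	switch mode {
	case ModeFrozenObserve:
		return UnknownEdgeEffect{ExtraStrikes: observeStrikes}
	case ModeFrozenEnforce:
		return UnknownEdgeEffect{ForceBlock: true}
	default: // learn
		return UnknownEdgeEffect{RecordEdge: true}
	}
}

func main() {
	for _, m := range []GraphMode{ModeLearn, ModeFrozenObserve, ModeFrozenEnforce} {
		fmt.Printf("%-15s %+v\n", m, OnUnknownEdge(m, 2))
	}
}
```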

Edge lifecycle

candidate  →  learned  →  frozen
                 │
              approved (manual)
              denied   (manual, never overwritten)

State      Meaning
candidate  Seen, evidence building (count, distinct time windows, age)
learned    Promoted: evidence criteria met
frozen     Locked into the baseline
approved   Manually confirmed; carries extra trust weight
denied     Explicitly blocked; never promoted, never overwritten

Suspicious sources (currently RATE_SOFT, RATE_HARD, or BLOCK) are automatically excluded from the baseline to prevent polluting it with attack traffic.

Workflow

Step 1. Learn your baseline:

sudo /opt/kernloom/attested/kliq \
  --pdp-config=/opt/kernloom/attested/etc/pdp/idp-bootstrap.yaml \
  --graph --graph-mode=learn \
  --dry-run=true --whitelist-learn=true

Run for several days to a week to capture representative traffic patterns.

Step 2. Review and clean up:

kliq graph export                       # export all edges
kliq graph export --sort=state          # grouped by state
kliq graph edges --sort=state           # overview with state counts
kliq graph baselines --sort=obs         # per-edge EWMA stats (PPS/BPS peaks)
kliq graph approve-ip <ip>              # mark an edge as explicitly approved
kliq graph deny-ip <ip>                 # mark an edge as denied

Step 3. Check readiness before freezing:

kliq graph freeze --dry-run

Reports how many edges would be frozen, how many candidates are still immature, and how many low-confidence edges exist. It does not write anything.

Step 4. Freeze the baseline:

kliq graph freeze

This moves all learned edges to frozen. From this point on, new traffic is compared against the baseline.

Step 5. Observe before enforcing:

sudo /opt/kernloom/attested/kliq --graph --graph-mode=frozen-observe

Run for a few days and watch for unexpected signals, typically legitimate sources that were missing from the baseline. Approve them with kliq graph approve-ip if needed.

Step 6. Full enforcement:

sudo /opt/kernloom/attested/kliq --graph --graph-mode=frozen-enforce

Any source taking a path not in the frozen baseline is immediately forced to BLOCK.

Independence from behaviour-based enforcement

The graph enforces path-based Zero Trust only. A source with a known, frozen edge is still fully subject to IQ’s severity scoring.

If a known node starts sending a SYN flood, scanning ports, or generating unusual packet volume, progressive enforcement applies normally; the graph has no say in this. IQ asks “is this behaviour acceptable?” independently of whether the graph knows the path.

This matters in practice: a compromised workload may deliberately stay on known communication paths to avoid triggering the graph, but its traffic patterns will still be anomalous to IQ. The two mechanisms catch different things and neither exempts a source from the other.

The only exception is the IQ whitelist (--whitelist): sources explicitly whitelisted in IQ are exempt from all enforcement. Graph approval is not the same as an IQ whitelist entry β€” they are separate and independent.
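The ordering can be summarised in a few lines of Go. FinalLevel and its parameters are hypothetical; the point is that a known edge never lowers the behavioural level, and only the IQ whitelist short-circuits both checks.

```go
package main

import "fmt"

// Level is an enforcement level.
type Level int

const (
	Observe Level = iota
	RateSoft
	RateHard
	Block
)

// FinalLevel combines the two independent checks. whitelisted refers to the
// IQ whitelist (--whitelist); graph approval only influences graphForcesBlock.
func FinalLevel(whitelisted, graphForcesBlock bool, behaviour Level) Level {
	if whitelisted {
		return Observe // the only full exemption
	}
	if graphForcesBlock {
		return Block // frozen-enforce: unknown edge
	}
	return behaviour // a known, frozen edge never lowers this
}

func main() {
	// A compromised workload on a known path that starts a SYN flood.
	fmt.Println(FinalLevel(false, false, RateHard) == RateHard)
	// A whitelisted source is exempt even from graph enforcement.
	fmt.Println(FinalLevel(true, true, Block) == Observe)
}
```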

Storage

The graph is stored in a unified SQLite database (kliq.db, default path: /var/lib/kernloom/iq/kliq.db). Since v0.2.0 this database also contains source baseline data and edge baseline EWMA stats in separate tables. It persists across restarts independently of state.json.


New in v0.2.0: learning improvements

Source baseline

Active when the feature profile is iq-learning or higher. IQ additionally tracks EWMA statistics for each source IP. A known high-traffic source gets an effective trigger of max(global_trigger, source_peak × 1.2), so a source that normally sends 250 PPS (effective trigger 300 PPS) does not trip a global trigger of 100 PPS. Unknown sources fall back to the global trigger.

Edge baseline improvements

  • Two-phase EWMA alpha: bootstrap alpha=0.10 while observations < 30, stable alpha=0.02 after. The baseline converges quickly but resists being moved by short spikes.
  • Decaying peak (peak_decay_half_life): a single historical spike no longer defines the ceiling forever. With peak_decay_half_life: "336h" (14 days), a peak recorded 14 days ago counts for 50% of its original value.

Anti-poisoning (three layers)

Three layers prevent attacks from corrupting baselines:

  1. TrigPPS cap: observations above the host-level trigger are never written
  2. SuspiciousRegistry: suspicious state is tracked separately per source and per edge, so a freeze violation on one edge no longer blocks learning for all other edges from that source
  3. 30s pending buffer: baseline updates are held for 30s and dropped if the source or edge is flagged within that window

kliq graph baselines

kliq graph baselines [--all] [--sort=obs|state|src|port|pps|bps]
kliq graph baselines reset

Shows per-edge EWMA stats including PPS_PEAK and BPS_PEAK columns.


Autotune

IQ can learn your trigger thresholds (trig-pps, trig-syn, trig-scan) from observed traffic using Median + MAD statistics.

Learning only happens on clean ticks β€” ticks where:

  • the fraction of dirty sources (severity above --learn-sev-gt) is below --learn-frac-gt
  • no source is in BLOCK (if --learn-skip-if-blocks=true)
  • global drop ratio is below --learn-max-drop-ratio

This prevents attack traffic from poisoning the baseline.
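The three conditions translate directly into a predicate. A sketch with hypothetical names, using the documented defaults in main. Treating a tick with no evaluated sources as clean is an assumption, and the per-source --learn-max-sev filter is a separate step that is not shown.

```go
package main

import "fmt"

// TickStats summarises one tick across all evaluated sources.
type TickStats struct {
	Severities []float64 // severity of each evaluated source
	AnyBlocked bool      // any source currently in BLOCK
	DropRatio  float64   // global dropped / total packets
}

// GateParams mirrors --learn-sev-gt, --learn-frac-gt, --learn-max-drop-ratio
// and --learn-skip-if-blocks.
type GateParams struct {
	SevGT, FracGT, MaxDropRatio float64
	SkipIfBlocks                bool
}

// CleanTick reports whether this tick may feed autotune.
func CleanTick(t TickStats, p GateParams) bool {
	if p.SkipIfBlocks && t.AnyBlocked {
		return false
	}
	if t.DropRatio > p.MaxDropRatio {
		return false
	}
	if len(t.Severities) == 0 {
		return true // assumption: an idle tick counts as clean
	}
	dirty := 0
	for _, s := range t.Severities {
		if s > p.SevGT {
			dirty++
		}
	}
	return float64(dirty)/float64(len(t.Severities)) < p.FracGT
}

func main() {
	p := GateParams{SevGT: 1.0, FracGT: 0.005, MaxDropRatio: 0.02, SkipIfBlocks: true}
	sev := make([]float64, 1000)
	for i := 0; i < 4; i++ {
		sev[i] = 2.0
	}
	fmt.Println(CleanTick(TickStats{Severities: sev}, p)) // 4 of 1000 dirty
	sev[4] = 2.0
	fmt.Println(CleanTick(TickStats{Severities: sev}, p)) // 5 of 1000 dirty
}
```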

Bootstrap schedule

The bootstrap schedule runs autotune more aggressively for the first ~14 days, then slows down to a steady-state interval. It has three phases with decreasing update rates and increasing conservatism.

Phase         Duration    Autotune interval
Phase 1       0 → 48h     1h
Phase 2       48h → 5d    6h
Phase 3       5d → 14d    24h
Steady-state  after 14d   84h

State is saved to /var/lib/kernloom/iq/state.json and reloaded on restart, so the schedule survives process restarts.

Bug fix (v0.2.0): before v0.2.0, autotune could stall on quiet nodes and never apply an update, because its timer was reset on every skipped cycle. This is fixed in v0.2.0. In addition, bootstrap phase 1 previously used max_down: 0.02 (2%/hour), so triggers could take 70+ hours to converge from the cold-start value. The new PDPConfig profiles use max_down: 0.10 (10%/hour), so convergence happens within 48h.

Troubleshooting: If autotune triggers are not moving after 2–3 revisions, delete /var/lib/kernloom/iq/state.json and restart IQ.


Exemptions

Whitelist (permanent)

Whitelisted sources are never scored and never enforced. Add IPv4 addresses, IPv6 addresses, or CIDRs, one per line:

# /opt/kernloom/attested/etc/whitelist.txt
203.0.113.7
203.0.113.0/24
2001:db8::1

Reloaded automatically every --whitelist-reload (default 10s).

Feedback (temporary)

Use feedback for time-bound exemptions without permanently whitelisting:

[
  {"target":"203.0.113.7","action":"forgive","ttl":"24h","notes":"partner NAT"},
  {"target":"198.51.100.0/24","action":"whitelist","until":"2026-06-01T00:00:00Z"}
]

Reloaded every --feedback-reload (default 10s). Prefer until over ttl for stable expiry across restarts.


CLI flag reference

Core runtime

Flag        Type      Default  Notes
--interval  duration  1s       Poll and decision tick
--top       int       200      Evaluate top-N sources per tick
--min-pps   float     10       Skip sources below this PPS
--min-sev   float     0        Include candidates with severity ≥ this
--dry-run   bool      true     Never write enforcement maps

Profile and persistence

Flag               Type      Default                          Notes
--pdp-config       string    (empty)                          Path to PDPConfig YAML (recommended over --profile)
--feature-profile  string    (empty)                          Override active subsystems: dos-light, iq-learning, graph-learning, graph-enforce
--profile          string    controller                       Legacy seed profile. Aliases: router→ziti-router, controller→ziti-controller, internal→internal-app
--state-file       string    /var/lib/kernloom/iq/state.json  Persist tuned thresholds. Empty disables.
--max-state-age    duration  336h                             Ignore persisted state older than this
--state-history    int       30                               Keep the last N history entries

Graph Learner

Flag          Type    Default                       Notes
--graph       bool    false                         Enable the graph learner
--graph-mode  string  learn                         One of: learn, frozen-observe, frozen-enforce
--graph-db    string  /var/lib/kernloom/iq/kliq.db  Unified SQLite database path (graph edges + baselines)

Whitelist

Flag                Type      Default                                   Notes
--whitelist         string    /opt/kernloom/attested/etc/whitelist.txt  IPv4/IPv6/CIDR, one per line
--whitelist-reload  duration  10s                                       Auto-reload interval (0 disables)
--whitelist-learn   bool      false                                     Allow whitelisted sources to contribute to learning

Feedback

Flag                       Type      Default                             Notes
--feedback-file            string    /var/lib/kernloom/iq/feedback.json  JSON array of temporary exemptions
--feedback-reload          duration  10s                                 Auto-reload interval
--feedback-learn           bool      false                               Allow feedback-exempt sources in learning
--feedback-deenforce-cidr  bool      true                                Actively scan and remove RL/deny entries for CIDR feedback
--feedback-cidr-every      duration  30s                                 CIDR de-enforcement scan interval
--feedback-cidr-max        int       5000                                Max map deletions per scan

Bootstrap

Flag                     Type      Default     Notes
--bootstrap              bool      true        Enable the bootstrap autotune schedule
--bootstrap-window       duration  336h        Total bootstrap duration
--bootstrap-phase1-end   duration  48h         End of phase 1
--bootstrap-phase2-end   duration  120h        End of phase 2
--bootstrap-every1/2/3   duration  1h/6h/24h   Autotune interval per phase
--steady-every           duration  84h         Post-bootstrap autotune interval
--bootstrap-k-start      float     4.0         k at start (higher = fewer false positives)
--bootstrap-k-final      float     3.5         k at bootstrap end
--bootstrap-allow-block  bool      false       Allow BLOCK during bootstrap. Default: cap at RATE_HARD to protect against bad learning
--bootstrap-min-windows  int       0           Minimum completed autotune cycles before allowing downscale (0 = disabled)

Autotune

Flag                    Type   Default  Notes
--autotune              bool   true     Enable threshold learning
--autotune-k            float  3.5      k for median + k×MAD
--autotune-min-samples  int    5000     Minimum clean samples before applying
--autotune-max-change   float  0.05     Max relative change per update (±5%)
--autotune-alpha        float  0.2      Smoothing factor (0 disables)
--autotune-floor-pps    float  100      Minimum trig-pps
--autotune-floor-syn    float  50       Minimum trig-syn
--autotune-floor-scan   float  20       Minimum trig-scan

Clean tick gates (anti-poison)

Flag                    Type   Default  Notes
--learn-sev-gt          float  1.0      Severity threshold for a “dirty” source
--learn-frac-gt         float  0.005    Max fraction of dirty sources for a clean tick
--learn-max-sev         float  0.8      Only learn from sources with sev ≤ this
--learn-skip-if-blocks  bool   true     Skip learning if any IP is in BLOCK
--learn-max-drop-ratio  float  0.02     Skip if the global drop ratio exceeds this

Severity model

Flag         Type   Default  Notes
--trig-pps   float  0        PPS trigger (0 → profile / state)
--trig-syn   float  0        SYN/s trigger
--trig-scan  float  0        scan/s trigger
--w-pps      float  0        PPS weight (0 → profile)
--w-syn      float  0        SYN weight
--w-scan     float  0        scan weight
--sev-cap    float  0        Normalisation cap (0 → profile)

Strike mapping

Flag               Type   Default      Notes
--sev-step1/2/3    float  1.0/2.0/3.0  Severity thresholds for delta1/2/3
--sev-delta1/2/3   int    1/2/3        Strikes added at each step
--sev-decay-below  float  0.25         Allow strike decay below this severity

Level thresholds

Flag        Type  Default  Notes
--soft-at   int   0        Strikes ≥ this → RATE_SOFT (0 → profile)
--hard-at   int   0        Strikes ≥ this → RATE_HARD
--block-at  int   0        Strikes ≥ this → BLOCK

Enforcement actions

Flag          Type      Default  Notes
--soft-rate   uint64    0        SOFT rate limit in PPS (0 → profile)
--soft-burst  uint64    0        SOFT burst tokens
--soft-ttl    duration  0        SOFT level TTL
--hard-rate   uint64    0        HARD rate limit in PPS
--hard-burst  uint64    0        HARD burst tokens
--hard-ttl    duration  0        HARD level TTL
--block-ttl   duration  0        BLOCK TTL (0 → profile)
--cooldown    duration  0        Minimum time between level changes

Block gate

Flag             Type      Default  Notes
--block-min-sev  float     NaN      Minimum severity to allow BLOCK (NaN → profile; 0 disables)
--block-min-dur  duration  -1       Severity must be sustained for this long (-1 → profile; 0 disables)

Hysteresis

Flag             Type      Default  Notes
--up-need        int       0        Consecutive high ticks before escalating (0 → profile)
--down-need      int       0        Consecutive low ticks before stepping down
--min-hold-soft  duration  0        Minimum time in SOFT before stepping down
--min-hold-hard  duration  0        Minimum time in HARD before stepping down

Non-compliance escalation

Flag                   Type   Default  Notes
--noncomp-at           int    0        Non-compliance ticks before accelerating to BLOCK (0 → profile)
--noncomp-drop         float  0        DropRL/s threshold for a non-compliance tick
--noncomp-sev          float  0        Severity threshold for a non-compliance tick
--noncomp-reset-below  float  0        Reset the counter when sev < this and DropRL/s = 0

Housekeeping

Flag         Type      Default  Notes
--prev-ttl   duration  10m      Forget the delta snapshot if the source is not seen
--state-ttl  duration  60m      Forget OBSERVE-only state if not seen

See also

Getting started: Install, bootstrap, and go from dry-run to enforcement
Shield reference: XDP commands, tuple enforcement, pinned maps
Architecture: How IQ and Shield fit into the PDP/PEP model
Integration Patterns: Real-world PDPConfig choices per node type
Operations: systemd units, log interpretation, troubleshooting