Kernel Tuning
PREEMPT_RT kernel + CPU isolation for inference threads, api-server and cluster-agent pinned to dedicated cores, fallible I/O paths moved to io_uring. Measurable wins on tail latency.
- PREEMPT_RT — bounded scheduling latency under load.
- isolcpus — cores 4–7 reserved for inference; kernel housekeeping cannot steal them.
- io_uring — batched submission queues for KV cache spill + log flush.
- Target: cut p99 enqueue jitter from ~8ms to <1ms on saturated nodes.