
When Your LLM Trips the MMU: Page Faults, TLB Shootdowns, and the Hidden Virtual-Memory Tax of AI Inference
A distinguished-architect deep dive into GPU virtual memory internals, MMU fault pipelines, TLB shootdown mechanics, page-table walks, Unified Memory/HMM coherence, ATS, and why page migration turns your p99 into a hardware problem nobody on the team budgeted for.

