Alpha-MoE: A Megakernel for Faster Tensor Parallel Inference

Mixture of Experts (MoE) architectures are reshaping the landscape of large language models, offering efficiency gains that dense models can’t match. But these benefits come at a cost: complex communication patterns that make performance optimization a real challenge.

That’s why we built Alpha-MoE, a fused megakernel library designed for FP8 W8A8 precision (8-bit weights, 8-bit activations). By fusing multiple operations into a single persistent kernel, Alpha-MoE delivers up to 200% speedups over the Triton kernels currently used in open-source LLM serving frameworks such as vLLM and SGLang.
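
The full report covers the details, but the core idea behind a "megakernel" can be sketched in a few lines. The CUDA snippet below is a minimal, hypothetical illustration of the persistent-kernel pattern, not Alpha-MoE's actual implementation: a single kernel launch keeps thread blocks resident on the GPU, and each block pulls tiles of work (routing, FP8 expert GEMMs, combine) from a shared queue instead of paying per-operation launch overhead. All names here (WorkItem, Stage, run_*_tile) are placeholders we made up for illustration.

```cuda
// Illustrative sketch only -- NOT Alpha-MoE source code. It shows the general
// "persistent megakernel" pattern: one kernel launch whose thread blocks loop
// over a shared work queue, so routing, expert GEMM tiles, and the combine
// step run back-to-back without separate kernel launches.
#include <cuda_runtime.h>

enum Stage { ROUTE = 0, EXPERT_GEMM = 1, COMBINE = 2 };

struct WorkItem {
    int stage;      // which fused operation this tile belongs to
    int expert_id;  // expert index for EXPERT_GEMM tiles
    int tile_m;     // output tile coordinates
    int tile_n;
};

// Placeholder tile routines; real FP8 W8A8 math and shared-memory staging elided.
__device__ void run_route_tile(const WorkItem& w)       { (void)w; }
__device__ void run_expert_gemm_tile(const WorkItem& w) { (void)w; }
__device__ void run_combine_tile(const WorkItem& w)     { (void)w; }

// Persistent kernel: launched once, with roughly one block per SM. Blocks grab
// work items from a global counter until the queue is drained, so the whole
// MoE layer executes inside a single kernel launch. Cross-stage dependencies
// would need grid-level synchronization in practice; that is elided here.
__global__ void moe_megakernel(const WorkItem* queue, int num_items, int* next_item) {
    __shared__ int item_idx;
    while (true) {
        if (threadIdx.x == 0) {
            item_idx = atomicAdd(next_item, 1);   // block-level work stealing
        }
        __syncthreads();
        int idx = item_idx;
        if (idx >= num_items) return;             // queue drained: block retires

        const WorkItem w = queue[idx];
        switch (w.stage) {                        // dispatch the fused operation
            case ROUTE:       run_route_tile(w);       break;
            case EXPERT_GEMM: run_expert_gemm_tile(w); break;
            case COMBINE:     run_combine_tile(w);     break;
        }
        __syncthreads();                          // safe to reuse item_idx next iteration
    }
}
```

In a real deployment the kernel would be launched once with about one block per SM (queried via cudaDevAttrMultiProcessorCount), and the hard parts are exactly what this sketch leaves out: FP8 tile math, tensor-parallel communication, and correct ordering between stages. Those are the pieces a production megakernel has to get right.
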
Want to understand how this works and what it means for real-world inference performance? Download the full report here to explore the architecture, benchmarks and practical insights behind Alpha-MoE.