SGLang Prefill/Decode Disaggregation with Mooncake
September 14 • 10:50 - 11:25
Location: Venue 3 - 268
Large Language Model (LLM) inference comprises two distinct phases: Prefill and Decode. The Prefill phase is compute-intensive, processing the entire input sequence at once, while the Decode phase is memory-intensive, repeatedly reading the Key-Value (KV) cache to generate one token at a time. Traditionally, both phases run within a unified engine, where jointly scheduling prefill and decode batches introduces interference and inefficiency. To address these challenges, we introduce Prefill and Decode (PD) Disaggregation in SGLang, which separates the two phases onto dedicated engines and enables tailored optimizations for each.
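To make the split concrete, here is a minimal, purely illustrative sketch of the disaggregated flow: one worker runs the compute-bound prefill and produces a KV cache, which is then handed to a separate worker that runs the memory-bound decode loop. All names (`ToyPrefillWorker`, `ToyDecodeWorker`, `KVCache`) are hypothetical stand-ins, not the SGLang or Mooncake APIs; in the real system the cache holds per-layer attention tensors and is transferred between GPUs or nodes rather than passed in-process.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    # One entry per processed token; a real cache stores per-layer K/V tensors.
    entries: list = field(default_factory=list)

class ToyPrefillWorker:
    """Compute-bound phase: process the whole prompt, build the KV cache."""
    def prefill(self, prompt_tokens):
        cache = KVCache()
        for tok in prompt_tokens:
            cache.entries.append(("kv", tok))  # stand-in for attention K/V
        return cache

class ToyDecodeWorker:
    """Memory-bound phase: extend the received KV cache one token at a time."""
    def decode(self, cache, max_new_tokens):
        generated = []
        for _ in range(max_new_tokens):
            # A real decoder runs attention over the cache; here we just
            # derive a dummy next token from the current cache length.
            next_tok = len(cache.entries)
            generated.append(next_tok)
            cache.entries.append(("kv", next_tok))
        return generated

def run_disaggregated(prompt_tokens, max_new_tokens):
    prefill = ToyPrefillWorker()
    decode = ToyDecodeWorker()
    cache = prefill.prefill(prompt_tokens)       # phase 1, prefill engine
    # In the disaggregated setup the cache would now be shipped to the
    # decode engine (e.g. over RDMA); here it is simply passed along.
    return decode.decode(cache, max_new_tokens)  # phase 2, decode engine
```

Because the two workers are independent objects, each could be scaled, scheduled, and optimized separately, which is the core idea behind PD disaggregation.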