SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving
September 14 • 10:15 - 10:50
Location: Venue 3 - 268
SGLang is an efficient open-source framework for large-scale LLM serving. Over the past year, SGLang has iterated rapidly and advanced significantly. This presentation gives an overview of SGLang's leading features, including KV Cache Reuse, Zero-overhead Batch Scheduling, Speculative Decoding, Prefill/Decode Disaggregation, and Large-scale Expert Parallelism.