Efficient Practices for LLM Inference with SGLang on Ascend
September 14 • 16:15 - 16:50
Location: Venue 3 - 268
The Huawei Ascend architecture differs significantly from NVIDIA's GPU architecture, with differences spanning chip design, interconnect, software stack, programming model, and operator libraries. These differences pose many challenges for adapting and supporting SGLang on Ascend. This presentation introduces the Ascend system architecture and shares the journey and efficient practices of adapting SGLang to Ascend. We explore the technical hurdles overcome, the performance optimizations achieved, and the lessons learned in porting SGLang to the Ascend ecosystem.