ws-sglang

Efficient Practices for LLM Inference with SGLang on Ascend

September 14

16:15 - 16:50

Location: Venue 3 - 268

The Huawei Ascend architecture differs significantly from the NVIDIA GPU architecture in chip architecture, interconnect, software stack, programming model, and operator libraries, which poses many challenges for adapting and supporting SGLang on Ascend. This presentation introduces the Ascend system architecture and shares the journey of adapting SGLang to Ascend, along with the efficient practices developed along the way. We explore the technical hurdles overcome, the performance optimizations achieved, and the lessons learned in porting SGLang to the Ascend ecosystem.

Speakers