KTransformers: The Ultimate Inference on a Single Card
September 13 • 15:40 - 16:05
Location: Venue 6 - B01
KTransformers is a CPU-GPU heterogeneous inference framework that can run mainstream large models such as DeepSeek-R1 and Kimi K2 on a single card. It splits the computation by hardware: the MoE layers run on the CPU and MLA (Multi-head Latent Attention) runs on the GPU, so that each device's resources are fully used. KTransformers also adopts the newly developed Expert Defer technique, which takes further advantage of the CPU-GPU heterogeneous architecture and significantly improves performance. KTransformers has been tried on a range of hardware platforms and has achieved good results.
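The CPU/GPU split described above can be illustrated with a minimal Python sketch. This is not the KTransformers API: the `Module` class, the `place` and `memory_by_device` helpers, the module names, and the weight sizes are all hypothetical and chosen only to show why moving the MoE expert weights to the CPU leaves a small footprint on the card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str       # hypothetical module name, for illustration only
    kind: str       # "attention" (e.g. MLA) or "moe_experts"
    size_gb: float  # illustrative weight size, not a measured value


def place(modules):
    """Assign MoE expert weights to the CPU and everything else to the GPU."""
    return {m.name: ("cpu" if m.kind == "moe_experts" else "gpu") for m in modules}


def memory_by_device(modules, placement):
    """Sum the weight sizes that end up on each device."""
    totals = {"cpu": 0.0, "gpu": 0.0}
    for m in modules:
        totals[placement[m.name]] += m.size_gb
    return totals


# A toy two-layer model: small attention blocks, large expert blocks.
layers = [
    Module("layer0.attn", "attention", 0.5),
    Module("layer0.experts", "moe_experts", 10.0),
    Module("layer1.attn", "attention", 0.5),
    Module("layer1.experts", "moe_experts", 10.0),
]

placement = place(layers)
totals = memory_by_device(layers, placement)
print(placement)
print(totals)
```

In this toy configuration the expert weights dominate the model size, so placing them in host memory is what lets the remaining attention weights fit on one card.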