Implement Flash Attention Back End in SGLang – Basics and KV Cache

(hebiao064.github.io)

26 points | by latchkey 11 hours ago

2 comments

  • behnamoh 3 hours ago

    Is SGLang an LLM engine in its own right, or does it use vLLM/llama.cpp under the hood? And while we're at it, has anyone done a comparison of LLM engines? I've also heard of mistral.rs, MLC LLM, and obviously the HF transformers library and its alternative, ktransformers.

    • imtringued 28 minutes ago

      SGLang is a competitor to vLLM.