Ollama - User-friendly wrapper around llama.cpp with built-in model management
vLLM - High-throughput serving for production deployments
ExLlamaV2 - Fastest single-user inference on NVIDIA GPUs; uses the EXL2 quantization format
MLX - ...