vLLM Semantic Router Blog

vLLM Semantic Router + Milvus: How Semantic Routing and Caching Build Scalable AI Systems the Smart Way

October 30, 2025 · 9 min read

Milvus Ambassador

Most AI apps rely on a single model for every request. But that approach quickly runs into limits. Large models are powerful yet expensive, even when they're used for simple queries. Smaller models are cheaper and faster but can't handle complex reasoning. When traffic surges—say your AI app suddenly goes viral with ten million users overnight—the inefficiency of this one-model-for-all setup becomes painfully apparent. Latency spikes, GPU bills explode, and the model that ran fine yesterday starts gasping for air.

Semantic Router Q4 2025 Roadmap: Journey to Iris

October 20, 2025 · 15 min read

Xunzhuo Liu

Software Engineer @ Tencent

Huamin Chen

Distinguished Engineer @ Red Hat

Chen Wang

Senior Staff Research Scientist @ IBM

Yue Zhu

Staff Research Scientist @ IBM

As we approach the end of 2025, we're excited to share our Q4 2025 roadmap for vLLM Semantic Router. This quarter marks a significant milestone in our project's evolution as we prepare for our first major release: v0.1, codename "Iris", expected in late 2025 to early 2026.

vLLM Semantic Router: Next Phase in LLM inference

September 6, 2025 · 5 min read