vLLM v0.23.0: Model Runner V2, Multi-Tier KV Offloading, and the Growing Rust Frontend
The vLLM v0.23.0 release landed last week with 408 commits from 200 contributors, and it packs several changes that directly…
LoopCoder-v2: Why Two Loops Beat Four in Test-Time Compute Scaling
The dominant scaling narrative in large language models has been straightforward: more parameters, more data, more compute. But there’s a…
GLM-5.2: The New #1 Open-Weight LLM and Why IndexShare Matters
The open-source LLM landscape just got a new heavyweight contender. Z.ai (Zhipu AI) released GLM-5.2, a 753B-parameter mixture-of-experts model that…
How MiniMax Sparse Attention Achieves 28x Compute Reduction at 1M Context Length
The attention mechanism is the backbone of every transformer model, but it carries a brutal cost: quadratic complexity with respect…
The GitHub trending page this week is dominated by AI agent tooling, but tucked between the skills and plugins are…
Microsoft’s Build 2026 conference delivered a move that had been anticipated for months but still landed with weight: the company…
SkillOpt: Training AI Agent Skills Like Neural Networks
AI agents have a skill problem. You give a language model a system prompt (or “skill”) and it…
Constraint Decay: Why Your AI Agent Forgets the Rules (and What to Do About It)
AI coding agents are getting scary good at writing functional code. Give them a loose description and they’ll spin up…
Every week, GitHub’s trending chart reveals where developer energy is heading. This week, the signal is unmistakable: the ecosystem is…
Qwen3.7-Max: Built for the Agent Era, Not the Chat Era
Qwen just dropped Qwen3.7-Max, and it’s not another incremental chatbot upgrade. This model is purpose-built for something different: being an…