DSpark can make decoding faster, but acceptance quality still determines how much speed the system actually realizes.
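The link between acceptance quality and realized speed can be made concrete with a rough sketch. The code below is not DSpark's method; it uses the textbook idealization of speculative decoding in which each drafted token is accepted independently with probability `alpha`. The names `alpha`, `gamma` (tokens drafted per step), and `draft_cost` (cost of one draft step relative to one target-model step) are assumptions introduced purely for illustration.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes (hypothetically) that each of the `gamma` drafted tokens is
    accepted independently with probability `alpha`, and that the target
    model always contributes one token of its own per pass.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus the target model's token.
        return float(gamma + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Idealized wall-clock speedup over plain autoregressive decoding.

    One speculative step costs `gamma` draft steps plus one target step,
    measured in units of a single target-model step.
    """
    step_cost = gamma * draft_cost + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


if __name__ == "__main__":
    # Sweep acceptance rates for a fixed, made-up draft length and cost.
    for alpha in (0.3, 0.6, 0.8, 0.9):
        print(alpha, round(estimated_speedup(alpha, gamma=4, draft_cost=0.05), 2))
```

Under these assumptions, a low acceptance rate can push the estimate below 1.0, meaning the drafting overhead outweighs the gain. Real systems depart from the independence assumption, so this is a back-of-the-envelope model rather than a prediction.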
DeepSeek's speculative decoding framework, DSpark, went live June 27 on V4-Flash and V4-Pro, reporting up to 85 percent faster ...
A mobile phone's screen showing the logo of Chinese AI company Zhipu in Beijing on January 21, 2026. Investor confidence in Chinese AI startups is riding high, but obstacles to their long-term success range ...
It allows engineering teams to host frontier-level AI on their own sovereign infrastructure, entirely eliminating vendor lock ...
Chinese AI lab Zhipu AI releases GLM-5.2 with a stable 1-million-token context under the MIT license. On hours-long coding tasks, the open-source model trails Anthropic's Opus models by just a few ...
Most people know Xiaomi for phones and scooters. Not for breaking AI inference records. That changes today. Working with inference partner TileRT, Xiaomi has hit over 1,000 tokens per second on a ...
🎉 2026-02-14 · v0.1.3 Released. The v0.1.3 release introduces full support for the latest GLM-5 model, achieving up to 500 tokens/s on GLM-5-FP8 and up to 600 tokens/s on DeepSeek-V3.2. TileRT is a ...