Baseten Unveils Still, a KV Cache Compression Method Achieving Up to 200x Compression

According to Beating, Baseten's research team has unveiled Still, a KV cache compression method that achieves up to 200x compression in a single forward pass, with no online optimization or gradient updates. Still integrates lightweight Perceiver compressors (about 1% of the base model's parameter count) into each Transformer layer. Each compressor applies cross-attention over the full KV cache and emits the compressed cache directly. In tests on Qwen and Gemma models with context windows from 8k to 64k tokens and compression rates from 8x to 200x, Still maintained high accuracy and outperformed comparable methods such as SnapKV, H2O, and KV-Distill on the RULER benchmark.
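To make the compressor idea concrete, here is a minimal pure-Python sketch of Perceiver-style KV compression for a single layer and head. It is not Baseten's implementation. The function names (`compress_kv`, `num_slots`, `decode_step`), the choice of one latent query per compressed slot with m = ceil(n / ratio), reusing one set of attention weights to pool both keys and values, and the omission of learned output projections are all assumptions made for illustration. Random vectors stand in for trained latent queries.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over every key."""
    scale = 1.0 / math.sqrt(len(query))
    return softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])


def pool(weights, rows):
    """Weighted sum of row vectors."""
    dim = len(rows[0])
    return [sum(w * row[i] for w, row in zip(weights, rows)) for i in range(dim)]


def num_slots(seq_len, ratio):
    """Compressed cache length for a given compression ratio (assumed: ceiling)."""
    return max(1, math.ceil(seq_len / ratio))


def compress_kv(keys, values, latents):
    """Hypothetical Perceiver-style compressor for one layer and one head.

    Each latent query cross-attends over the full cache and pools it into one
    compressed key and one compressed value in a single pass (no optimization
    loop). Learned output projections are omitted for brevity.
    """
    compressed_keys, compressed_values = [], []
    for latent in latents:
        weights = attention_weights(latent, keys)
        compressed_keys.append(pool(weights, keys))
        compressed_values.append(pool(weights, values))
    return compressed_keys, compressed_values


def decode_step(query, keys, values):
    """Standard attention read for one decoding query."""
    return pool(attention_weights(query, keys), values)


def random_vector(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    random.seed(0)
    dim, seq_len, ratio = 8, 8192, 200
    m = num_slots(seq_len, ratio)  # ceil(8192 / 200) = 41
    keys = [random_vector(dim) for _ in range(seq_len)]
    values = [random_vector(dim) for _ in range(seq_len)]
    latents = [random_vector(dim) for _ in range(m)]  # stand-ins for trained queries
    ck, cv = compress_kv(keys, values, latents)
    out = decode_step(random_vector(dim), ck, cv)  # attends over 41 entries, not 8192
    print(m, len(ck), len(cv), len(out))  # prints: 41 41 41 8
```

Under these assumptions, an 8,192-token cache compressed at 200x shrinks to 41 slots, so each decoding step attends over 41 entries instead of 8,192. The compressor touches the full cache only once, which mirrors the single-forward-pass property described above.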