Google Deploys Multi-Token Prediction on Pixel 9 and 10, Boosting Gemini Nano Inference Speed by Over 50%

According to Beating, Google has deployed a Multi-Token Prediction (MTP) architecture on Pixel 9 and Pixel 10 devices, significantly accelerating the on-device Gemini Nano v3 model. The new architecture increases inference speed by over 50% while preserving the model's safety alignment and output quality.

A zero-copy mechanism lets the prediction head reuse the main model's cached features directly through cross-attention, eliminating the separate key-value cache that a traditional draft model would need. This design saves approximately 130 MB of memory and reduces startup latency. In real-world applications such as notification summarization and smart replies, MTP achieved a 55% increase in token acceptance rate, which reduces how often the processor has to wake up and lowers system power consumption.
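
To make "token acceptance rate" concrete, here is a minimal Python sketch of the generic greedy draft-and-verify step used in speculative decoding. It is an illustration under stated assumptions, not Google's implementation: the report does not describe the actual verification rule, and the function names `accept_tokens` and `acceptance_rate` are hypothetical.

```python
from typing import List, Sequence


def accept_tokens(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Greedy verification of a drafted token run (hypothetical helper).

    draft:  k tokens proposed by the lightweight prediction head.
    target: k + 1 tokens the main model would pick at the same positions,
            obtained in a single verification pass.
    Returns the tokens actually emitted for this step.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must hold one more token than draft")
    emitted: List[int] = []
    for i, token in enumerate(draft):
        if token == target[i]:
            emitted.append(token)      # draft agrees with the main model
        else:
            emitted.append(target[i])  # first mismatch: take the main model's token
            return emitted             # later draft tokens are discarded
    emitted.append(target[len(draft)])  # all accepted: one extra token for free
    return emitted


def acceptance_rate(draft: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of drafted tokens accepted before the first mismatch."""
    if not draft:
        return 0.0
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted += 1
    return accepted / len(draft)


if __name__ == "__main__":
    # Toy token IDs, assumed purely for illustration.
    print(accept_tokens([5, 7, 9], [5, 7, 2, 4]))
    print(acceptance_rate([5, 7, 9], [5, 7, 2, 4]))
```

In this scheme, the more drafted tokens the main model accepts, the more tokens each verification pass emits, which is one plausible reading of why a higher acceptance rate translates into fewer processor wake-ups.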
