MiniMax Scans 200K Tokens, Discovers 4.9% Degradation in M2 Series Models

According to MiniMax’s technical blog, the company discovered significant token degradation in its M2 series models through a full vocabulary scan. Approximately 4.9% of the 200,000 tokens showed notable performance decline, with Japanese tokens hardest hit (29.7% degraded), compared with Korean (3.3%), Russian (3.7%), Chinese (3.9%), and English (3.5%). The degradation arises during post-training: high-frequency tokens such as tool_call markers continuously update the surrounding parameters, pushing low-frequency tokens into incorrect directions in the embedding space.
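The blog does not publish MiniMax's scanning code, but a full-vocabulary scan of this kind can be sketched as a cosine-similarity comparison between each token's embedding before and after post-training. Everything below (the function names, the 0.97 threshold, and the toy embeddings) is an illustrative assumption, not MiniMax's implementation:

```python
import numpy as np

def cosine_similarity(a, b):
    # Directional agreement between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def scan_vocabulary(emb_before, emb_after, threshold=0.97):
    """Flag tokens whose embedding direction drifted during post-training.

    emb_before / emb_after: (vocab_size, dim) embedding matrices from the
    base and post-trained checkpoints. Threshold is a hypothetical cutoff.
    """
    degraded = []
    for tok_id in range(emb_before.shape[0]):
        sim = cosine_similarity(emb_before[tok_id], emb_after[tok_id])
        if sim < threshold:
            degraded.append((tok_id, sim))
    return degraded

# Toy demo: 1,000 tokens, 64-dim embeddings; heavily perturb the first 50
# to simulate low-frequency tokens dragged off-direction by post-training.
rng = np.random.default_rng(0)
before = rng.normal(size=(1000, 64))
after = before.copy()
after[:50] += rng.normal(scale=5.0, size=(50, 64))

flagged = scan_vocabulary(before, after)
print(f"{len(flagged) / 1000:.1%} of tokens flagged")  # prints "5.0% of tokens flagged"
```

In this toy setup only the perturbed tokens fall below the threshold; on a real model the two matrices would come from the base and post-trained checkpoints of the same tokenizer.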

MiniMax implemented a synthetic-data fix, using simple token repetition tasks to stabilize the entire vocabulary. Results were immediate: the rate of Russian characters mixed into Japanese responses dropped from 47% to 1%, and vector stability (cosine similarity) improved from a low of 0.329 to above 0.97 across all tokens.
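MiniMax does not detail how the repetition tasks were constructed, but the idea is that echoing every token in training data gives even rare tokens direct gradient signal. A minimal sketch, with a hypothetical toy vocabulary and sample format of my own choosing:

```python
import random

# Hypothetical toy vocabulary; in practice the generator would sweep the
# full 200K-token vocabulary so every entry appears in training data.
VOCAB = [f"tok_{i}" for i in range(100)]

def make_repetition_sample(rng, seq_len=8):
    """One synthetic sample: the prompt shows a token sequence and the
    target simply echoes it, anchoring each token's embedding in place."""
    seq = rng.sample(VOCAB, k=seq_len)
    prompt = "Repeat the following tokens: " + " ".join(seq)
    target = " ".join(seq)
    return {"prompt": prompt, "target": target}

rng = random.Random(42)
dataset = [make_repetition_sample(rng) for _ in range(5)]
print(dataset[0]["prompt"])
```

Because the target is copied verbatim from the prompt, the task is trivially verifiable, which is what makes such synthetic data cheap to generate at vocabulary scale.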
