According to Abacus.AI CEO Bindu Reddy, Google plans to unveil Gemini 3.2 Flash at its I/O conference on May 20. The model reportedly reaches 92% of GPT-5.5's performance on coding and reasoning tasks while cutting inference costs to between one-fifteenth and one-twentieth of GPT-5.5's, with latency below 200 milliseconds for most queries. Reddy attributed the breakthrough to Google's distillation and sparsity techniques, which compress a frontier model into the Flash tier without the performance cliff typically seen in model optimization.