Google DeepMind Releases Gemma 4 12B, Open-Source Multimodal Model Running on 16GB GPU Memory

Google DeepMind today released Gemma 4 12B, an open-source multimodal AI model. The 12-billion-parameter model delivers performance comparable to the larger 26B Mixture of Experts model in the same series while requiring less than half the memory, and it can run on consumer laptops with just 16GB of VRAM, including entry-level MacBook Air M5 devices.
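
The announcement does not say which numeric precision the 16GB figure assumes. As a rough back-of-the-envelope sketch, the Python snippet below estimates the memory needed for the weights alone of a 12-billion-parameter model at a few common precisions. The helper name `weight_memory_gb` and the chosen bit widths are illustrative assumptions, not details from Google DeepMind, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Hypothetical helper for illustration: parameters * bits / 8 bytes,
    with no allowance for activations, KV cache, or framework overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 12e9        # 12 billion parameters, as stated in the article
BUDGET_GB = 16       # memory budget quoted in the article

# Assumed precisions; the article does not say which one applies.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if size < BUDGET_GB else "does not fit"
    print(f"{label}: ~{size:.0f} GB of weights, {verdict} in {BUDGET_GB} GB")
```

Under these assumptions, 16-bit weights alone would need about 24 GB, so a 16GB device would likely depend on a lower-precision (quantized) build of the model.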

Gemma 4 12B is the first mid-sized model in the Gemma 4 series to support native audio input. Its lightweight architecture omits separate vision and audio encoders, which lowers latency and reduces memory consumption. The model supports multi-step reasoning, agent workflows, and fully offline local inference. It is released under the Apache 2.0 license, with pre-trained weights available on Hugging Face and Kaggle, and it can be deployed through Google Cloud services including Model Garden, Cloud Run, and GKE.
