Google DeepMind today released Gemma 4 12B, an open-source multimodal AI model. The 12-billion-parameter model delivers performance comparable to the larger 26B Mixture of Experts model in the Gemma 4 series while requiring less than half the memory. It can run on consumer laptops with just 16GB of VRAM or unified memory, including entry-level MacBook Air M5 devices.
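To put the 16GB figure in context, the arithmetic below gives a rough estimate of how much memory the weights of a 12-billion-parameter model occupy at a few common storage precisions. The announcement does not say which precision the 16GB claim refers to, so the formats listed here are illustrative assumptions, and the estimate counts weights only, ignoring the KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate for a 12B-parameter model.
# Assumptions (not from the announcement): the precisions listed below,
# and that only weights are counted (no KV cache, activations, or overhead).

PARAMS = 12_000_000_000  # 12 billion parameters, as stated in the article
BUDGET_GB = 16.0         # memory budget quoted in the article

# Bytes needed to store one parameter in each format (illustrative).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: int, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits_budget(params: int, bytes_per_param: float, budget_gb: float) -> bool:
    """True if the weights alone are smaller than the memory budget."""
    return weight_gb(params, bytes_per_param) < budget_gb


for name, size in BYTES_PER_PARAM.items():
    gb = weight_gb(PARAMS, size)
    ok = fits_budget(PARAMS, size, BUDGET_GB)
    print(f"{name}: {gb:.0f} GB of weights, under {BUDGET_GB:.0f} GB: {ok}")
```

Under these assumptions, 16-bit weights alone would need about 24 GB, so fitting within 16GB would most likely depend on a quantized format such as 8-bit or 4-bit weights.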
Gemma 4 12B is the first mid-sized model in the Gemma 4 series to support native audio input. Its lightweight architecture has no separate vision and audio encoders, which lowers latency and reduces memory consumption. The model supports multi-step reasoning, agentic workflows, and fully offline local inference. It is released under the Apache 2.0 license, with pre-trained weights available on Hugging Face and Kaggle, and it can be deployed on Google Cloud through Model Garden, Cloud Run, and GKE.