A curated hub for on-device AI
Google’s AI Edge Gallery is built on LiteRT (formerly TensorFlow Lite) and MediaPipe, optimized for running AI on resource-constrained devices. It supports open-source models from Hugging Face, including Google’s Gemma 3n — a small, multimodal language model that handles text and images, with audio and video support in the pipeline.
The 529MB Gemma 3 1B model delivers up to 2,585 tokens per second of prefill inference on mobile GPUs, enabling sub-second responses for tasks like text generation and image analysis. Models run fully offline on CPUs, GPUs, or NPUs, keeping data on the device and preserving privacy.
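To put the throughput figure in perspective, here is a back-of-the-envelope sketch of what 2,585 tokens per second means for prompt ingestion. The function name and constant are ours, not part of any Google API, and the rate is a quoted peak, so real latencies will vary by device:

```python
# Hypothetical estimate of prefill latency from the quoted peak rate
# of 2,585 tokens/s for Gemma 3 1B on a mobile GPU (actual rates vary).
PREFILL_TOKENS_PER_SEC = 2585

def prefill_latency_ms(prompt_tokens: int,
                       tokens_per_sec: float = PREFILL_TOKENS_PER_SEC) -> float:
    """Rough time to ingest a prompt of the given length, in milliseconds."""
    return prompt_tokens / tokens_per_sec * 1000

# Even a 1,000-token prompt is ingested in well under half a second at peak rate.
print(round(prefill_latency_ms(1000)))  # ≈ 387 ms
```

At that rate, the whole prefill stage for a typical prompt fits comfortably inside the sub-second budget the article describes.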
The app includes a Prompt Lab for single-turn tasks such as summarization, code generation, and image queries, with templates and tunable settings (e.g., temperature, top-k). A Retrieval-Augmented Generation (RAG) library lets models reference local documents or images without fine-tuning, while a Function Calling library enables automation such as API calls or form filling.
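The temperature and top-k settings mentioned above control how a model picks its next token. A minimal sketch of how such sampling typically works, in plain Python (the function and parameter names are illustrative, not the app's actual API): keep only the k highest-scoring tokens, flatten or sharpen their scores with the temperature, then sample from the resulting softmax distribution.

```python
import math
import random

def sample_top_k(logits, k=3, temperature=0.8, rng=None):
    """Illustrative top-k + temperature sampling (names are ours, not the app's API).

    Higher temperature -> flatter distribution, more varied output;
    lower k -> fewer candidate tokens, more conservative output.
    """
    rng = rng or random.Random(0)
    # Keep the k highest-scoring token indices.
    top = sorted(enumerate(logits), key=lambda p: p[1], reverse=True)[:k]
    # Temperature-scale, then apply a numerically stable softmax.
    scaled = [score / temperature for _, score in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # Draw one token index in proportion to its softmax weight.
    r = rng.random() * sum(weights)
    for (idx, _), w in zip(top, weights):
        r -= w
        if r <= 0:
            return idx
    return top[-1][0]

# With k=2, only the two strongest candidates (indices 1 and 3) can be chosen.
print(sample_top_k([0.1, 2.0, 0.5, 1.5], k=2, temperature=1.0))
```

Setting temperature near zero makes the highest-scoring token dominate (nearly greedy decoding), which is why these knobs are exposed as tunable settings for experimentation.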