Google DeepMind

Gemini family — multimodal models with massive context windows and hybrid reasoning.

Available models

Starting rate

59.7 RODI / 1M

Max context

1.0M

Provider website Provider docs

Lyria 3 Pro PreviewNewgoogle/lyria-3-pro-preview

visionaudiochatlong-contextpreview

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images. These models deliver structural coherence, including vocals, timed lyrics, and full instrumental arrangements. Lyria 3 Pro can generate full-length songs with verses, choruses, bridges.

Price

Pricing unavailable

Context1.0MSpeedDeepInput:textimageOutput:textaudioModel details

Lyria 3 Clip PreviewNewgoogle/lyria-3-clip-preview

visionaudiochatlong-contextpreview

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images. These models deliver structural coherence, including vocals, timed lyrics, and full instrumental arrangements. Lyria 3 Clip can generate short clips, loops, previews.

Price

Pricing unavailable

Context1.0MSpeedBalancedInput:textimageOutput:textaudioModel details

Veo 3.1google/veo-3-1

visionvideo

Google's state-of-the-art video generation model, built for maximum visual fidelity in final production cuts. Veo 3.1 generates high-quality 1080p video from text or image prompts with native synchronized audio — including dialogue, ambient effects, and background sound. Supports scene extension (up to 20 chained clips for 140+ second narratives), frames-to-video transitions between two images, vertical video for Shorts, and 4K upscaling.

Price

Per second238.5RODI/s~0.400 USD/s
With audio / s238.5RODI/s~0.400 USD/s
No audio / s119.3RODI/s~0.200 USD/s

View all rates

Context128KSpeedDeepInput:textimageOutput:videoModel details

Veo 3.1 Fastgoogle/veo-3-1-fast

visionvideo

Google's mid-tier video generation model balancing speed and quality. Veo 3.1 Fast generates high-quality video from text or image prompts with native synchronized audio, offering faster turnaround than Veo 3.1 at lower cost. Supports first-frame and last-frame conditioning, multiple resolutions and aspect ratios, and SynthID watermarking.

Price

Per second59.7RODI/s~0.100 USD/s
With audio / s59.7RODI/s~0.100 USD/s
No audio / s47.7RODI/s~0.0800 USD/s

View all rates

Context128KSpeedDeepInput:textimageOutput:videoModel details

Gemini 2.5 Flashgoogle/gemini-2-5-flash

visionchatlong-context

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

Price

In178.9RODI/M~0.300 USD/M
Out1490.2RODI/M~2.50 USD/M
Cached17.9RODI/M~0.0300 USD/M

View all rates

Context1.0MSpeedFastInput:documentimagetextaudiovideoOutput:textModel details

Gemini 2.5 Flash Litegoogle/gemini-2-5-flash-lite

visionchatlong-context

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off c

Price

In59.7RODI/M~0.100 USD/M
Out238.5RODI/M~0.400 USD/M
Cached6.0RODI/M~0.0100 USD/M

View all rates

Context1.0MSpeedFastInput:textimagedocumentaudiovideoOutput:textModel details

Gemini 2.5 Progoogle/gemini-2-5-pro

visionchatlong-context

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

Price

In745.1RODI/M~1.25 USD/M
Out5960.8RODI/M~10.00 USD/M
Cached74.6RODI/M~0.125 USD/M

View all rates

Context1.0MSpeedFastInput:textimagedocumentaudiovideoOutput:textModel details

Gemini 3.1 Flash Litegoogle/gemini-3-1-flash-lite

visionchatlong-context

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, and applications where responsiveness and API cost are the primary constraints. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

Price

In149.1RODI/M~0.250 USD/M
Out894.2RODI/M~1.50 USD/M
Cached15.0RODI/M~0.0250 USD/M

View all rates

Context1.0MSpeedFastInput:textimagevideodocumentaudioOutput:textModel details

Gemini 3.1 Flash TTS PreviewNewgoogle/gemini-3-1-flash-tts-preview

audiopreview

Gemini 3.1 Flash TTS Preview is a text-to-speech model from Google, and a substantial generational step up from Gemini 2.5 Flash TTS. It takes text input and produces audio output across 70+ languages — nearly 3× the language coverage of its predecessor. The headline addition is a system of 200+ inline audio tags (e.g. `[whispers]`, `[laughs]`, `[excited]`) that let developers steer delivery, emotion, and pacing mid-sentence, alongside a "director's chair" workflow in Google AI Studio for defin

Price

In596.1RODI/M~1.00 USD/M
Out11921.6RODI/M~20.00 USD/M

Context8KSpeedFastInput:textOutput:audioModel details

Gemini 3.1 Pro PreviewNewgoogle/gemini-3-1-pro-preview

visionchatlong-contextpreview

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-ca

Price

In1192.2RODI/M~2.00 USD/M
Out7153.0RODI/M~12.00 USD/M
Cached119.3RODI/M~0.200 USD/M

View all rates

Context1.0MSpeedFastInput:audiodocumentimagetextvideoOutput:textModel details

Gemini 3 Flash PreviewNewgoogle/gemini-3-flash-preview

visionchatlong-contextpreview

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability.

Price

In298.1RODI/M~0.500 USD/M
Out1788.3RODI/M~3.00 USD/M
Cached29.9RODI/M~0.0500 USD/M

View all rates

Context1.0MSpeedFastInput:textimagedocumentaudiovideoOutput:textModel details

Nano Banana 2 (Gemini 3.1 Flash Image Preview)Newgoogle/gemini-3-1-flash-image-preview

visionimage-genpreview

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-c

Price

In298.1RODI/M~0.500 USD/M
Out1788.3RODI/M~3.00 USD/M
Image out35761.5RODI/M~60.00 USD/M

Context131KSpeedFastInput:imagetextOutput:imagetextModel details

Nano Banana (Gemini 2.5 Flash Image)google/gemini-2-5-flash-image

visionimage-gen

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration)

Price

In178.9RODI/M~0.300 USD/M
Out1490.2RODI/M~2.50 USD/M
Cached17.9RODI/M~0.0300 USD/M

View all rates

Context33KSpeedFastInput:imagetextOutput:imagetextModel details

Nano Banana Pro (Gemini 3 Pro Image Preview)Newgoogle/gemini-3-pro-image-preview

visionimage-genpreview

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and high-fidelity visual synthesis. The model generates context-rich graphics, from infographics and diagrams to cinematic composites, and can incorporate real-time information via Search grounding. It offers industry-leading text rendering in images (including long passages and multilingu

Price

In1192.2RODI/M~2.00 USD/M
Out7153.0RODI/M~12.00 USD/M
Cached119.3RODI/M~0.200 USD/M

View all rates

Context66KSpeedFastInput:imagetextOutput:imagetextModel details

Veo 3.1 Litegoogle/veo-3-1-lite

visionvideo

Google's most cost-effective video generation model, designed for high-volume applications and rapid iteration. Veo 3.1 Lite generates 720p and 1080p video from text or image prompts with native synchronized audio at less than 50% of the cost of Veo 3.1 Fast. Supports 4–8 second clips in landscape (16:9) and portrait (9:16) formats, with SynthID watermarking. Ideal for content platforms, short-form video creation, and automated media generation.

Price

Per second29.9RODI/s~0.0500 USD/s
With audio / s29.9RODI/s~0.0500 USD/s
No audio / s17.9RODI/s~0.0300 USD/s

View all rates

Context128KSpeedDeepInput:textimageOutput:videoModel details