Gemini 3.1 Flash TTS Preview

google/gemini-3-1-flash-tts-preview

audiopreview

Input price

596.1 RODI/M

~ 1 USD/M

Output price

11921.6 RODI/M

~ 20 USD/M

Context

Max output

—

Input:text

Output:audio

Pricing

Rate	RODI	USD (ref.)	Unit
In	596.1	~ 1.00	USD/M · RODI/M
Out	11921.6	~ 20.00	USD/M · RODI/M

RODI prices include Rodium markup and upstream fees. USD figures are wholesale reference rates.

Capabilities

Streaming

Tool calling

Vision

JSON mode

Reasoning

About this model

Gemini 3.1 Flash TTS Preview is a text-to-speech model from Google, and a substantial generational step up from Gemini 2.5 Flash TTS. It takes text input and produces audio output across 70+ languages — nearly 3× the language coverage of its predecessor. The headline addition is a system of 200+ inline audio tags (e.g. `[whispers]`, `[laughs]`, `[excited]`) that let developers steer delivery, emotion, and pacing mid-sentence, alongside a "director's chair" workflow in Google AI Studio for defin

API usage

Use the canonical model slug in your chat completion requests.

…

Shell / scripts:

…

Chat completions docs →

Back to all models All Google models