Rodium AI
Google

Gemini 3.1 Flash TTS Preview

google/gemini-3-1-flash-tts-preview

audiopreview

Input price

596.1 RODI/M

~ 1 USD/M

Output price

11921.6 RODI/M

~ 20 USD/M

Context

8K

Max output

Input:text
Output:audio

Pricing

RateRODIUSD (ref.)Unit
In596.1~ 1.00USD/M · RODI/M
Out11921.6~ 20.00USD/M · RODI/M

RODI prices include Rodium markup and upstream fees. USD figures are wholesale reference rates.

Capabilities

Streaming
Tool calling
Vision
JSON mode
Reasoning

About this model

Gemini 3.1 Flash TTS Preview is a text-to-speech model from Google, and a substantial generational step up from Gemini 2.5 Flash TTS. It takes text input and produces audio output across 70+ languages — nearly 3× the language coverage of its predecessor. The headline addition is a system of 200+ inline audio tags (e.g. `[whispers]`, `[laughs]`, `[excited]`) that let developers steer delivery, emotion, and pacing mid-sentence, alongside a "director's chair" workflow in Google AI Studio for defin

API usage

Use the canonical model slug in your chat completion requests.

Shell / scripts:

Chat completions docs →