
Pricing

Model API pricing depends on the model type and the invocation specifications. Large language models are usually billed by token. Multimodal models are usually billed by input specification, output specification, or generation unit, although some image models are also billed by token. The model detail page is the source of truth for the exact rules.

Large language models are usually billed by token. The basic price items are input price and output price. Some models also provide cache read and cache write prices to distinguish cache-hit reads from cache writes.
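The four price items combine additively: each token count is multiplied by its own price, and the results are summed. The sketch below assumes that prices are quoted per 1 million tokens and that the usage record splits input tokens into uncached, cache-read, and cache-written portions with no overlap. Both are assumptions for illustration, and `TokenPrices`, `TokenUsage`, and `token_cost` are hypothetical names, not part of any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Hypothetical per-1M-token prices; real values come from the model detail page."""
    input: float
    output: float
    cache_read: float = 0.0
    cache_write: float = 0.0


@dataclass(frozen=True)
class TokenUsage:
    """Assumed usage breakdown: the three input buckets do not overlap."""
    input_tokens: int            # uncached input tokens
    output_tokens: int
    cache_read_tokens: int = 0   # input tokens served from cache (cache hit)
    cache_write_tokens: int = 0  # input tokens written to cache


def token_cost(usage: TokenUsage, prices: TokenPrices) -> float:
    """Multiply each token count by its price item and sum (prices per 1M tokens)."""
    per_million = 1_000_000
    return (
        usage.input_tokens * prices.input
        + usage.output_tokens * prices.output
        + usage.cache_read_tokens * prices.cache_read
        + usage.cache_write_tokens * prices.cache_write
    ) / per_million
```

With hypothetical prices of 3 (input), 15 (output), and 0.3 (cache read) per 1M tokens, a request with 10,000 uncached input tokens, 2,000 output tokens, and 50,000 cache-hit tokens costs (10,000 × 3 + 2,000 × 15 + 50,000 × 0.3) / 1,000,000 = 0.075.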

The same model may also split input or output prices by context length, or split cache write prices by cache duration. For example, short-context and long-context usage may have different prices, and 5-minute cache writes may differ from 1-hour cache writes.
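When prices are split by context length or cache duration, the first step is to pick the applicable price before multiplying by token counts. The sketch below assumes that the whole request is billed at the tier its context length falls into, and that cache-write prices are keyed by duration in minutes. The tier thresholds, prices, and the names `ContextTier`, `pick_tier`, and `cache_write_price` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContextTier:
    """One hypothetical price tier, valid while context length <= max_context."""
    max_context: Optional[int]  # None means no upper bound
    input_price: float          # per 1M tokens
    output_price: float         # per 1M tokens


def pick_tier(tiers: List[ContextTier], context_tokens: int) -> ContextTier:
    """Return the first tier (ordered by max_context) whose limit covers the context."""
    for tier in tiers:
        if tier.max_context is None or context_tokens <= tier.max_context:
            return tier
    raise ValueError(f"no tier covers {context_tokens} tokens")


# Hypothetical cache-write prices per 1M tokens, keyed by cache duration in minutes.
CACHE_WRITE_PRICES = {5: 3.75, 60: 6.0}


def cache_write_price(duration_minutes: int) -> float:
    """Look up the cache-write price for a supported cache duration."""
    try:
        return CACHE_WRITE_PRICES[duration_minutes]
    except KeyError:
        raise ValueError(f"unsupported cache duration: {duration_minutes} min") from None
```

For example, with a short-context tier up to 200,000 tokens and an unbounded long-context tier, a 150,000-token request uses the short-context prices and a 250,000-token request uses the long-context prices.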

| Price item | Billing unit | Pricing factors |
| --- | --- | --- |
| Input price | Token | Input token count, context length |
| Output price | Token | Output token count, context length |
| Cache read price | Token | Cache-hit token count, context length |
| Cache write price | Token | Written token count, cache duration |

Multimodal pricing is not tied to a single parameter. Image models are usually billed per image, with prices varying by resolution, generation mode, output count, and similar specifications. Video models are usually billed per second, though some are billed per video, with prices varying by duration, resolution, generation mode, and whether audio is generated.
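Specification-based billing can be viewed as a price lookup keyed by the relevant specifications, multiplied by the billed quantity (images or seconds). The sketch below uses hypothetical price tables; the resolutions, modes, prices, and function names are illustrative assumptions, not actual rates for any model. Per-video billing would instead be a flat lookup with no duration multiplier.

```python
# Hypothetical price lists; keys and values are illustrative only.
IMAGE_PRICES = {  # (resolution, mode) -> price per image
    ("1K", "standard"): 0.04,
    ("2K", "standard"): 0.08,
    ("2K", "hd"): 0.12,
}

VIDEO_PRICES = {  # (resolution, mode, with_audio) -> price per second
    ("720p", "standard", False): 0.10,
    ("720p", "standard", True): 0.15,
    ("1080p", "pro", True): 0.40,
}


def image_cost(resolution: str, mode: str, count: int) -> float:
    """Per-image billing: the unit price depends on resolution and mode, times output count."""
    return IMAGE_PRICES[(resolution, mode)] * count


def video_cost(resolution: str, mode: str, with_audio: bool, seconds: float) -> float:
    """Per-second billing: the rate depends on resolution, mode, and audio, times duration."""
    return VIDEO_PRICES[(resolution, mode, with_audio)] * seconds
```

Under these assumed rates, four 2K standard images cost 4 × 0.08, and an 8-second 720p standard video with audio costs 8 × 0.15.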

Some Google or GPT image models use token-based pricing. For example, Nano Banana and GPT Image 2 typically split text input, image input, and image output into separate price items.
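For token-priced image models, each input or output modality is a separate price item, so the cost is a sum over modalities. The sketch assumes per-1M-token prices and takes the token counts as given; how many tokens an image consumes is model-specific and not modeled here. `ImageTokenPrices` and `image_token_cost` are hypothetical names, and the example prices are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTokenPrices:
    """Hypothetical per-1M-token prices for separately priced items."""
    text_input: float
    image_input: float
    image_output: float


def image_token_cost(p: ImageTokenPrices, text_in: int, image_in: int, image_out: int) -> float:
    """Multiply each modality's token count by its own price item, then sum."""
    return (
        text_in * p.text_input
        + image_in * p.image_input
        + image_out * p.image_output
    ) / 1_000_000
```

Because image output is often priced much higher than text input, the image-output term usually dominates a generation request under this kind of pricing.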

| Model type | Billing unit | Pricing factors |
| --- | --- | --- |
| Image models | Per image | Resolution, mode, output count |
| Video models | Per second or per video | Resolution, mode, duration, whether audio is included |
| Audio models | Per second, per character, or per call | Input length, output duration, voice or mode |
| 3D models | Per call | Output format, quality tier, generation mode |
| Special image models | Token | Text input, image input, text output, image output |
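The billing units in the table differ mainly in which usage quantity is multiplied by the unit price. The sketch below maps each unit to that quantity. The usage dictionary keys are hypothetical field names rather than a real API response schema; token prices are assumed to be quoted per 1M tokens; character count is approximated with Python's `len`, which counts Unicode code points and may not match a provider's own counting; and per-call and per-video billing are assumed to charge one unit per request.

```python
from enum import Enum


class BillingUnit(Enum):
    TOKEN = "token"
    IMAGE = "image"
    SECOND = "second"
    CHARACTER = "character"
    CALL = "call"
    VIDEO = "video"


def quantity_for(unit: BillingUnit, usage: dict) -> float:
    """Return the usage quantity that the unit price is multiplied by.

    The usage keys ("tokens", "images", "seconds", "text") are hypothetical.
    """
    if unit is BillingUnit.TOKEN:
        return usage["tokens"] / 1_000_000  # assumed per-1M-token pricing
    if unit is BillingUnit.IMAGE:
        return usage["images"]
    if unit is BillingUnit.SECOND:
        return usage["seconds"]
    if unit is BillingUnit.CHARACTER:
        return len(usage["text"])  # code points; providers may count differently
    if unit in (BillingUnit.CALL, BillingUnit.VIDEO):
        return 1  # assumed: one billed unit per request
    raise ValueError(f"unknown billing unit: {unit}")


def simple_cost(unit: BillingUnit, unit_price: float, usage: dict) -> float:
    """Cost for a single price item: unit price times billed quantity."""
    return unit_price * quantity_for(unit, usage)
```

Models with several price items, such as the token-priced image models above, would sum `simple_cost` over each item.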