Large language model (LLM) API pricing is based on tokens — sub-word units roughly equivalent to 4 characters or 0.75 words in English. As of 2026, prices range from $0.07/million tokens (Gemini 2.0 Flash input) to $75/million tokens (GPT-4.5 output), a 1,000x spread. Input tokens (your prompt) are typically 2-10x cheaper than output tokens (the model's response): input tokens are processed in parallel in a single forward pass, while each output token is generated sequentially and requires its own forward pass through the model.
How Token Pricing Works
LLM APIs charge per token, with separate rates for input (your prompt) and output (the model's response). Prices are quoted per million tokens, so the total cost of a request depends on three factors: input token count, output token count, and the model's per-token rates. Concretely: cost = (input_tokens × input_rate + output_tokens × output_rate) ÷ 1,000,000.
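The formula above can be sketched as a small helper. The rates in the example call ($0.15 input / $0.60 output per million tokens) are illustrative placeholder numbers, not a quote for any specific model:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Return the cost in dollars of a single request.

    Rates are expressed in dollars per million tokens, matching how
    providers publish their pricing.
    """
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# Illustrative rates only: a 2,000-token prompt with a 500-token response
# at $0.15/M input and $0.60/M output.
cost = estimate_cost(2000, 500, 0.15, 0.60)
print(f"${cost:.4f}")  # → $0.0006
```

Note that even though the response is a quarter the length of the prompt here, it accounts for half the cost — a direct consequence of the higher output rate.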
How do you choose the right LLM model for your budget?
For high-volume, low-complexity tasks (classification, extraction, simple Q&A), budget models like GPT-4o mini, Gemini Flash, or Mistral Small offer excellent cost-per-quality ratios. For complex reasoning, coding, or creative tasks, mid-tier models like Claude Sonnet or GPT-4o provide the best balance. Reserve premium models (Claude Opus, GPT-4.5) for tasks where quality is non-negotiable and volume is low.
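To make the tier trade-off concrete, the sketch below compares per-1,000-request costs across three hypothetical tiers. The tier names and rates are illustrative assumptions chosen to reflect the rough budget/mid/premium spread described above, not published prices for any named model:

```python
# Illustrative (input_rate, output_rate) in dollars per million tokens.
# These are assumed numbers for comparison, not real provider pricing.
TIERS = {
    "budget":  (0.15, 0.60),
    "mid":     (3.00, 15.00),
    "premium": (15.00, 75.00),
}

def cost_per_1k_requests(input_tokens: int, output_tokens: int,
                         rates: tuple[float, float]) -> float:
    """Cost in dollars of 1,000 requests of the given shape."""
    input_rate, output_rate = rates
    per_request = (input_tokens * input_rate
                   + output_tokens * output_rate) / 1_000_000
    return per_request * 1000


# A typical request: 1,500-token prompt, 400-token response.
for tier, rates in TIERS.items():
    print(f"{tier:>8}: ${cost_per_1k_requests(1500, 400, rates):.2f} per 1k requests")
# →   budget: $0.47 per 1k requests
# →      mid: $10.50 per 1k requests
# →  premium: $52.50 per 1k requests
```

At these assumed rates the premium tier costs over 100x the budget tier for identical traffic, which is why routing high-volume, low-complexity tasks to cheaper models dominates most cost-optimization strategies.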