LLM API Pricing & Token Calculator

Your prompts stay in your browser. Token counting is done locally.
Prices last verified: March 2026. Check provider websites for current pricing.

Large language model (LLM) API pricing is based on tokens — sub-word units roughly equivalent to 4 characters or 0.75 words in English. As of 2026, prices range from $0.07/million tokens (Gemini 2.0 Flash input) to $75/million tokens (GPT-4.5 output), a 1,000x spread. Input tokens (your prompt) are typically 2-10x cheaper than output tokens (the model's response) because autoregressive generation requires significantly more computation per token than encoding.

How Token Pricing Works

LLM APIs charge per token, with separate rates for input (your prompt) and output (the model's response). A token is roughly 4 characters in English. Prices are expressed per million tokens. The total cost of a request depends on three factors: input token count, output token count, and the model's per-token rates.
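The three-factor cost model above can be sketched in a few lines. The rates in the example are illustrative placeholders, not any provider's actual pricing:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Total request cost in USD. Rates are USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a 2,000-token prompt and a 500-token response
# at hypothetical rates of $2.50/M input and $10.00/M output
cost = request_cost(2_000, 500, 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0100
```

Note that even with output rates 4x higher than input rates, a long prompt with a short reply can be dominated by input cost, which is why prompt length matters for high-volume workloads.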

How do you choose the right LLM model for your budget?

For high-volume, low-complexity tasks (classification, extraction, simple Q&A), budget models like GPT-4o mini, Gemini Flash, or Mistral Small offer excellent cost-per-quality ratios. For complex reasoning, coding, or creative tasks, mid-tier models like Claude Sonnet or GPT-4o provide the best balance. Reserve premium models (Claude Opus, GPT-4.5) for tasks where quality is non-negotiable and volume is low.

Need to format your API responses? Try our JSON Formatter or generate OG Meta Tags for your AI-powered app.

Frequently Asked Questions

How are LLM API tokens counted?

Tokens are sub-word units — roughly 4 characters or 0.75 words in English. Our estimator gives a ±10% approximation. Exact counts depend on each model's tokenizer.
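A quick character-based approximation like the one described above can be sketched as follows. This is a heuristic, not any model's real tokenizer; exact counts require the provider's own tokenizer library:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token
    rule of thumb for English text (accurate to about +/-10%)."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

For billing-accurate counts, use the tokenizer matching your model (for example, OpenAI's tiktoken library for GPT models); the heuristic is only suitable for ballpark cost estimates.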

Why do output tokens cost more than input tokens?

Generating text requires more computation than reading it. Output tokens go through the full autoregressive decoding process, which is inherently more expensive per token.

What is cached input pricing?

Some providers (OpenAI, Anthropic, DeepSeek) offer reduced prices when re-sending the same context prefix. This is useful for applications with large system prompts that rarely change.
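The savings from a cached prefix can be estimated by splitting input tokens into cached and fresh portions. All rates below are hypothetical placeholders; real cache discounts and eligibility rules vary by provider:

```python
def cost_with_cache(cached_tokens: int, fresh_tokens: int, output_tokens: int,
                    input_rate: float, cached_rate: float,
                    output_rate: float) -> float:
    """Request cost in USD when part of the input is a cached prefix.
    Rates are USD per million tokens; cached_rate applies to the reused prefix."""
    return (cached_tokens * cached_rate
            + fresh_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000

# Hypothetical rates: $3.00/M input, $0.30/M cached input, $15.00/M output.
# A 10,000-token system prompt served from cache, 500 fresh tokens, 800 output.
print(cost_with_cache(10_000, 500, 800, 3.00, 0.30, 15.00))  # 0.0165
```

With a 90% cache discount, the large system prompt contributes $0.003 instead of $0.03 per request, so the savings compound quickly at high request volumes.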

Which LLM API is cheapest?

For most use cases in 2026, Gemini 2.0 Flash and Mistral Small offer the lowest per-token costs. DeepSeek V3 is also very competitive. The cheapest option depends on your quality requirements.

How often do LLM API prices change?

Prices drop frequently — typically every 3-6 months. OpenAI and Anthropic have both cut prices multiple times. We verify prices monthly but recommend checking provider websites.
