If you're building with Claude, the pricing page alone won't tell you what you actually need to know. This post breaks down every model, every discount mechanism, and what real workloads actually cost — with no hand-waving.


The Short Answer

Claude API pricing is token-based. You pay per million tokens (MTok) for input and output separately. Among current models, input costs range from $1/MTok on Haiku 4.5 to $5/MTok on Claude Opus 4.6; the cheapest legacy model, Haiku 3, runs $0.25/MTok, while legacy Opus 4 still costs $15/MTok. Output tokens cost more than input tokens across all models.

The model you pick matters enormously. Haiku 4.5 costs 5x less than Opus 4.6 for the same token volume, and legacy Haiku 3 costs 20x less.
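A back-of-envelope sketch of that math in Python, using the prices quoted in this post (treat them as illustrative; verify current rates):

```python
# Per-request cost is just token counts times per-MTok rates.
# Prices are the ones quoted in this post -- verify before relying on them.
PRICES = {  # model -> (input $/MTok, output $/MTok)
    "claude-opus-4-6": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at list price (no discounts)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 2,000 tokens in, 500 tokens out on Sonnet 4.6:
print(round(request_cost("claude-sonnet-4-6", 2_000, 500), 4))  # 0.0135
```

Small per request, but multiply by request volume and the model choice dominates the bill.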


Current Claude API Pricing by Model

All prices are in USD per million tokens (MTok), as of mid-2025.

Current Generation Models

Model | Input ($/MTok) | Output ($/MTok) | Context Window
claude-opus-4-6 | $5.00 | $25.00 | 1M tokens
claude-opus-4-5 | $5.00 | $25.00 | 200K tokens
claude-sonnet-4-6 | $3.00 | $15.00 | 1M tokens
claude-sonnet-4-5 | $3.00 | $15.00 | 200K tokens
claude-haiku-4-5 | $1.00 | $5.00 | 200K tokens

Legacy Models (Still Available)

Model | Input ($/MTok) | Output ($/MTok)
claude-haiku-3-5 | $0.80 | $4.00
claude-haiku-3 | $0.25 | $1.25
claude-opus-4 | $15.00 | $75.00

Notable pricing changes

Opus 4 vs. Opus 4.6 is not a typo. The newer "4.6" generation models are dramatically cheaper than the original Opus 4. Anthropic significantly restructured pricing with the 4.5/4.6 generation — Opus 4.6 at $5/MTok input vs. Opus 4 at $15/MTok input for the same capability tier. Sonnet 4.6 and Opus 4.6 also include a full 1M token context window at standard price.

The long-context surcharge ($6/MTok input, $22.50/MTok output) only kicks in above 200K input tokens per request. Most applications don't hit this.
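A sketch of how that tier affects cost, assuming (verify against the docs) that the premium rate applies to a whole request once its input exceeds 200K tokens:

```python
# Long-context tier for Sonnet 4.6, per the rates quoted above.
# Assumption to verify: the premium rate replaces the base rate for the
# entire request when input exceeds the threshold.
LONG_CONTEXT_THRESHOLD = 200_000
BASE = (3.00, 15.00)   # base (input, output) $/MTok
LONG = (6.00, 22.50)   # long-context (input, output) $/MTok

def sonnet_request_cost(input_tokens: int, output_tokens: int) -> float:
    in_p, out_p = LONG if input_tokens > LONG_CONTEXT_THRESHOLD else BASE
    return (input_tokens * in_p + output_tokens * out_p) / 1_000_000

print(round(sonnet_request_cost(150_000, 2_000), 3))  # 0.48  (base tier)
print(round(sonnet_request_cost(300_000, 2_000), 3))  # 1.845 (long-context tier)
```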


The Two Biggest Levers: Batch API and Prompt Caching

Before you worry about model selection, understand these two discounting mechanisms. They can cut your bill by 50–90%.

Batch API (50% off)

The Batch API processes requests asynchronously. You submit a batch, results come back within 24 hours. For any workload that isn't user-facing and real-time, this is the easiest way to halve your cost.

Model | Batch Input ($/MTok) | Batch Output ($/MTok)
claude-opus-4-6 | $2.50 | $12.50
claude-sonnet-4-6 | $1.50 | $7.50
claude-haiku-4-5 | $0.50 | $2.50
claude-haiku-3 (batch) | $0.125 | $0.625
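The submission itself is just a list of ordinary Messages requests, each tagged with a custom_id so you can match results when they come back. A minimal sketch of the payload shape; field names follow the Anthropic Batch API docs, but verify them before relying on this:

```python
# Build a Message Batches submission: each entry pairs a custom_id
# (used to match asynchronous results back to inputs) with a normal
# Messages request body. Shape per the Anthropic docs -- verify fields.
def make_batch_requests(model: str, prompts: list[str]) -> list[dict]:
    return [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": model,
                "max_tokens": 512,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]

batch = make_batch_requests("claude-haiku-4-5", ["Summarize doc A", "Summarize doc B"])
```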

Prompt Caching (up to 90% off on re-read)

If your requests share a large, repeated prefix — a system prompt, a reference document, a codebase — prompt caching lets the API reuse that processed context instead of reprocessing it on every call.

Cache writes cost slightly more upfront (1.25x for a 5-minute cache, 2x for a 1-hour cache), but cache reads cost just 10% of normal input price. Break-even happens after one re-read on 5-minute caches, two re-reads on 1-hour caches. In most real applications with repeated system prompts, caching pays for itself within a single session.
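The break-even arithmetic above can be checked directly. A small sketch, treating one uncached call's input cost as 1.0:

```python
# Cache economics: writes cost a multiplier of base input price
# (1.25x for the 5-minute cache, 2x for the 1-hour cache); reads cost 0.10x.
def cached_cost(calls: int, write_mult: float, read_mult: float = 0.10) -> float:
    """Relative input cost of `calls` requests sharing one cached prefix:
    the first call writes the cache, the rest read it. 1.0 = one uncached call."""
    return write_mult + (calls - 1) * read_mult

def uncached_cost(calls: int) -> float:
    return float(calls)

# 5-minute cache breaks even after one re-read:
assert cached_cost(2, 1.25) < uncached_cost(2)   # 1.35 < 2.0
# 1-hour cache needs two re-reads:
assert cached_cost(2, 2.00) > uncached_cost(2)   # 2.10 > 2.0
assert cached_cost(3, 2.00) < uncached_cost(3)   # 2.20 < 3.0
```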

Stacking discounts

These two discounts stack. Batch processing + prompt caching can reduce effective input costs by 95% compared to naive API usage. This is not a rounding error — it's the difference between a $500/month bill and a $25/month bill for the same workload.


Real-World Cost Examples

Abstract pricing is less useful than seeing what actual applications cost. Here are three representative scenarios.

Scenario 1: Customer-Facing Chatbot at 1M Tokens/Day

Setup: A support chatbot. Each conversation averages 2,000 input tokens (system prompt + conversation history) and 500 output tokens. You handle roughly 400 conversations per day. That's 800K input tokens and 200K output tokens per day — approximately 1M total.

Approach | Daily cost | Monthly cost
Sonnet 4.6, no caching | ~$5.40 | ~$162
Sonnet 4.6 + prompt caching | ~$3.78 | ~$113

With prompt caching (assuming 1,500 of 2,000 input tokens are a shared system prompt): cache reads run at $0.30/MTok vs. $3.00/MTok — roughly 30% reduction on a real-time conversational workload.
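Those table numbers can be reproduced from the setup above. A quick sanity-check script (the occasional cache-write premium is ignored here, since it is small at this scale):

```python
# Scenario 1: 400 conversations/day, 2,000 input tokens each
# (1,500 of them a cached system prompt) and 500 output tokens,
# on Sonnet 4.6 at $3 in / $15 out per MTok; cache reads at 10%.
CONVS, IN_TOK, OUT_TOK, CACHED = 400, 2_000, 500, 1_500
IN_P, OUT_P, READ_P = 3.00, 15.00, 0.30  # $/MTok

no_cache = CONVS * (IN_TOK * IN_P + OUT_TOK * OUT_P) / 1e6
with_cache = CONVS * ((IN_TOK - CACHED) * IN_P + CACHED * READ_P + OUT_TOK * OUT_P) / 1e6

print(round(no_cache, 2), round(with_cache, 2))  # 5.4 3.78
```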

Scenario 2: AI Coding Assistant

Setup: A coding assistant embedded in an IDE. Each interaction sends ~4,000 tokens of context (open files, instructions, conversation) and generates ~800 tokens of code. 50 interactions per day, 20 developers on the team = 1,000 requests/day: 4M input tokens, 800K output tokens.

Model | Daily cost | Monthly cost
Claude Sonnet 4.6 | $24.00 | $720
Claude Haiku 4.5 | $8.00 | $240

Model selection alone is a 3x cost difference here. Profile what your task actually needs before defaulting to Sonnet — for autocomplete-style tasks, Haiku is usually sufficient.

Scenario 3: Batch Document Processing

Setup: Nightly job that classifies and summarizes 10,000 documents. Average document is 800 input tokens; output is 150 tokens of structured summary. Not time-sensitive — results needed by morning.

Approach | Per nightly run | Monthly
Sonnet 4.6, real-time | ~$46.50 | ~$1,395
Haiku 4.5, Batch API | ~$7.75 | ~$232

The Batch API + Haiku combination runs this job at roughly one-sixth of what naive real-time Sonnet would cost. That's not a rounding error — it's a real architectural choice worth making at design time.


How to Actually Reduce Your Claude API Spend

  • Start with Haiku, escalate if needed. Most routing and classification tasks, short-context completions, and structured extraction don't require Sonnet-level capability. Build with Haiku first. Upgrade specific flows that actually need it after you've benchmarked quality.
  • Cache your system prompt. If you have a system prompt longer than ~500 tokens that's consistent across requests, add a cache_control block. It takes 10 minutes to implement and reduces those tokens to 10% cost on every subsequent call.
  • Route to Batch for anything async. Data pipelines, report generation, overnight jobs, test suite evaluation — none of these need real-time responses. The Batch API's 50% discount is automatic. There's no good reason to pay full price for non-interactive workloads.
  • Trim your context aggressively. Every token in your input costs money. Audit your prompts: remove redundant instructions, truncate conversation history appropriately, and avoid including full documents when relevant snippets will do.
  • Monitor per-request token counts during development. The API response includes usage.input_tokens and usage.output_tokens. Log these in dev. It's common to discover that a "simple" feature is sending 5K tokens per call because of an accidentally included debug payload or unexpectedly long tool schema.
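The cache_control change from the second bullet is a small addition to the request body. A sketch of the shape per the prompt-caching docs; note that cached prefixes must exceed a minimum token length, so verify the details for your model:

```python
# Mark the system prompt as cacheable: everything up to and including the
# block carrying cache_control is cached. Block shape follows the
# Anthropic prompt-caching docs -- verify fields against current docs.
SYSTEM_PROMPT = "You are a support agent for ExampleCo..."  # illustrative; real prompts should be long enough to cache

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # 5-minute cache
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Where is my order?")
```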

Tracking Costs in Production

The Anthropic Console gives you top-level usage dashboards, but it doesn't give you per-feature, per-user, or per-model breakdown out of the box. For teams that need cost attribution across multiple features or are watching spend against a budget, that gap gets painful quickly.

At the application layer, you should be logging token counts per request alongside request metadata (user ID, feature name, model used). From there you can build spend tracking in your data warehouse or use a purpose-built tool like ClawCost, which handles the aggregation and alerting without requiring you to instrument everything from scratch.

The point is: don't wait until your monthly bill arrives to understand your cost breakdown. Token usage patterns during development often look nothing like production traffic.
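A minimal version of that application-layer logging: record token counts plus metadata per request, then aggregate spend by feature. The field names here are illustrative, not from any particular tool:

```python
# Aggregate per-request usage logs into spend per feature.
# Records would be written at request time from the API response's
# usage.input_tokens / usage.output_tokens fields.
from collections import defaultdict

PRICES = {  # $/MTok (input, output), as quoted in this post
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}

def spend_by_feature(log: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for rec in log:
        in_p, out_p = PRICES[rec["model"]]
        totals[rec["feature"]] += (rec["input_tokens"] * in_p + rec["output_tokens"] * out_p) / 1e6
    return dict(totals)

log = [
    {"feature": "chat", "model": "claude-sonnet-4-6", "input_tokens": 2_000, "output_tokens": 500},
    {"feature": "autocomplete", "model": "claude-haiku-4-5", "input_tokens": 1_000, "output_tokens": 100},
]
spend = spend_by_feature(log)
```

From here, shipping these records to a warehouse gives you the per-feature breakdown the Console doesn't.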


Special Cases to Know About

  • Tool use adds tokens. When you include tools in a request, Claude automatically adds a system prompt to enable them — roughly 313–346 extra input tokens depending on the tool choice mode. Tool definitions themselves also count as input tokens. In heavily agentic applications with large tool schemas, this overhead can be substantial.
  • Web search costs extra. The built-in web search tool costs $10 per 1,000 searches on top of token costs. If you're building agents that search frequently, budget for this separately.
  • Third-party platforms have their own pricing. Claude via AWS Bedrock or Google Vertex AI has different pricing, including regional vs. global endpoint premiums (regional is 10% more). Check those platforms' pricing pages directly rather than assuming parity with the direct API.
  • Enterprise pricing is negotiable. At high volume, Anthropic's published rates aren't necessarily the floor. Custom pricing is available through their sales team.

Quick Reference: Which Model Should I Use?

Use Case | Recommended Model | Rough Cost Range
High-stakes reasoning, complex agents | claude-opus-4-6 | $5–$25/MTok
General-purpose dev/prod workloads | claude-sonnet-4-6 | $3–$15/MTok
High-volume classification, extraction | claude-haiku-4-5 | $1–$5/MTok
Ultra-high-volume batch, cost-sensitive | claude-haiku-3 (batch) | $0.125–$0.625/MTok

The Bottom Line

Claude API pricing in 2025 is genuinely competitive at every tier, but the variance between "naive usage" and "optimized usage" is enormous. A team spending $500/month could often run the same workload for $50–$100 with the right model selection, prompt caching, and Batch API routing.

The decisions that matter most:

  1. Model selection — Haiku vs. Sonnet vs. Opus is often a 3–20x cost difference.
  2. Batch vs. real-time — a 50% cut on non-interactive work.
  3. Prompt caching — 90% reduction on repeated context.

Get those three right and you've done the heavy lifting. After that, logging token counts per feature and watching spend over time keeps surprises off your bill.

Pricing sourced from the Anthropic API documentation. Prices are subject to change — verify current rates before making financial projections.

Stop guessing where your API spend is going

ClawCost tracks every token — broken down by model, session, and project — in real time. Self-hosted, open source, free to start.

Get started free → See pricing