Prompt Cost Analyzer

Estimate the monthly cost of a prompt across OpenAI's GPT-5 models, and see how prompt caching changes the math under different traffic patterns.


Input tokens ~60

Cost shape

Cache only discounts the system slice

The system prompt accounts for only 3.7% of cost, so caching can't move the bill much in this shape. Lengthen the system prompt or reduce output tokens to see meaningful cache savings.
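The 3.7% figure can be reproduced with a little arithmetic. The sketch below is an illustration, not the page's actual code. It assumes the ~60 input tokens split into 47 system and 13 user tokens (a split inferred from the 3.7% figure, not stated on the page) and uses the GPT-5.5 rates listed on this page (in $5, out $30, cache $0.50, taken here as dollars per 1M tokens). The function name `system_share` is hypothetical.

```python
def system_share(system_tokens, user_tokens, output_tokens, in_rate, out_rate):
    """Fraction of the uncached per-call cost attributable to the system prompt.

    Rates are in dollars per 1M tokens; the unit cancels in the ratio.
    """
    system_cost = system_tokens * in_rate
    total_cost = (system_tokens + user_tokens) * in_rate + output_tokens * out_rate
    return system_cost / total_cost


# Assumption: 47 system + 13 user tokens make up the ~60 input tokens.
share = system_share(47, 13, 200, in_rate=5.00, out_rate=30.00)

# Best case: every call hits the cache, so the system slice bills at the
# cached rate ($0.50) instead of the standard input rate ($5.00).
ceiling = share * (1 - 0.50 / 5.00)

print(f"system share: {share:.1%}, max cache saving: {ceiling:.1%}")
```

Because the cached rate is 10% of the input rate, even a 100% hit rate could trim at most about 3.4% from this prompt's bill under these assumptions.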

Monthly cost

Pick the column that matches your traffic pattern:

No cache: the prompt is too short to engage the cache, or the prefix changes on every call.
Bursty (≈ 50% hit rate): intermittent traffic with gaps wider than the cache TTL.
Steady (≈ 80% hit rate): regular production traffic with occasional gaps.
Always-warm (≈ 95% hit rate): sustained traffic with a byte-stable prefix.

| Model | Rates per 1M tokens | Tokens in / out | $ / call | No cache | Bursty | Steady | Always-warm |
|---|---|---|---|---|---|---|---|
| GPT-5.5 | in $5 · out $30 · cache $0.50 | 60 / 200 | $0.0063 | $630 | $619 (−1.7%) | $613 (−2.7%) | $610 (−3.2%) |
| GPT-5.4 | in $2.50 · out $15 · cache $0.25 | 60 / 200 | $0.00315 | $315 | $310 (−1.7%) | $307 (−2.7%) | $305 (−3.2%) |
| GPT-5.4 mini | in $0.75 · out $4.50 · cache $0.075 | 60 / 200 | $0.000945 | $94.50 | $92.91 (−1.7%) | $91.96 (−2.7%) | $91.49 (−3.2%) |
| GPT-5.4 nano | in $0.20 · out $1.25 · cache $0.02 | 60 / 200 | $0.000262 | $26.20 | $25.78 (−1.6%) | $25.52 (−2.6%) | $25.40 (−3.1%) |
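The monthly figures can be reproduced from the listed rates. The sketch below is one way to do it, not the tool's actual implementation. Two values are assumptions inferred from the numbers rather than stated on the page: a volume of 100,000 calls per month (each No cache figure is 100,000 times its $ / call) and a 47-token system prompt within the ~60 input tokens. Rates are taken as dollars per 1M tokens, and all function and constant names are hypothetical.

```python
# $ per 1M tokens: (input, output, cached input), as listed on the page.
RATES = {
    "GPT-5.5": (5.00, 30.00, 0.50),
    "GPT-5.4": (2.50, 15.00, 0.25),
    "GPT-5.4 mini": (0.75, 4.50, 0.075),
    "GPT-5.4 nano": (0.20, 1.25, 0.02),
}

# Cache hit rate assumed for each traffic-pattern column.
HIT_RATES = {"No cache": 0.0, "Bursty": 0.50, "Steady": 0.80, "Always-warm": 0.95}

INPUT_TOKENS = 60
OUTPUT_TOKENS = 200
SYSTEM_TOKENS = 47         # assumption: inferred, not stated on the page
CALLS_PER_MONTH = 100_000  # assumption: inferred, not stated on the page


def cost_per_call(in_rate, out_rate, cache_rate, hit_rate):
    """Expected dollars per call when only the system prompt is cacheable."""
    user_tokens = INPUT_TOKENS - SYSTEM_TOKENS
    # Expected price of a prefix token: cached rate on a hit, input rate on a miss.
    prefix_rate = hit_rate * cache_rate + (1 - hit_rate) * in_rate
    per_million = (
        SYSTEM_TOKENS * prefix_rate
        + user_tokens * in_rate
        + OUTPUT_TOKENS * out_rate
    )
    return per_million / 1_000_000


def monthly_costs(model):
    """Monthly dollars for one model, keyed by traffic-pattern column."""
    in_rate, out_rate, cache_rate = RATES[model]
    return {
        column: cost_per_call(in_rate, out_rate, cache_rate, hit) * CALLS_PER_MONTH
        for column, hit in HIT_RATES.items()
    }


for model in RATES:
    row = monthly_costs(model)
    print(model, {column: round(cost, 2) for column, cost in row.items()})
```

Under these assumptions the results agree with the table after rounding; for example, GPT-5.5 in the Bursty column comes to about $619.43, shown as $619.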

How caching works here

This calculator models a single turn: the system prompt is treated as the cacheable prefix, and the user prompt and output tokens bill at standard rates. In a multi-turn conversation or agent loop, the cacheable prefix grows to include prior turns, so cache savings can exceed what is shown here. On a cache hit, prefix tokens bill at OpenAI's discounted cache-read rate, about 10% of standard input pricing for the GPT-5 family. Real applications with long, stable prefixes and high call volume often sustain hit rates of 80% or more, at which point the cache columns reflect what you actually pay.
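To illustrate why a growing prefix increases the savings, here is a rough sketch of input-side cost over a conversation. It is a simplification under stated assumptions: every token sent on the previous call counts as cacheable prefix, the hit rate is applied as an expected value on every turn, and any minimum cacheable length or TTL behavior is ignored. The token counts (47 system, 13 user, 200 reply) and the function name are illustrative, not taken from OpenAI's documentation.

```python
def conversation_input_cost(turns, system_tokens, user_tokens, reply_tokens,
                            in_rate, cache_rate, hit_rate):
    """Expected input-side dollars for a conversation (rates in $ per 1M tokens)."""
    # Expected price of a prefix token: cached rate on a hit, input rate on a miss.
    prefix_rate = hit_rate * cache_rate + (1 - hit_rate) * in_rate
    total = 0.0
    history = 0  # tokens from earlier user messages and model replies
    for _ in range(turns):
        # Assumption: the system prompt plus all prior turns form the prefix.
        prefix = system_tokens + history
        total += prefix * prefix_rate + user_tokens * in_rate
        history += user_tokens + reply_tokens
    return total / 1_000_000


def input_saving(turns, hit_rate=0.80):
    """Fractional input-cost reduction from caching at the given hit rate."""
    args = dict(system_tokens=47, user_tokens=13, reply_tokens=200,
                in_rate=5.00, cache_rate=0.50)
    uncached = conversation_input_cost(turns, hit_rate=0.0, **args)
    cached = conversation_input_cost(turns, hit_rate=hit_rate, **args)
    return 1 - cached / uncached


print(f"1 turn:   {input_saving(1):.1%} off input cost")
print(f"10 turns: {input_saving(10):.1%} off input cost")
```

The saving here is measured against input cost only; output tokens are never discounted, so the effect on the total bill still depends on how output-heavy the workload is.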

Token counts are approximate (±10%): they are estimated from character length, and actual counts vary with content.
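A character-based estimate of this kind might look like the sketch below. The page does not state which ratio it uses; 4 characters per token is a common rule of thumb for English text and is an assumption here, as are the function names.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rule-of-thumb ratio, not taken from the page


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Rough token count derived from character length alone."""
    return math.ceil(len(text) / chars_per_token)


def token_range(estimate, tolerance=0.10):
    """Low and high bounds implied by a ± tolerance on the estimate."""
    return round(estimate * (1 - tolerance)), round(estimate * (1 + tolerance))


prompt = "x" * 240  # stand-in for a 240-character prompt
estimate = estimate_tokens(prompt)
print(estimate, token_range(estimate))
```

For billing-grade numbers, counting with the model's actual tokenizer would be more reliable than any length heuristic.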

Source · verified 2026-05-12

OpenAI API pricing

Production workloads

Need help optimizing your AI bill?

I run Devclock. Get in touch if you'd like a second pair of eyes on your prompt structure, model selection, or cost trajectory.
