Prompt Cost Analyzer
Estimate the monthly cost of a prompt across OpenAI's GPT-5 models — and see how prompt caching changes the math at different traffic patterns.
Cost shape
Cache only discounts the system slice. The system prompt is 5.5% of total cost, so caching will trim the bill modestly.
Monthly cost
Pick the column that matches your traffic pattern. Prices are quoted per 1M tokens; monthly figures assume 100,000 calls per month.

- No cache: prompt too short to engage caching, or the prefix changes per call.
- Bursty (≈ 50% hit rate): intermittent traffic with gaps wider than the cache TTL.
- Steady (≈ 80% hit rate): regular production traffic with occasional gaps.
- Always-warm (≈ 95% hit rate): sustained traffic with a byte-stable prefix.
| Model | Tokens In / Out | $ / Call | No cache | Bursty | Steady | Always-warm |
|---|---|---|---|---|---|---|
| GPT-5.5 (in $5 · out $30 · cache $0.50) | 83 → 200 | $0.006415 | $642 | $626 (−2.5%) | $616 (−3.9%) | $612 (−4.7%) |
| GPT-5.4 (in $2.50 · out $15 · cache $0.25) | 83 → 200 | $0.003208 | $321 | $313 (−2.5%) | $308 (−3.9%) | $306 (−4.7%) |
| GPT-5.4 mini (in $0.75 · out $4.50 · cache $0.075) | 83 → 200 | $0.000962 | $96.22 | $93.86 (−2.5%) | $92.44 (−3.9%) | $91.74 (−4.7%) |
| GPT-5.4 nano (in $0.20 · out $1.25 · cache $0.02) | 83 → 200 | $0.000267 | $26.66 | $26.03 (−2.4%) | $25.65 (−3.8%) | $25.46 (−4.5%) |
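The per-call figures above can be reproduced with a small cost function. A minimal sketch, assuming the 83 input tokens split into roughly 70 system-prefix tokens and 13 user tokens (an assumption consistent with the ~5.5% system cost share shown above); prices are dollars per 1M tokens:

```python
def per_call_cost(sys_tok, user_tok, out_tok,
                  in_price, out_price, cache_price, hit_rate=0.0):
    """Dollar cost of one call; prices are $ per 1M tokens.
    On a cache hit, only the system-prefix tokens bill at the
    discounted cache-read rate; everything else is standard."""
    prefix = sys_tok * (hit_rate * cache_price + (1 - hit_rate) * in_price)
    return (prefix + user_tok * in_price + out_tok * out_price) / 1e6

# GPT-5.5 row: assumed 70 system + 13 user input tokens, 200 output tokens
no_cache = per_call_cost(70, 13, 200, 5.00, 30.00, 0.50)              # $0.006415
steady = per_call_cost(70, 13, 200, 5.00, 30.00, 0.50, hit_rate=0.8)  # $0.006163
print(no_cache, steady)
```

At an assumed 100,000 calls per month, the steady figure works out to about $616, matching the row above.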
How caching works here
This estimate models a single-turn call: the system prompt is treated as the cacheable prefix, while user-prompt and output tokens bill at standard rates. In a multi-turn conversation or agent loop, the cacheable prefix grows to include prior turns, so cache savings scale further than shown here. On a cache hit, prefix tokens bill at OpenAI's discounted cache-read rate, roughly 10% of standard input pricing for the GPT-5 family. Real applications with long, stable prefixes and high call volume often sustain 80%+ hit rates, turning the cache columns into the actual P&L line.
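The column discounts follow directly from that: the saving is the system slice's share of total cost, times the hit rate, times the cache discount. A minimal sketch, using the ~90% cache-read discount and the 5.5% system share described above:

```python
def cache_savings(system_cost_share, hit_rate, cache_discount=0.90):
    """Fraction of the total bill saved by prompt caching.
    cache_discount = 0.90 models cache reads at ~10% of the
    standard input rate."""
    return system_cost_share * hit_rate * cache_discount

for label, hit_rate in [("bursty", 0.50), ("steady", 0.80), ("always-warm", 0.95)]:
    print(f"{label}: {cache_savings(0.055, hit_rate):.1%}")
```

This gives roughly 2.5%, 4.0%, and 4.7%; the small differences from the table come from the 5.5% share itself being rounded.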
Token counts are approximate (±10%). Estimated from character length; actual counts will differ slightly by content.
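A character-length estimate like the one above can be sketched with the common rule of thumb of ~4 characters per English token (an assumption; use a real tokenizer such as tiktoken for exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    Expect roughly +/-10% error versus a real tokenizer."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("You are a helpful assistant. Answer concisely."))
```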
Source · verified 2026-05-12
OpenAI API pricing

Need help optimizing your AI bill?
I run Devclock. Get in touch if you'd like a second pair of eyes on your prompt structure, model selection, or cost trajectory.
Visit Devclock