
Prompt Cost Analyzer

Estimate the monthly cost of a prompt across OpenAI's GPT-5 models, and see how prompt caching changes the math at different traffic patterns.


Input tokens ~83

Cost shape

Cache only discounts the system slice

The system prompt is 5.5% of cost, so caching will trim the bill only modestly.

Monthly cost

Pick the column that matches your traffic pattern:

No cache: the prompt is too short to engage caching, or the prefix changes on every call.
Bursty: about 50% hit rate (intermittent traffic with gaps wider than the cache TTL).
Steady: about 80% hit rate (regular production traffic, occasional gaps).
Always-warm: about 95% hit rate (sustained traffic, byte-stable prefix).

| Model | Price per 1M tokens (in / out / cache) | Tokens in / out | $ / call | No cache | Bursty | Steady | Always-warm |
|---|---|---|---|---|---|---|---|
| GPT-5.5 | $5 / $30 / $0.50 | 83 / 200 | $0.006415 | $642 | $626 (−2.5%) | $616 (−3.9%) | $612 (−4.7%) |
| GPT-5.4 | $2.50 / $15 / $0.25 | 83 / 200 | $0.003208 | $321 | $313 (−2.5%) | $308 (−3.9%) | $306 (−4.7%) |
| GPT-5.4 mini | $0.75 / $4.50 / $0.075 | 83 / 200 | $0.000962 | $96.22 | $93.86 (−2.5%) | $92.44 (−3.9%) | $91.74 (−4.7%) |
| GPT-5.4 nano | $0.20 / $1.25 / $0.02 | 83 / 200 | $0.000267 | $26.66 | $26.03 (−2.4%) | $25.65 (−3.8%) | $25.46 (−4.5%) |
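The monthly figures can be reproduced with a few lines of arithmetic. The sketch below is not the tool's actual code. The page does not state how the 83 input tokens split between system and user prompt, nor the monthly call volume; 70 system tokens, 13 user tokens and 100,000 calls per month are assumptions, chosen because they reproduce the table's numbers. The function and variable names are hypothetical.

```python
# Prices in USD per 1M tokens, copied from the table (two models shown).
PRICES = {
    "GPT-5.5": {"in": 5.00, "out": 30.00, "cache": 0.50},
    "GPT-5.4 nano": {"in": 0.20, "out": 1.25, "cache": 0.02},
}

# ASSUMPTIONS (not stated on the page): token split and call volume.
SYSTEM_TOKENS = 70
USER_TOKENS = 13
OUTPUT_TOKENS = 200
CALLS_PER_MONTH = 100_000


def cost_per_call(price, hit_rate=0.0):
    """Expected USD cost of one call.

    Only the system prompt is cacheable: on a hit it bills at the cache
    rate, on a miss at the standard input rate. User and output tokens
    always bill at standard rates.
    """
    system_rate = hit_rate * price["cache"] + (1 - hit_rate) * price["in"]
    micro_usd = (
        SYSTEM_TOKENS * system_rate
        + USER_TOKENS * price["in"]
        + OUTPUT_TOKENS * price["out"]
    )
    return micro_usd / 1_000_000


def monthly_cost(model, hit_rate=0.0):
    """Monthly USD cost at the assumed call volume."""
    return CALLS_PER_MONTH * cost_per_call(PRICES[model], hit_rate)


if __name__ == "__main__":
    columns = [("No cache", 0.0), ("Bursty", 0.5), ("Steady", 0.8), ("Always-warm", 0.95)]
    for label, hit_rate in columns:
        print(f"{label:12s} ${monthly_cost('GPT-5.5', hit_rate):,.2f}")
```

Because the discount applies only to the system slice, the saving is capped at the system prompt's share of the bill however high the hit rate climbs.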

How caching works here

This calculator uses a single-turn cost model: the system prompt is treated as the cacheable prefix, while the user prompt and output tokens bill at standard rates. In a multi-turn conversation or agent loop, the cacheable prefix grows to include prior turns, so cache savings scale further than what is shown here. On a cache hit, prefix tokens bill at OpenAI's discounted cache-read rate, about 10% of standard input pricing for the GPT-5 family. Real applications with long, stable prefixes and high call volume often sustain hit rates of 80% or more, in which case the cached columns, not the No cache column, are the realistic estimate of the bill.
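To see why savings grow in a multi-turn conversation, here is a rough sketch of input cost when every call resends the full history. It assumes the first call is a cache miss and every later call gets a hit on everything sent before it, and it ignores provider-specific details such as minimum cacheable length or cache expiry. The function name and the example token counts are hypothetical; the prices are the GPT-5.4 rates from the table.

```python
def conversation_input_cost(system_tokens, turn_tokens, turns,
                            in_price, cache_price, cached=True):
    """USD input cost of a conversation that resends its history each call.

    turn_tokens: tokens appended per turn (new user message plus the
    previous reply). Prices are USD per 1M tokens.
    Assumes a full-prefix cache hit on every call after the first.
    """
    total_micro_usd = 0.0
    for turn in range(turns):
        history = system_tokens + turn * turn_tokens  # sent on the previous call
        new = turn_tokens                             # never seen before
        if cached and turn > 0:
            total_micro_usd += history * cache_price + new * in_price
        else:
            total_micro_usd += (history + new) * in_price
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Hypothetical conversation: 1,000-token system prompt, 300 tokens per turn, 10 turns.
    plain = conversation_input_cost(1000, 300, 10, 2.50, 0.25, cached=False)
    warm = conversation_input_cost(1000, 300, 10, 2.50, 0.25, cached=True)
    print(f"no cache: ${plain:.6f}  cached: ${warm:.6f}  saving: {1 - warm / plain:.1%}")
```

In this example most input tokens on later turns are repeated history, so the input-side saving is far larger than the few percent in the single-turn table.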

Token counts are approximate (±10%). They are estimated from character length, so actual counts will differ slightly by content.
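A character-length estimate of this kind might look like the sketch below. The page does not say which ratio it uses; 4 characters per token is a common rule of thumb for English text and is an assumption here, as are the function names. For exact counts, a real tokenizer is needed.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count from character length (assumed ratio, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_range(text, tolerance=0.10, chars_per_token=4.0):
    """Lower and upper bounds around the estimate, using the ±10% band."""
    estimate = estimate_tokens(text, chars_per_token)
    return math.floor(estimate * (1 - tolerance)), math.ceil(estimate * (1 + tolerance))


if __name__ == "__main__":
    prompt = "You are a helpful assistant. Answer briefly and cite sources."
    low, high = estimate_range(prompt)
    print(f"~{estimate_tokens(prompt)} tokens (roughly {low} to {high})")
```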

Source · verified 2026-05-12

OpenAI API pricing
Production workloads

Need help optimizing your AI bill?

I run Devclock. Get in touch if you'd like a second pair of eyes on your prompt structure, model selection, or cost trajectory.
