
Prompt Cost Analyzer

Estimate the monthly cost of a prompt across OpenAI's GPT-5 models, and see how prompt caching changes the math under different traffic patterns.

System prompt: ~70 tokens
User prompt: ~13 tokens
Input tokens: ~83 (system + user)
Output tokens: 200
Calls / month: 100,000

Cost shape

Caching only discounts the system-prompt slice of the bill. Output tokens account for 93.5% of the cost and the system prompt for 5.5%, so caching will trim the bill only modestly.
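The cost shares above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the tool's own code: the function name `cost_shares` is hypothetical, prices are taken as USD per 1M tokens, and the inputs (70 system tokens, 13 user tokens, 200 output tokens, $5 input and $30 output pricing) are the GPT-5.5 figures shown on this page.

```python
def cost_shares(system_tokens, user_tokens, output_tokens, in_price, out_price):
    """Return each slice's share of the per-call cost.

    Prices are USD per 1M tokens. The function name and signature are
    illustrative assumptions, not part of the analyzer itself.
    """
    system = system_tokens * in_price / 1_000_000
    user = user_tokens * in_price / 1_000_000
    output = output_tokens * out_price / 1_000_000
    total = system + user + output
    return {
        "system": system / total,
        "user": user / total,
        "output": output / total,
    }


# GPT-5.5 figures from this page: in $5, out $30 per 1M tokens.
shares = cost_shares(70, 13, 200, in_price=5.00, out_price=30.00)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Because output tokens are both more numerous and priced at six times the input rate here, the cacheable system slice stays small no matter how high the hit rate climbs.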

Monthly cost

Pick the column that matches your traffic pattern:

No cache: the prompt is too short to engage caching, or the prefix changes on every call.
Bursty (≈ 50% hit rate): intermittent traffic with gaps wider than the cache TTL.
Steady (≈ 80% hit rate): regular production traffic with occasional gaps.
Always-warm (≈ 95% hit rate): sustained traffic with a byte-stable prefix.

Model	Input $/1M tokens	Output $/1M tokens	Cached input $/1M tokens	Tokens in → out	$ / call	No cache	Bursty	Steady	Always-warm
GPT-5.5	$5	$30	$0.50	83 → 200	$0.006415	$642	$626 (−2.5%)	$616 (−3.9%)	$612 (−4.7%)
GPT-5.4	$2.50	$15	$0.25	83 → 200	$0.003208	$321	$313 (−2.5%)	$308 (−3.9%)	$306 (−4.7%)
GPT-5.4 mini	$0.75	$4.50	$0.075	83 → 200	$0.000962	$96.22	$93.86 (−2.5%)	$92.44 (−3.9%)	$91.74 (−4.7%)
GPT-5.4 nano	$0.20	$1.25	$0.02	83 → 200	$0.000267	$26.66	$26.03 (−2.4%)	$25.65 (−3.8%)	$25.46 (−4.5%)
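The monthly columns follow from blending the standard and cached input rates over the system prefix. The sketch below shows one way to compute them; it is an illustration under stated assumptions rather than the tool's implementation. The name `monthly_cost` is hypothetical, prices are USD per 1M tokens, and the volume of 100,000 calls per month is inferred from the table's per-call and monthly figures.

```python
# Hit rates for each traffic pattern, as described on this page.
HIT_RATES = {"No cache": 0.0, "Bursty": 0.50, "Steady": 0.80, "Always-warm": 0.95}


def monthly_cost(system_tokens, user_tokens, output_tokens,
                 in_price, out_price, cache_price, calls, hit_rate):
    """Monthly cost in USD. All prices are USD per 1M tokens (assumed unit)."""
    # Only the system prefix is cacheable: a hit bills at the cache rate,
    # a miss at the standard input rate, weighted by the hit rate.
    system_rate = hit_rate * cache_price + (1 - hit_rate) * in_price
    per_call = (system_tokens * system_rate
                + user_tokens * in_price
                + output_tokens * out_price) / 1_000_000
    return per_call * calls


CALLS = 100_000  # assumption: inferred from the table, not stated explicitly

# GPT-5.5 row: in $5, out $30, cache $0.50.
costs = {
    name: monthly_cost(70, 13, 200, 5.00, 30.00, 0.50, CALLS, rate)
    for name, rate in HIT_RATES.items()
}
baseline = costs["No cache"]
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f} ({(cost - baseline) / baseline:+.1%})")
```

With these inputs the results come out at roughly $642, $626, $616, and $612, in line with the GPT-5.5 row. Swapping in another row's three prices should reproduce that row in the same way.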

How caching works here

This tool models a single-turn call: the system prompt is treated as the cacheable prefix, while the user prompt and output tokens bill at standard rates. In a multi-turn conversation or agent loop, the cacheable prefix grows to include prior turns, so cache savings can exceed what is shown here. On a cache hit, prefix tokens bill at OpenAI's discounted cache-read rate, which is about 10% of standard input pricing for the GPT-5 family. Real applications with long, stable prefixes and high call volume often sustain hit rates of 80% or more, at which point the cache columns become the actual P&L line.
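To see why multi-turn traffic benefits more, consider a rough sketch in which every turn resends the full history and everything before the newest user message counts as cacheable prefix. The function `conversation_input_cost` and the 10-turn conversation are hypothetical, and the sketch deliberately ignores provider-specific details such as minimum cacheable prefix length or cache granularity.

```python
def conversation_input_cost(turns, system_tokens, user_tokens, output_tokens,
                            in_price, cache_price, hit_rate):
    """Input-side cost (USD) of one conversation; prices are USD per 1M tokens.

    Simplifying assumption: each turn resends the whole history, and the
    entire history before the new user message is cacheable.
    """
    blended = hit_rate * cache_price + (1 - hit_rate) * in_price
    total = 0.0
    prefix = system_tokens  # tokens the previous call has already seen
    for _ in range(turns):
        total += (prefix * blended + user_tokens * in_price) / 1_000_000
        # The next turn's prefix also contains this user message and its reply.
        prefix += user_tokens + output_tokens
    return total


# Hypothetical 10-turn conversation at GPT-5.5 prices (in $5, cache $0.50).
no_cache = conversation_input_cost(10, 70, 13, 200, 5.00, 0.50, hit_rate=0.0)
steady = conversation_input_cost(10, 70, 13, 200, 5.00, 0.50, hit_rate=0.8)
print(f"no cache: ${no_cache:.6f}  steady: ${steady:.6f}")
print(f"input-side saving: {1 - steady / no_cache:.1%}")
```

Under these assumptions the input-side cost per conversation falls from about $0.052 to about $0.015, a far larger cut than the single-turn case, because the resent history quickly dwarfs the 70-token system prompt.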

Token counts are approximate (±10%): they are estimated from character length, and actual counts will differ slightly by content.
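A character-based estimate of this kind might look like the sketch below. The ratio of 4 characters per token is an assumption (a common rule of thumb for English text); the page does not state which ratio it uses, and both function names are hypothetical.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# The analyzer's actual ratio is not stated on this page.
CHARS_PER_TOKEN = 4


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Estimate a token count from character length alone."""
    return math.ceil(len(text) / chars_per_token)


def with_margin(tokens, margin=0.10):
    """Return the (low, high) bounds implied by a +/-10% error margin."""
    return round(tokens * (1 - margin)), round(tokens * (1 + margin))


prompt = "a" * 280  # stand-in for a 280-character system prompt
tokens = estimate_tokens(prompt)
low, high = with_margin(tokens)
print(f"~{tokens} tokens (between {low} and {high})")
```

For billing-grade numbers, a real tokenizer for the target model would replace the heuristic; the estimate is only meant to be close enough for cost planning.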

Source · verified 2026-05-12

OpenAI API pricing

Production workloads

Need help optimizing your AI bill?

I run Devclock. Get in touch if you'd like a second pair of eyes on your prompt structure, model selection, or cost trajectory.
