The Costs page gives you a complete picture of where your LLM budget is going. You can see which models are the most expensive, how your daily spend is trending, how much you’re saving through token caching, and where your current month is heading. Use this data to make informed decisions about model selection and sampling strategies before your next billing cycle arrives.
Zespan cost explorer showing daily spend breakdown and model attribution
Full cost attribution and forecasting require the Pro plan or higher. Basic cost totals are visible on all plans.

Cost by model

The bar chart at the top of the page ranks every model you’ve used in the selected period by total spend. Each bar shows the model name and its USD cost for the period. Hover over a bar to see the underlying numbers: number of calls, total tokens, average cost per call, and cache hit rate. This chart answers the most common cost question immediately: “which model is responsible for most of my bill?” If one model towers above the others, that’s your highest-leverage target for optimization.
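The hover values are straightforward aggregates over individual calls. The following is a minimal sketch of how they could be computed from raw call records. The `CallRecord` shape and the `summarizeByModel` function are hypothetical names for illustration, not part of the Zespan SDK or export format.

```typescript
// Hypothetical record shape (an assumption); Zespan's actual data model may differ.
interface CallRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cachedTokens: number; // portion of inputTokens served from the provider cache
  costUsd: number;
}

interface ModelSummary {
  model: string;
  calls: number;
  totalTokens: number;
  totalCostUsd: number;
  avgCostPerCall: number;
  cacheHitRate: number; // cached input tokens divided by total input tokens
}

function summarizeByModel(records: CallRecord[]): ModelSummary[] {
  const acc = new Map<
    string,
    { calls: number; tokens: number; input: number; cached: number; cost: number }
  >();
  for (const r of records) {
    const a = acc.get(r.model) ?? { calls: 0, tokens: 0, input: 0, cached: 0, cost: 0 };
    a.calls += 1;
    a.tokens += r.inputTokens + r.outputTokens;
    a.input += r.inputTokens;
    a.cached += r.cachedTokens;
    a.cost += r.costUsd;
    acc.set(r.model, a);
  }
  // Rank models by total spend, highest first, like the bar chart.
  return Array.from(acc.entries())
    .map(([model, a]) => ({
      model,
      calls: a.calls,
      totalTokens: a.tokens,
      totalCostUsd: a.cost,
      avgCostPerCall: a.cost / a.calls,
      cacheHitRate: a.input > 0 ? a.cached / a.input : 0,
    }))
    .sort((x, y) => y.totalCostUsd - x.totalCostUsd);
}
```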

Cost over time

The line chart below shows your daily spend across the selected date range. Each point represents total cost for that calendar day across all models. Use this chart to spot:
  • Sudden spikes that coincide with a deployment or feature launch
  • Gradual cost growth that may indicate increasing usage or a model change
  • Days with unexpectedly low cost that may point to an outage or misconfiguration
You can change the date range using the selector above the chart. Shorter ranges (7 days) show finer detail; longer ranges (90 days) reveal trends.

Cache hit ratio

The cache hit ratio card shows what percentage of your input tokens were served from the model provider’s prompt cache rather than recomputed from scratch. A higher cache hit ratio means lower cost and lower latency for those requests. The card displays:
  • Cache hit ratio — percentage of total input tokens that were cached
  • Cached tokens — the raw count of tokens served from cache
  • Estimated savings — the USD amount saved by not recomputing those tokens
To increase your cache hit ratio, structure your prompts so that the static system prompt comes first and only the dynamic user content changes per request. The Zespan SDK tracks cached_tokens automatically when your provider reports them.
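As an illustration of the numbers on the card and of the prompt ordering described above, here is a minimal sketch. The per-million-token prices are placeholder parameters rather than real provider rates, and `cacheStats`, `buildMessages`, and the message shape are hypothetical names chosen for this example.

```typescript
interface CacheStats {
  hitRatio: number; // cached input tokens / total input tokens
  cachedTokens: number;
  estimatedSavingsUsd: number;
}

// Prices are per million input tokens. Both are assumptions supplied by the
// caller; check your provider's pricing for the real cached-token rate.
function cacheStats(
  inputTokens: number,
  cachedTokens: number,
  inputPricePerMTok: number,
  cachedPricePerMTok: number,
): CacheStats {
  const hitRatio = inputTokens > 0 ? cachedTokens / inputTokens : 0;
  const estimatedSavingsUsd =
    (cachedTokens / 1e6) * (inputPricePerMTok - cachedPricePerMTok);
  return { hitRatio, cachedTokens, estimatedSavingsUsd };
}

// Keep the static prefix identical across requests so the provider can cache it;
// only the trailing user content changes.
const STATIC_SYSTEM_PROMPT = "You are a support assistant for an online store.";

function buildMessages(userContent: string): { role: "system" | "user"; content: string }[] {
  return [
    { role: "system", content: STATIC_SYSTEM_PROMPT },
    { role: "user", content: userContent },
  ];
}
```

With these placeholder prices, `cacheStats(2e6, 1e6, 4, 1)` yields a hit ratio of 0.5 and estimated savings of 3 USD.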

Month-to-date spend

The gauge in the upper-right corner shows your cumulative spend for the current calendar month. The gauge fills from zero toward the outer ring, which represents your budgeted monthly limit (if you’ve set one). The exact dollar amount is shown in the center.

30-day forecast

Zespan 30-day cost forecast with confidence bands
Below the gauge, a forecast card projects your total spend for the next 30 days based on a linear extrapolation of your recent daily averages. This is a straight-line estimate — it does not account for planned changes in traffic — but it gives you an early warning if your current trajectory will exceed your budget.
The forecast uses the last 14 days of data to calculate the daily average. If your usage pattern is highly variable or you recently made a significant change (such as switching models), treat the forecast as a directional signal rather than a precise prediction.
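The calculation can be approximated as follows. This is a sketch under the assumption that the projection is simply the 14-day daily average multiplied by 30; the exact method Zespan uses is not specified here, and `forecast30Day` is a hypothetical name.

```typescript
// dailySpendUsd is ordered oldest to newest, one entry per calendar day.
function forecast30Day(
  dailySpendUsd: number[],
  windowDays = 14,
  horizonDays = 30,
): number {
  const recent = dailySpendUsd.slice(-windowDays); // most recent days only
  if (recent.length === 0) return 0;
  const dailyAverage = recent.reduce((sum, v) => sum + v, 0) / recent.length;
  return dailyAverage * horizonDays;
}

// Example: a flat 10 USD per day projects to 300 USD over the next 30 days.
const projected = forecast30Day(new Array<number>(14).fill(10));
const monthlyBudgetUsd = 250; // placeholder budget for illustration
const overBudget = projected > monthlyBudgetUsd;
```

Because only the most recent window is averaged, a spend change made 15 or more days ago is fully reflected, while one made yesterday barely moves the estimate.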

Cost by user

The cost by user table requires the Team plan or higher.
If your SDK passes a userId when creating spans, the cost by user table breaks down spend per user for the selected period. The table shows each user ID, their total cost, number of requests, and average cost per request, sorted by total spend descending. This view is useful for understanding which users or user segments are the most expensive to serve, and for detecting unusual individual usage that may indicate a bug or abuse.
The cost by user table only populates for requests where your SDK explicitly sets a user ID. If you haven’t configured this, see the SDK documentation for how to attach user context to your traces.
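To make the table's columns concrete, the sketch below aggregates request records into per-user rows. The `RequestRecord` shape and `costByUser` function are hypothetical and only illustrate the arithmetic; they are not the SDK's API. Requests without a user ID are skipped, which mirrors why the table stays empty when no user ID is set.

```typescript
// Hypothetical record shape (an assumption for illustration).
interface RequestRecord {
  userId?: string;
  costUsd: number;
}

interface UserCostRow {
  userId: string;
  totalCostUsd: number;
  requests: number;
  avgCostPerRequest: number;
}

function costByUser(records: RequestRecord[]): UserCostRow[] {
  const totals = new Map<string, { cost: number; requests: number }>();
  for (const r of records) {
    if (!r.userId) continue; // no user context attached, so nothing to attribute
    const t = totals.get(r.userId) ?? { cost: 0, requests: 0 };
    t.cost += r.costUsd;
    t.requests += 1;
    totals.set(r.userId, t);
  }
  // Sort by total spend, descending, as in the table.
  return Array.from(totals.entries())
    .map(([userId, t]) => ({
      userId,
      totalCostUsd: t.cost,
      requests: t.requests,
      avgCostPerRequest: t.cost / t.requests,
    }))
    .sort((a, b) => b.totalCostUsd - a.totalCostUsd);
}
```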

Acting on cost data

The Costs page is most useful when it informs action. Here are two common levers:
If your Zespan bill is higher than expected and your application can tolerate missing some traces, reduce the sampleRate in your SDK configuration. A sampleRate of 0.5 sends roughly half of all events to Zespan, which halves your event quota usage and lowers any associated overage charges. Sampling affects only which requests are traced; the underlying model calls still run and are billed by your provider. Set it in your SDK initialization:
```typescript
import { zespan } from "@zespan/sdk";

zespan.init({
  apiKey: process.env.ZESPAN_API_KEY,
  sampleRate: 0.5, // trace 50% of requests
});
```
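Conceptually, a sample rate can be thought of as an independent coin flip per request. The sketch below illustrates that idea only; it is not the Zespan SDK's implementation, and `shouldSample` and `expectedEvents` are hypothetical names.

```typescript
// Decide whether to trace a single request. The random source is injectable
// so the behavior can be checked deterministically.
function shouldSample(
  sampleRate: number,
  random: () => number = Math.random,
): boolean {
  // Math.random() returns a value in [0, 1), so a rate of 0 never samples
  // and a rate of 1 always samples.
  return random() < sampleRate;
}

// Expected number of events sent for a given request volume.
function expectedEvents(totalRequests: number, sampleRate: number): number {
  return totalRequests * sampleRate;
}
```

For example, `expectedEvents(1000000, 0.5)` is 500000. Because each decision is random, the actual count over a short period will vary around that figure.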
If the cost by model chart shows that one expensive model handles a large share of your requests, consider whether a smaller model could handle some of those workloads. Use the Cost Optimizer at the bottom of this page to analyze a specific trace and get a model-switching recommendation with a confidence score.

Cost Optimizer

The Cost Optimizer analyzes a specific trace and tells you whether a cheaper model could handle the same task, and by how much your cost would drop.
Plan availability: Pro and above

How to use it

Enter a trace ID in the Cost Optimizer panel (scroll to the bottom of this page) and click Analyze. The engine evaluates the trace’s task complexity against model capability and returns a recommendation. Task complexity is scored on a 1–10 scale:
  • 1–3 — simple classification, extraction, short Q&A
  • 4–6 — summarization, structured output generation
  • 7–10 — code generation, multi-step reasoning, tool use
A downgrade is only suggested when complexity is 5 or below, and never when the trace contains tool calls or extended reasoning tokens.
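The eligibility rule above can be expressed as a small predicate. This is a sketch of the stated rule only, not the optimizer's actual engine; `TraceSummary`, `complexityBand`, and `isDowngradeCandidate` are hypothetical names.

```typescript
// Hypothetical summary of a trace (an assumption for illustration).
interface TraceSummary {
  complexity: number; // 1 to 10
  hasToolCalls: boolean;
  reasoningTokens: number; // extended reasoning tokens in the trace
}

type ComplexityBand = "simple" | "moderate" | "complex";

// Map a score onto the three bands listed above.
function complexityBand(score: number): ComplexityBand {
  if (!Number.isInteger(score) || score < 1 || score > 10) {
    throw new RangeError("complexity score must be an integer from 1 to 10");
  }
  if (score <= 3) return "simple";
  if (score <= 6) return "moderate";
  return "complex";
}

// A downgrade is considered only at complexity 5 or below, and never when
// the trace used tool calls or extended reasoning tokens.
function isDowngradeCandidate(trace: TraceSummary): boolean {
  return (
    trace.complexity <= 5 &&
    !trace.hasToolCalls &&
    trace.reasoningTokens === 0
  );
}
```

Note that a score of 6 falls in the middle band but is still above the downgrade threshold, so it would not produce a suggestion.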

What you get back

| Field | Description |
| --- | --- |
| Current model | The model used in the analyzed trace |
| Suggested model | The recommended cheaper alternative |
| Potential savings | Estimated cost reduction (0–95%) |
| Confidence | high, medium, or low |
| Complexity score | 1–10 score for the task |
| Reasoning | One-sentence explanation of why the switch is appropriate |
Start with your most expensive traces from the Top sessions by cost table above — paste those trace IDs into the Cost Optimizer to find the highest-value switching opportunities.