DIY LLM Cost Tracking: What Engineers Actually Build (and Why Most Stop)

DIY LLM Cost Tracking: What Engineers Actually Build (and Why Most Stop)

Two laptop screens side by side on a desk — llm cost tracking tools comparison

Your LLM spend is climbing, and you have no idea which agent, team, or feature is responsible. So you do what engineers do: you build something. A custom middleware layer, a few database tables, a Grafana dashboard. It works for a while. Then it doesn’t.

This article breaks down the real LLM cost tracking tools comparison that engineering teams face: building custom logging infrastructure versus adopting a dedicated cost attribution platform. We’ll walk through exactly what a DIY solution requires, where it breaks, and when a purpose-built tool like Tokenr pays for itself.

TL;DR

  • A production-grade DIY LLM cost tracking system takes 160–320+ engineering hours to build and requires ongoing maintenance as providers change pricing and APIs.
  • Custom solutions typically cover basic token counting but miss multi-agent attribution, budget alerts, and cross-provider normalization.
  • Dedicated platforms like Tokenr auto-instrument OpenAI, Anthropic, and Google SDKs with one line of code, then attribute costs by agent, team, feature, and customer.
  • The build-vs-buy breakeven usually favors buying once LLM spend exceeds $5K/month or you’re running more than 3 agents across 2+ providers.

Table of contents

What engineers actually build for LLM cost tracking

We’ve talked to dozens of engineering teams about their homegrown LLM cost tracking setups. The architecture almost always looks the same:

  1. A wrapper or middleware layer around OpenAI/Anthropic client calls that intercepts requests and responses, counts tokens, and calculates cost based on a hardcoded pricing table.
  2. A logging pipeline that writes token counts, model name, and timestamp to a database (usually Postgres, sometimes a time-series DB like InfluxDB).
  3. A dashboard in Grafana, Metabase, or a custom React app that shows daily spend, model breakdown, and maybe a monthly trend line.
  4. A pricing lookup table maintained manually, updated whenever OpenAI or Anthropic changes rates (which happens frequently).

This stack takes a senior engineer 2–4 weeks to build as a first pass. It handles the basics: you can see total spend by model and day.

But “the basics” is where most teams get stuck.

Start tracking LLM costs by agent, team, and feature with Tokenr’s free early access.

Where custom LLM logging breaks down

The initial build isn’t the hard part. Maintenance is. Here’s where DIY LLM cost tracking consistently fails in production:

Provider pricing changes. OpenAI has changed model pricing multiple times since GPT-4’s launch. Anthropic’s pricing differs between Claude 3.5 Sonnet and Claude 3 Opus. Google’s Gemini models have their own rate structure. Keeping a hardcoded pricing table accurate across providers requires someone to monitor announcements and update values, sometimes within hours of a price change. Miss one, and your cost data is wrong until someone notices.

Multi-agent attribution. A single API call is easy to log. But when you have 8 agents making calls across 3 providers, and you need to know which agent, which team, and which product feature drove each dollar of spend, your simple middleware needs metadata propagation through your entire call stack. This is where custom solutions get complex fast.

Budget controls. Logging spend is one thing. Alerting at 50%, 80%, and 100% of a budget threshold, per team, per agent, before the bill arrives, requires a real-time aggregation layer on top of your logging pipeline. Most DIY systems track spend retroactively, which means you find out about the $12,000 overnight spike on Monday morning.

Cross-provider normalization. OpenAI reports tokens differently than Anthropic, which reports differently than Google. Input/output token splits, caching credits, batch API discounts: normalizing all of this into a single cost view is a meaningful engineering project on its own.

Expert Insight: According to Deloitte’s 2026 technology predictions, 66.5% of enterprises experience AI cost overruns, and 70% of AI spending happens outside IT oversight. Custom logging systems rarely solve the governance gap because they weren’t designed for organizational cost attribution.

LLM cost tracking middleware vs. dedicated platform

The LLM cost tracking middleware vs. dedicated platform decision comes down to scope. Here’s a concrete comparison:

Capability DIY middleware Dedicated platform (Tokenr)
Basic token counting Yes (manual pricing tables) Yes (auto-updated pricing)
Multi-provider support Build per provider OpenAI, Anthropic, Google auto-instrumented
Agent-level attribution Custom metadata propagation Built-in agent_id, team_id, feature tags
Budget alerts (50/80/100%) Build from scratch Configured in dashboard, email alerts
Per-customer cost tracking Significant custom work Tag with customer metadata
CSV export Build it Built-in
Model optimization suggestions Not practical to build Included (model downgrade recommendations)
Setup time 160–320+ hours Under 30 minutes
Ongoing maintenance 4–8 hours/month minimum Handled by the platform
Privacy (prompt/response logging) Depends on implementation Never logs prompts or responses

The gap widens as your system grows. A two-agent, single-provider setup might work fine with custom logging. But the moment you add Anthropic alongside OpenAI, or scale from 3 agents to 15, the DIY approach demands another engineering sprint.

Just as tradespeople eventually realize that the best invoice app for tradesmen beats a spreadsheet once job volume grows, engineering teams hit a similar inflection point with LLM cost tracking: the custom solution that worked at low volume becomes a liability at scale.

See how Tokenr’s SDK auto-instruments your existing LLM calls — view the API docs.

LLM cost attribution by team and feature: DIY vs. Tokenr

LLM cost attribution by team and feature is where the build-vs-buy gap is widest. In a DIY system, you need to:

  1. Define a metadata schema for agent, team, feature, and customer tags
  2. Propagate that metadata through every LLM call in your application
  3. Store it alongside token and cost data
  4. Build aggregation queries that group by any combination of dimensions
  5. Create filtered dashboard views for each team lead

With Tokenr, you pass metadata as parameters on your existing API calls:

python response = client.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "Classify this ticket"}], tokenr_agent_id="support-bot", tokenr_team_id="customer-success", tokenr_feature="ticket-triage" )

That’s it. Tokenr’s Python SDK auto-patches the OpenAI, Anthropic, and Google client libraries at import time. No wrappers, no base URL changes, no refactors. The platform then tracks token counts (input/output split), cost, latency, model, and your attribution tags. It never logs prompt or response content, which matters if you’re handling customer data subject to GDPR or SOC 2 requirements.

Your dashboard immediately shows cost broken down by agent, team, feature, and model, with daily trends, MTD totals, and projected month-end spend.

When does building your own stop making sense?

We see a clear pattern in when teams switch from custom LLM logging to Tokenr:

  • LLM spend crosses $5K/month. Below that, the cost of a dedicated tool might not justify itself. Above it, a 10–15% optimization from model downgrade suggestions alone covers the platform cost.
  • You’re running 3+ agents. Single-agent cost tracking is simple. Multi-agent attribution is not.
  • You use 2+ providers. Cross-provider normalization is tedious, error-prone, and never “done.”
  • You need team-level accountability. When engineering leadership or finance asks “which team spent $18K on Claude last month,” you need an answer in seconds, not a week of log analysis.
  • An engineer is spending 4+ hours/month maintaining the tracking system. At senior engineer rates ($100–200/hour), that’s $400–800/month in maintenance alone, before counting the opportunity cost of what they could be building instead.

Tokenr’s pricing starts at $199/month for the Starter plan, with Growth at $399/month. It’s free while in early access, no credit card required.

Start tracking LLM spend across all your agents and providers. Create your Tokenr account.

Frequently asked questions

How do I track LLM costs across multiple providers?

You need a system that normalizes token counts and pricing across OpenAI, Anthropic, Google, and other providers into a single cost view. Tokenr’s SDK auto-instruments the official client library for each supported provider at import time, so every API call is tracked with consistent cost calculations regardless of provider. The alternative is maintaining separate pricing tables and normalization logic per provider, which requires updates every time a provider changes rates.

How do I attribute LLM spend by feature or team in production?

Pass metadata tags (agent_id, team_id, feature_name) with each LLM API call. In Tokenr, these are additional parameters on your existing client calls. The platform aggregates costs by any combination of these dimensions. Building this yourself requires a custom metadata propagation layer, storage schema, and aggregation queries.

How long does it take to set up LLM cost tracking with Tokenr?

Setup takes under 30 minutes. Install the SDK (pip install tokenr for Python, gem install tokenr for Ruby), call tokenr.init() with your API token, and your existing OpenAI, Anthropic, and Google API calls are automatically tracked. No code changes to your LLM calls are required beyond optional attribution tags.

Does Tokenr store my prompts or responses?

No. Tokenr tracks metadata only: token counts (input and output), cost, latency, model name, provider, agent ID, team ID, feature name, and custom tags. Prompt and response content is never logged or stored.

How do I set budget alerts for LLM API usage?

Tokenr includes configurable budget alerts at 50%, 80%, and 100% thresholds with email notifications. You set a budget per organization, and the dashboard shows a progress bar against that budget with projected month-end spend. Building equivalent alerting in a DIY system requires a real-time aggregation layer and notification infrastructure.

What’s the real cost of building LLM cost tracking in-house?

Based on what we’ve seen across teams, expect 160–320+ engineering hours for an initial build covering multi-provider support, agent attribution, budget alerts, and a dashboard. Ongoing maintenance runs 4–8 hours per month to handle pricing changes, new model support, and bug fixes. At senior engineer rates, that’s $16,000–64,000 in initial build cost and $4,800–19,200 per year in maintenance.

Ready to get started?

If you’re still tracking LLM costs in spreadsheets, vendor dashboards, or a custom middleware stack you built in a sprint six months ago, Tokenr replaces all of it with one SDK integration.

You get cost attribution by agent, model, team, and feature across OpenAI, Anthropic, and Google. Budget alerts before bills run away. Model optimization recommendations. No prompt logging, no major refactor.

Tokenr is free while in early access, no credit card required.

Start tracking your LLM spend now.

Track your LLM costs

One line of code. Per-agent attribution. Budget alerts before you overspend.

Start Free — No Credit Card →

More from the blog