Stop Running Blind on AI Usage
Non-deterministic models, hidden costs and silent failures make AI hard to manage and scale. Here’s why structured logging is your first line of defence. If your platform relies on LLMs like GPT-4 or Claude, chances are you’re spending real money on every request. You’re also relying on a system that can behave differently from one minute to the next. AI responses are non-deterministic, costly, and — without proper logging — completely opaque.
This article is about visibility. Nothing too fancy, just the basic metrics you should keep within easy reach:
- What requests are we sending?
- What responses are we getting?
- How much is it costing us?
- When things go wrong, how fast do we know why?
Structured logging is the foundation of an observable system that lets you act on anomalies quickly. With it, your engineers can debug code, your product team can prioritise feature requests and bug fixes, and your business can avoid nasty surprises around excess cost and billing.
Observability isn’t a nice-to-have. It’s a survival tool for teams shipping AI at scale.
Why Traditional Logging Falls Short
Most teams already log errors. Some even log request payloads. But when it comes to AI, that’s not enough. Why? Because AI failures rarely show up as 500 error responses. Instead, you get:
- Nonsensical responses that pass validation
- Subtle regressions when a prompt gets updated
- Sudden cost spikes from accidental overuse
- Silent failures masked by retries and fallback logic
When those things happen, you want more than a generic “Something went wrong” message. You want searchable, structured logs that tell you:
- What the prompt looked like
- Which tenant made the request
- What model was used
- How many tokens were consumed
- How long the response took, and whether it succeeded or not
Flat logs or generic error tracking tools won’t give you this. They’re too shallow. They don’t understand the shape of AI traffic. And that’s exactly why structured logging — with the right fields and tools — is such a game changer.
Structured Logging Is Your Friend
Treat each request to and response from an AI model as a first-class citizen in your observability stack. Instead of dumping raw prompts into your logs, you define a consistent schema — a shape your logs follow. This way, the logs can be searched, filtered, aggregated, and visualised.

Here’s what a structured log entry might look like:
```json
{
  "@timestamp": "2025-07-26T12:43:21.333Z",
  "correlationId": "abc123xyz0",
  "tenantId": "tenant_a",
  "feature": "smart_summary",
  "model": "gpt-4",
  "promptType": "retrieval_augmented",
  "promptTokens": 1420,
  "completionTokens": 215,
  "status": "success",
  "latencyMs": 1280
}
```
This tiny piece of structure unlocks massive value, making your logs:
- Searchable: Filter by feature, tenant, model, or status
- Aggregable: Sum token usage across tenants or time
- Visualisable: Chart usage spikes or latency outliers
- Traceable: Follow a single user request across multiple layers using correlationId
You’re not just logging anymore. You’re measuring, debugging, and tracing cost drivers.
Thanks to open-source tools like Kibana or Grafana, you can plug these logs into real dashboards at a reasonable cost, or even without spending a dime at all.
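To make the schema concrete, here is a minimal, dependency-free Kotlin sketch of how an application might emit such an entry. The data class and hand-rolled serializer are illustrative only; in a real service you would route this through your logging framework’s JSON encoder instead of printing it.

```kotlin
import java.time.Instant

// Illustrative sketch: one structured AI log entry, mirroring the schema above.
// Serialization is hand-rolled to keep the example dependency-free.
data class AiLogEntry(
    val correlationId: String,
    val tenantId: String,
    val feature: String,
    val model: String,
    val promptType: String,
    val promptTokens: Int,
    val completionTokens: Int,
    val status: String,
    val latencyMs: Long,
    val timestamp: Instant = Instant.now(),
) {
    // Render the entry as a single JSON line, one field per schema entry.
    fun toJsonLine(): String = listOf(
        "\"@timestamp\":\"$timestamp\"",
        "\"correlationId\":\"$correlationId\"",
        "\"tenantId\":\"$tenantId\"",
        "\"feature\":\"$feature\"",
        "\"model\":\"$model\"",
        "\"promptType\":\"$promptType\"",
        "\"promptTokens\":$promptTokens",
        "\"completionTokens\":$completionTokens",
        "\"status\":\"$status\"",
        "\"latencyMs\":$latencyMs",
    ).joinToString(",", prefix = "{", postfix = "}")
}

fun main() {
    val entry = AiLogEntry(
        correlationId = "abc123xyz0",
        tenantId = "tenant_a",
        feature = "smart_summary",
        model = "gpt-4",
        promptType = "retrieval_augmented",
        promptTokens = 1420,
        completionTokens = 215,
        status = "success",
        latencyMs = 1280L,
    )
    println(entry.toJsonLine())
}
```

The point of the fixed field list is that every entry has the same shape, which is what makes the logs aggregable downstream.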
Dashboards and Alerts That Actually Matter
Once you’ve got structured logs flowing into Elasticsearch or any other log store, tools like Kibana or Grafana become powerful allies. You’re no longer just storing logs; you’re asking questions and getting answers in real time.

Here are three metrics you should consider plotting on your dashboard.
Feature Usage Over Time
See how specific AI-powered features like text_analysis or smart_summary are used throughout the day. The dashboard above shows simple request volumes for a single feature over time, but you could take it even further. Break it down by tenantId, model, or status to spot:
- Usage spikes from specific tenants
- Rollout patterns after a new release
- Unexpected load on older models (e.g. fallback to gpt-3.5)
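A breakdown like this can be computed straight from the structured fields. Here is a small Kotlin sketch, assuming your log entries are already parsed and reduced to the fields being grouped on; the LogKey type and the sample data are illustrative, and in production the grouping is exactly what Kibana or Grafana do for you over the log store.

```kotlin
// Illustrative sketch: break request volume down by feature and tenant.
// LogKey holds just the fields we group on, parsed from structured logs.
data class LogKey(val feature: String, val tenantId: String, val model: String)

// Count requests per (feature, tenant) pair; swap the key to group by model
// or status instead.
fun requestVolume(entries: List<LogKey>): Map<Pair<String, String>, Int> =
    entries.groupingBy { it.feature to it.tenantId }.eachCount()

fun main() {
    val entries = listOf(
        LogKey("smart_summary", "tenant_a", "gpt-4"),
        LogKey("smart_summary", "tenant_a", "gpt-4"),
        LogKey("smart_summary", "tenant_b", "gpt-3.5-turbo"),
        LogKey("text_analysis", "tenant_a", "gpt-4"),
    )
    // tenant_a sent two smart_summary requests in this sample.
    println(requestVolume(entries))
}
```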
Cost Heatmap
Use promptTokens and completionTokens as a proxy for cost (you can multiply by model-specific pricing if it’s straightforward enough). This makes it easy to visualise which tenants or features drive your AI cost, and instantly tells you:
- Which tenants generate most of the load
- Which features are driving your AI bill
- Where you can optimise or throttle usage
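Turning tokens into dollars is a single multiplication per model. A Kotlin sketch follows; the per-1K-token prices in the table are illustrative placeholders, since real pricing differs by model and changes over time, so treat the table as configuration, not fact.

```kotlin
// Illustrative per-1K-token prices (USD): model -> (prompt, completion).
// These numbers are placeholders; look up current pricing for real use.
val pricePer1kTokens = mapOf(
    "gpt-4" to Pair(0.03, 0.06),
    "gpt-3.5-turbo" to Pair(0.0015, 0.002),
)

// Estimate request cost from the token counts in a structured log entry.
// Unknown models fall back to 0.0 rather than guessing a price.
fun estimateCostUsd(model: String, promptTokens: Int, completionTokens: Int): Double {
    val (promptPrice, completionPrice) = pricePer1kTokens[model] ?: return 0.0
    return promptTokens / 1000.0 * promptPrice +
        completionTokens / 1000.0 * completionPrice
}

fun main() {
    // Using the token counts from the example log entry above.
    println("estimated cost (USD): " + estimateCostUsd("gpt-4", 1420, 215))
}
```

Summing this estimate per tenantId or per feature is exactly what the heatmap visualises.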
Live Alerts
Structured logs unlock real-time alerting. Set up automatic triggers like:
- High latency alert: if latencyMs > 1500 over the last minute
- Failure burst alert: if multiple status == failure logs happen in a short window
- Usage spike alert: if a single tenant’s token usage exceeds expected norms
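The logic behind a rule like the failure burst is simple enough to sketch. The Kotlin below assumes one call per status == failure log entry; in practice you would configure this as an alert rule in Kibana or Grafana rather than hand-roll it, so treat the class as an illustration of the rule, not a production alerter.

```kotlin
import java.time.Duration
import java.time.Instant

// Illustrative failure-burst rule: fire once `threshold` failures land
// inside a sliding time window.
class FailureBurstAlert(
    private val threshold: Int,
    private val window: Duration,
) {
    private val failures = ArrayDeque<Instant>()

    // Record one status == "failure" log entry; returns true when the alert fires.
    fun onFailure(at: Instant): Boolean {
        failures.addLast(at)
        // Drop failures that have slid out of the window.
        while (failures.isNotEmpty() && failures.first().isBefore(at.minus(window))) {
            failures.removeFirst()
        }
        return failures.size >= threshold
    }
}

fun main() {
    val alert = FailureBurstAlert(threshold = 3, window = Duration.ofSeconds(60))
    val t0 = Instant.parse("2025-07-26T12:43:00Z")
    println(alert.onFailure(t0))                 // false: one failure
    println(alert.onFailure(t0.plusSeconds(10))) // false: two failures
    println(alert.onFailure(t0.plusSeconds(20))) // true: three failures in 60s
}
```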

Beyond alerts, you can add actionable insights right into your dashboard. For example, you might find it useful to see the top five requests that took the longest to process at any given time. Adding useful context — such as correlationId — makes the insight immediately actionable.
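That top-five view falls straight out of the same structured entries. A brief Kotlin sketch; the RequestStat type and slowestRequests function are illustrative names, and in Kibana this would just be a table sorted by latencyMs.

```kotlin
// Illustrative sketch: surface the slowest requests, keeping correlationId
// so each row on the dashboard is immediately actionable.
data class RequestStat(val correlationId: String, val latencyMs: Long)

fun slowestRequests(entries: List<RequestStat>, n: Int = 5): List<RequestStat> =
    entries.sortedByDescending { it.latencyMs }.take(n)

fun main() {
    val stats = listOf(
        RequestStat("abc123xyz0", 1280L),
        RequestStat("def456uvw1", 3400L),
        RequestStat("ghi789rst2", 640L),
    )
    println(slowestRequests(stats, n = 2))
}
```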

Closing Thoughts
AI is powerful, but it’s also unpredictable. When your business depends on it, you can’t afford to guess what’s happening under the hood.
Structured logging gives you visibility, dashboards provide the clarity you need to make informed decisions, and alerts are there for your peace of mind. Together, they turn your AI integration from a black box into something you can measure, trust and optimise.
If you’re running AI as if it were just another software module — you’re almost certainly running blind.
In the next article, we’ll look at how to implement all of this in practice:
- Setting up structured logs in Spring Boot (and Kotlin)
- Capturing token usage, latency and status
- Pushing logs to Elasticsearch
- Visualising everything in Kibana with zero fluff
Until then, take a hard look at your logs. Or lack of them.