
Stop Running Blind on AI Usage

Non-deterministic models, hidden costs and silent failures make AI hard to manage and scale. Here’s why structured logging is your first line of defence. If your platform relies on LLMs like GPT-4 or Claude, chances are you’re spending real money on every request. You’re also relying on a system that can behave differently from one minute to the next. AI responses are non-deterministic, costly, and — without proper logging — completely opaque.

This article is about visibility. Nothing too fancy, just basic metrics you should keep within easy reach:

  • What requests are we sending?
  • What responses are we getting?
  • How much is it costing us?
  • When things go wrong, how fast do we know why?

Structured logging is the foundation of an observable system that lets you act quickly on anomalies. With it, your engineers can debug code, your product team can prioritise feature requests and bug fixes, and your business can avoid nasty surprises around excess cost and billing.

Observability isn’t a nice-to-have. It’s a survival tool for teams shipping AI at scale.

Why Traditional Logging Falls Short

Most teams already log errors. Some even log request payloads. But when it comes to AI, that’s not enough. Why? Because AI failures rarely show up as 500 error responses. Instead, you get:

  • Nonsensical responses that pass validation
  • Subtle regressions when a prompt gets updated
  • Sudden cost spikes from accidental overuse
  • Silent failures masked by retries and fallback logic

When those things happen, you want more than a generic “Something went wrong” message. You want searchable, structured logs that tell you:

  • What the prompt looked like
  • Which tenant made the request
  • What model was used
  • How many tokens were consumed
  • How long the response took, and whether it succeeded or not

Flat logs or generic error tracking tools won’t give you this. They’re too shallow. They don’t understand the shape of AI traffic. And that’s exactly why structured logging — with the right fields and tools — is such a game changer.

Structured Logging Is Your Friend

Consider each request and response related to an AI model a first-class citizen in your observability stack. Instead of dumping raw prompts into your logs, you define a consistent schema — a shape your logs follow. This way, the logs can be searched, filtered, aggregated, and visualised.

Can you spot the schema? Structured logging makes your search intuitive and opens the door to useful visual dashboards.

Here’s what a structured log entry might look like:

{
  "@timestamp": "2025-07-26T12:43:21.333Z",
  "correlationId": "abc123xyz0",
  "tenantId": "tenant_a",
  "feature": "smart_summary",
  "model": "gpt-4",
  "promptType": "retrieval_augmented",
  "promptTokens": 1420,
  "completionTokens": 215,
  "status": "success",
  "latencyMs": 1280
}
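As a minimal sketch of how such an entry could be produced — using Python's standard library for brevity (the follow-up article will cover the Spring Boot/Kotlin setup), with the function name and signature as illustrative assumptions — the idea is simply to emit one JSON object per request, matching the schema above:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("ai-usage")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_ai_request(tenant_id, feature, model, prompt_type,
                   prompt_tokens, completion_tokens, status,
                   latency_ms, correlation_id):
    """Emit one structured log entry as a single JSON line."""
    entry = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "correlationId": correlation_id,
        "tenantId": tenant_id,
        "feature": feature,
        "model": model,
        "promptType": prompt_type,
        "promptTokens": prompt_tokens,
        "completionTokens": completion_tokens,
        "status": status,
        "latencyMs": latency_ms,
    }
    # One JSON object per line keeps the log trivially parseable downstream.
    logger.info(json.dumps(entry))
    return entry
```

The key design choice is one self-contained JSON object per line: log shippers and stores can then index every field without any custom parsing.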

This tiny piece of structure unlocks massive value, making your logs:

  • Searchable: Filter by feature, tenant, model, or status
  • Aggregable: Sum token usage across tenants or time
  • Visualisable: Chart usage spikes or latency outliers
  • Traceable: Follow a single user request across multiple layers using correlationId
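The aggregation point in particular is worth seeing concretely. A rough sketch (illustrative only — in practice your log store would run this query for you) of summing token usage per tenant over a batch of entries:

```python
from collections import defaultdict

def tokens_per_tenant(entries):
    """Sum prompt + completion tokens for each tenant across log entries."""
    totals = defaultdict(int)
    for e in entries:
        totals[e["tenantId"]] += e["promptTokens"] + e["completionTokens"]
    return dict(totals)
```

In Elasticsearch this becomes a one-line terms aggregation over `tenantId` with a sum over the token fields — possible only because those fields exist as structured data rather than text buried in a message string.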

You’re not just logging anymore. You’re measuring, debugging, and tracing cost drivers.

Thanks to open source tools like Kibana or Grafana, you can plug these logs into real dashboards at a reasonable price, or even without spending a dime at all.

Dashboards and Alerts That Actually Matter

Once you’ve got structured logs flowing into Elasticsearch or any other log store, tools like Kibana or Grafana become powerful allies. You’re no longer just storing logs; you’re asking questions and getting answers in real time.

Leverage your logs for insightful visualisations.

Here are three metrics you should consider plotting on your dashboard.

Feature Usage Over Time

See how specific AI-powered features like text_analysis or smart_summary are used throughout the day. The dashboard above shows simple request volumes for a specific feature over time, but you could take it even further. Break it down by tenantId, model, or status to spot:

  • Usage spikes from specific tenants
  • Rollout patterns after a new release
  • Unexpected load on older models (e.g. fallback to gpt-3.5)
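Under the hood this chart is just a date histogram over the `feature` field. A sketch of the equivalent computation (illustrative Python; a dashboard would use the log store's own histogram aggregation), bucketing requests by hour:

```python
from collections import Counter

def requests_per_hour(entries, feature):
    """Count requests for one feature, bucketed by hour of @timestamp."""
    counts = Counter()
    for e in entries:
        if e["feature"] == feature:
            # "2025-07-26T12:43:21.333Z"[:13] -> "2025-07-26T12"
            counts[e["@timestamp"][:13]] += 1
    return dict(counts)
```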

Cost Heatmap

Use promptTokens and completionTokens as a proxy for cost (multiply by model-specific pricing, if it’s straightforward enough). This lets you easily visualise which tenants or features drive your AI spend, and instantly tells you:

  • Which tenants generate most of the load
  • Which features are driving your AI bill
  • Where you can optimise or throttle usage
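The token-to-cost conversion can be sketched as follows. Note the prices below are hypothetical placeholders, not real provider rates — substitute your model's actual pricing:

```python
# Hypothetical per-1K-token prices; replace with your provider's real rates.
PRICING = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def estimated_cost_per_tenant(entries):
    """Estimate spend per tenant from token counts and per-model pricing."""
    costs = {}
    for e in entries:
        rates = PRICING[e["model"]]
        cost = (e["promptTokens"] / 1000) * rates["prompt"] \
             + (e["completionTokens"] / 1000) * rates["completion"]
        costs[e["tenantId"]] = costs.get(e["tenantId"], 0.0) + cost
    return costs
```

Group the same sum by `feature` instead of `tenantId` and you have the other axis of the heatmap.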

Live Alerts

Structured logs unlock real-time alerting. Set up automatic triggers like:

  • High latency alert: if latencyMs > 1500 over the last minute
  • Failure burst alert: if multiple status == failure logs happen in a short window
  • Usage spike alerts: if a single tenant’s token usage exceeds expected norms

With tools like Kibana alerting, you can route these to Slack, email, or PagerDuty — catching issues before your customers do.
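The alert rules above reduce to simple predicates over a recent window of entries. A sketch (illustrative thresholds and function name; a real deployment would let Kibana or Grafana evaluate these server-side):

```python
def check_alerts(recent_entries, latency_threshold_ms=1500, failure_limit=3):
    """Evaluate simple alert rules over a recent window of log entries."""
    alerts = []
    slow = [e for e in recent_entries if e["latencyMs"] > latency_threshold_ms]
    if slow:
        alerts.append(f"high_latency: {len(slow)} requests over {latency_threshold_ms} ms")
    failures = [e for e in recent_entries if e["status"] == "failure"]
    if len(failures) >= failure_limit:
        alerts.append(f"failure_burst: {len(failures)} failures in window")
    return alerts
```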

Beyond alerts, you can add actionable insights right into your dashboard. For example, you might find it useful to know the top five requests that took the longest to process at any given time. Adding useful information — such as correlationId — makes the insight immediately actionable.

Keep a running check of the slowest responses along with information allowing you to act on it.
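That “slowest requests” panel is nothing more than a top-N sort over `latencyMs`, keeping only the fields you need to act. A sketch under the same schema assumptions:

```python
import heapq

def slowest_requests(entries, n=5):
    """Return the n slowest entries, keeping only the actionable fields."""
    top = heapq.nlargest(n, entries, key=lambda e: e["latencyMs"])
    return [{"correlationId": e["correlationId"],
             "latencyMs": e["latencyMs"],
             "feature": e["feature"]} for e in top]
```

The correlationId in each row is what turns the panel from a curiosity into a debugging entry point — one click away from the full trace of that request.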

Closing Thoughts

AI is powerful, but it’s also unpredictable. When your business depends on it, you can’t afford to guess what’s happening under the hood.

Structured logging gives you visibility, and dashboards provide the clarity you need to make informed decisions. Alerts are there for your peace of mind. Together, they turn your AI integration from a black box into something you can measure, trust and optimise.

If you’re running AI as if it were just another software module, you’re almost certainly running blind.

In the next article, we’ll look at how to implement all of this in practice:

  • Setting up structured logs in Spring Boot (and Kotlin)
  • Capturing token usage, latency and status
  • Pushing logs to Elasticsearch
  • Visualising everything in Kibana with zero fluff

Until then, take a hard look at your logs. Or lack of them.
