Growth Metrics That Actually Matter for AI Products
If you’re building an AI product, stop looking at your DAU (Daily Active Users) for a second. It might be lying to you. 🫢
In normal SaaS, we’re obsessed with engagement. More clicks, more sessions, more time in the app. If a user spends 30 minutes in your dashboard, they’re "engaged."
But in AI? If a user spends 30 minutes chatting with your bot, they might just be frustrated. 🤣
They might be stuck in a hallucination loop, trying to get the right answer and failing. In the AI world, "high engagement" can actually be a signal of a broken product.
Here’s what we learned at BauGPT about the metrics that actually matter when you’re building with LLMs.
The "Chatty User" Trap
When we first launched our construction AI, we were stoked. People were sending dozens of messages. They were "chatting."
Then we looked at the transcripts.
A lot of those messages were: "No, that's not right," "Try again," or "Where did you get that number?"
The user was working hard, but the AI wasn't. We were tracking "Engagement," but we weren't tracking "Value."
We realized we needed a new set of metrics that reflect the unique nature of AI. Metrics that track outcomes, not just activity.
1. Job Completion Rate (JCR)
This is our North Star.
In construction, nobody opens BauGPT to "chat." They open it to find a specific DIN standard, calculate a material cost, or verify a building code.
They have a job to do.
We track JCR by looking at the end of a session. Did the user export the result? Did they copy the text? Did they give a thumbs up? Or did they just close the tab after 5 failed attempts?
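In practice this can be a simple rollup over session events. Here's a minimal sketch, assuming a hypothetical event schema where each session is a list of event names ("export", "copy", "thumbs_up" are illustrative, not our exact taxonomy):

```python
# Minimal JCR sketch. The event names and session shape are
# hypothetical -- adapt them to your own analytics schema.

VALUE_EVENTS = {"export", "copy", "thumbs_up"}

def job_completed(session_events: list[str]) -> bool:
    """A session counts as 'job done' if any value event occurred."""
    return any(e in VALUE_EVENTS for e in session_events)

def job_completion_rate(sessions: list[list[str]]) -> float:
    """Share of sessions that reached a value event."""
    if not sessions:
        return 0.0
    completed = sum(job_completed(s) for s in sessions)
    return completed / len(sessions)

# Example: 2 of 3 sessions ended with a value event -> JCR ~ 67%
sessions = [
    ["message", "message", "export"],
    ["message", "thumbs_down", "close_tab"],
    ["message", "copy"],
]
print(f"JCR: {job_completion_rate(sessions):.0%}")
```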
If your JCR is low, it doesn't matter how many "active users" you have. You’re just a fancy toy that doesn't work. 😎
2. Tokens-to-Value Ratio
This is my favorite "nerd" metric. 🤓
Every token costs money. If it takes 5,000 tokens of "chatting" to get a simple answer that should have taken 200 tokens, your product is inefficient.
We try to track how many tokens it takes to reach a "Value Event" (like a successful document extraction).
If this ratio is going up, your prompts are getting bloated or your users are struggling to communicate with the model. We use this to prune our system prompts and keep the AI focused.
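The math is just total tokens divided by value events, but making it a first-class metric is what matters. A rough sketch, assuming a hypothetical completion log with `total_tokens` and a boolean `value_event` flag:

```python
# Hedged sketch: tokens spent per value event over a time window.
# The completion dict shape is an assumption, not our real schema.

def tokens_to_value(completions: list[dict]) -> float:
    """Total tokens divided by the number of value events reached."""
    total_tokens = sum(c["total_tokens"] for c in completions)
    value_events = sum(1 for c in completions if c["value_event"])
    if value_events == 0:
        return float("inf")  # burning tokens, delivering nothing
    return total_tokens / value_events

completions = [
    {"total_tokens": 1200, "value_event": False},
    {"total_tokens": 800, "value_event": True},
    {"total_tokens": 3000, "value_event": False},
]
# 5,000 tokens for 1 value event -- a ratio worth worrying about
print(tokens_to_value(completions))  # 5000.0
```

Watching this number per prompt version tells you whether a "smarter" system prompt is actually paying for itself.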
3. Human-in-the-Loop Correction Rate

Since we deal with construction documents, accuracy is everything. 🏗️
We have a feature where users can correct the AI if it misses a detail in a floor plan or an invoice. We track the percentage of extractions that require a manual fix.
- High correction rate = Your RAG or OCR is failing.
- Low correction rate = You’re actually saving people time.
If this number isn't going down week-over-week, your product isn't learning.
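A week-over-week rollup is enough to see the trend. Here's a sketch, assuming a hypothetical extraction record with an ISO `week` key and a boolean `corrected` flag:

```python
# Sketch of a weekly correction-rate rollup. The record fields
# ("week", "corrected") are assumptions, not our real schema.
from collections import defaultdict

def weekly_correction_rate(extractions: list[dict]) -> dict[str, float]:
    """Fraction of extractions per ISO week that needed a manual fix."""
    totals: dict[str, int] = defaultdict(int)
    fixed: dict[str, int] = defaultdict(int)
    for e in extractions:
        totals[e["week"]] += 1
        fixed[e["week"]] += e["corrected"]
    return {week: fixed[week] / totals[week] for week in totals}

extractions = [
    {"week": "2024-W20", "corrected": True},
    {"week": "2024-W20", "corrected": False},
    {"week": "2024-W21", "corrected": False},
    {"week": "2024-W21", "corrected": False},
]
# W20: 50% needed fixes, W21: 0% -- that's the trend you want
print(weekly_correction_rate(extractions))
```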
How we track this (The Tech Bit)
We use a mix of Amplitude for the high-level stuff and custom logs in our Postgres database for the LLM specifics.
Every completion gets logged with:
- The prompt version used
- The model (Opus vs Sonnet)
- The user feedback (thumbs up/down)
- Whether a "Copy" or "Export" happened afterwards
It’s not enough to know that they used it. You need to know why they used it and if it worked.
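The write itself is nothing fancy. A rough sketch of the per-completion log using psycopg2; the table and column names (`llm_completions` etc.) are illustrative, not our exact schema:

```python
# Illustrative per-completion log write to Postgres.
# Table/column names are hypothetical placeholders.
import psycopg2

def log_completion(conn, prompt_version: str, model: str,
                   feedback: str | None, value_event: bool) -> None:
    """Insert one completion record for later metric rollups."""
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO llm_completions
                (prompt_version, model, feedback, value_event, created_at)
            VALUES (%s, %s, %s, %s, NOW())
            """,
            (prompt_version, model, feedback, value_event),
        )
    conn.commit()

conn = psycopg2.connect("dbname=baugpt")  # hypothetical DSN
log_completion(conn, prompt_version="v42", model="claude-sonnet",
               feedback="thumbs_up", value_event=True)
```

With `prompt_version` on every row, JCR, tokens-to-value, and correction rate can all be sliced per prompt release, which is how you catch a regression before your users do.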
The Result: Higher Retention, Less Noise
Once we stopped optimizing for "chatting" and started optimizing for "Value Events," something cool happened.
Our session lengths actually decreased. Users were in and out faster.
But our retention went up. 🚀
Because they realized they could rely on BauGPT to get the job done in 2 minutes instead of 20.
Takeaway
If you’re building an AI startup, don’t get blinded by standard SaaS vanity metrics.
- Track Outcomes, not clicks.
- Track Success, not sessions.
- Track Value, not chat.
The best AI products are the ones that disappear because they work so well. ✌️
Cheers, Jonas