BauGPT processes 40,000 WhatsApp messages a week. About 40% of them are voice notes.
I mention the number not to flex. I mention it because it explains every product decision we made in the last six months.
Where construction workers actually live
When we started building BauGPT, the default assumption was: build a mobile app. That is what AI products do. A clean interface, a chat window, maybe a document viewer on the side. Something you open, type into, close.
The problem was that our users do not type. Their hands are dirty. Gloves on. They are on a scaffold with the sun in their eyes, trying to figure out whether a particular joint detail counts as a deviation under VOB Part B.
And more importantly: they are already in WhatsApp. A German construction site worker spends six to eight hours a day in that app. It is where the site team coordinates. Where the foreman sends updates. Where the subcontractor sends photos of the problem. WhatsApp is the CRM, the project management tool, and the communication layer for most of the construction industry, and nobody made a deliberate decision to use it that way. It just happened.
We decided not to fight that.
What we actually built
The architecture is deliberately boring.
A Twilio webhook receives inbound messages. Text or voice. Voice notes get transcribed with Whisper before anything else happens. We tried asynchronous transcription at first to save cost, and it was a mistake: users got responses that did not match what they had asked, because we were answering the text query before the transcription came back. Synchronous Whisper, always.
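A minimal sketch of that ordering, assuming the webhook's form fields arrive as a plain mapping. `Body`, `NumMedia`, `MediaUrl0` and `MediaContentType0` are the parameter names Twilio sends on inbound messages. `handle_inbound`, `transcribe` and `answer` are hypothetical names: the two callables stand in for the Whisper call and the answer generation, neither of which is shown here.

```python
from typing import Callable, Mapping


def handle_inbound(
    form: Mapping[str, str],
    transcribe: Callable[[str], str],  # hypothetical: media URL -> transcript
    answer: Callable[[str], str],      # hypothetical: question -> reply text
) -> str:
    """Return the reply for one inbound message, transcribing voice first."""
    question = form.get("Body", "").strip()
    num_media = int(form.get("NumMedia", "0") or 0)

    for i in range(num_media):
        content_type = form.get(f"MediaContentType{i}", "")
        if content_type.startswith("audio/"):
            # Synchronous on purpose: block here until the transcript exists,
            # so the answer is never generated from a partial question.
            transcript = transcribe(form[f"MediaUrl{i}"]).strip()
            question = f"{question} {transcript}".strip()

    if not question:
        # Assumed fallback wording; nothing usable arrived.
        return "Sorry, I could not read that message. Could you send it again?"

    return answer(question)
```

The point of the shape is that `answer` cannot run until every audio attachment has been turned into text. There is no code path where a reply goes out while a transcription is still pending.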
Claude generates the answer. BauGPT's RAG layer provides the citations: which section of the standard, which clause of the contract, which page of the spec. The citation is in the answer. Not a link to a portal. Not a "click here to see the source." The answer says: "This falls under DIN 18195, Section 5.3, which specifies..."
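One way to keep the citation in the answer rather than behind a link is to make the source a required input to the function that renders the reply. The sketch below assumes the retrieval step returns a document name and a locator. `Citation` and `render_answer` are hypothetical names, and refusing to render without a source is an assumption of this sketch, not a described behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document: str  # e.g. "DIN 18195"
    locator: str   # e.g. "Section 5.3"


def render_answer(explanation: str, citations: list[Citation]) -> str:
    """Put the sources into the message text itself, ahead of the explanation."""
    if not citations:
        # Assumption: an uncited answer is treated as an error, not sent.
        raise ValueError("refusing to render an answer without a source")
    sources = "; ".join(f"{c.document}, {c.locator}" for c in citations)
    return f"This falls under {sources}. {explanation}"


reply = render_answer(
    "It specifies the requirement in question.",
    [Citation("DIN 18195", "Section 5.3")],
)
```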
The output stays inside WhatsApp. No redirect. No "to continue this conversation, please log in." They asked, we answered, they moved on.
What surprised us
The metric that caught us off guard was follow-up rate. About 12% of conversations result in a second question. That sounds small. It is not. It means the first answer was specific enough for the user to trust it and ask another. In early versions, when answers were vague or citations wrong, the follow-up rate was near zero. People just stopped.
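For concreteness, here is a sketch of how a follow-up rate like this could be computed. It assumes each inbound user question is logged with a conversation ID. How a conversation is delimited (per sender, per day, per thread) is not stated above, so that grouping is an assumption, and `follow_up_rate` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def follow_up_rate(inbound_conversation_ids: Iterable[str]) -> float:
    """Share of conversations that contain at least two user questions."""
    questions_per_conversation = Counter(inbound_conversation_ids)
    if not questions_per_conversation:
        return 0.0
    with_follow_up = sum(1 for n in questions_per_conversation.values() if n >= 2)
    return with_follow_up / len(questions_per_conversation)


# Made-up log: five conversations, two of which contain a second question.
rate = follow_up_rate(["a", "a", "b", "c", "d", "d", "d", "e"])
```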
Response times: 30-second median on text questions. 90 seconds on voice. Transcription is the bottleneck. We looked at batching to reduce cost. Each time we modeled it, the latency impact killed the value proposition. Construction workers ask questions mid-task. They want the answer before they have to go find someone to tell them what to do. Ninety seconds is already pushing it.
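A back-of-the-envelope version of that batching trade-off, under stated assumptions: requests are held and released at the end of a fixed window, and arrivals are spread evenly across the window, so the average added wait is half the window and the worst case is the full window. The 60-second window is a made-up figure for illustration; only the 90-second voice baseline comes from the numbers above.

```python
def batched_latency(base_seconds: float, window_seconds: float) -> tuple[float, float]:
    """Return (mean, worst-case) response time when requests wait for a batch."""
    mean = base_seconds + window_seconds / 2  # average arrival is mid-window
    worst = base_seconds + window_seconds     # arrival right after a batch left
    return mean, worst


mean_s, worst_s = batched_latency(base_seconds=90, window_seconds=60)
```

With those inputs the mean response on voice goes from 90 seconds to 120, and the unlucky case to 150. For someone standing on a scaffold waiting for an answer, that is the difference between waiting and going to find the foreman.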
The lesson we keep coming back to
Industries with deep existing channel preferences do not want a new tool. They want the answer inside the tool they already have.
The marginal cost of using a new app is not zero. For a construction worker, it is enormous: new login, new notification to manage, something to remember to open, something that might not work on a site with bad signal, something their foreman does not use.
WhatsApp has none of those problems. The decision to build on it felt risky at the time. Every investor asked about the dependency. In hindsight, it was the only sensible option.
What channel are your users already in that you are pretending they are not in?