
Yesterday, I saved my friend from AI.
We were analyzing a workplace survey: 30+ questions, two groups of people. He was due to present the results to 200 people at work the next day.
We dropped the data into an LLM, asked it to run the analysis and build a presentation. Minutes later: done. Polished. Convincing.
Then we got lucky.
A random spot-check on one answer didn’t add up. The AI had quietly subtracted 1 from an entire category — for no reason we could find.
One silent error, and we caught it only by chance.
And that was the scary part. If it changed that number, what else did it touch?
We didn’t trust any of it anymore. So we re-ran the whole analysis by hand — an hour of work the AI was supposed to save us.
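That kind of re-check does not have to be done by hand. Here is a minimal sketch in Python of a deterministic recount compared against the figures the AI reported. The field names (`group`, `question`, `answer`), the toy rows, and the `reported` numbers are all hypothetical, made up for illustration; they are not our survey data.

```python
from collections import Counter


def tally(responses):
    """Count answers per (group, question, answer) straight from the raw rows."""
    counts = Counter()
    for row in responses:
        counts[(row["group"], row["question"], row["answer"])] += 1
    return counts


def find_mismatches(responses, reported):
    """Return {key: (recounted, reported)} for every figure that disagrees."""
    recounted = tally(responses)
    mismatches = {}
    # Check the union of keys so figures missing on either side are caught too.
    for key in recounted.keys() | reported.keys():
        actual = recounted.get(key, 0)
        claimed = reported.get(key, 0)
        if actual != claimed:
            mismatches[key] = (actual, claimed)
    return mismatches


# Hypothetical raw survey rows (illustration only).
responses = [
    {"group": "A", "question": "Q1", "answer": "agree"},
    {"group": "A", "question": "Q1", "answer": "agree"},
    {"group": "A", "question": "Q1", "answer": "disagree"},
    {"group": "B", "question": "Q1", "answer": "agree"},
]

# Hypothetical figures copied from the AI-built slides; one is off by one.
reported = {
    ("A", "Q1", "agree"): 1,
    ("A", "Q1", "disagree"): 1,
    ("B", "Q1", "agree"): 1,
}

mismatches = find_mismatches(responses, reported)
print(mismatches)  # {('A', 'Q1', 'agree'): (2, 1)}
```

The recount is plain counting, so it gives the same answer every time. It checks every reported figure instead of one lucky sample, though it only catches numbers that disagree with the raw data, not a wrong choice of analysis.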
Here’s the question I can’t shake.
If a non-deterministic tool can’t be trusted with a simple survey, what happens at enterprise scale — where these tools are spreading fast?
How do we trust them?
And if we can’t trust them with something this small, what do we do when they’re running everything?
I don’t have a clean answer today.
Sitting with it,
Kirill
P.S. The presentation looked perfect. That’s exactly why we almost shipped it. The better these tools become at looking right, the harder this question gets.
