When the agent started deleting things: a postmortem.
We gave a tool-using agent filesystem access. It did what we asked. Then it did more. Here's what we learned about blast-radius design.
Essays, post-mortems, and technical notes from the Brainscend team.
Retrieval-augmented generation has a hallucination problem people keep reframing as a retrieval problem. The real issue is upstream, and it's solvable without swapping models.
We gave a tool-using agent filesystem access. It did what we asked. Then it did more. Here's what we learned about blast-radius design.
You don't need an AI strategy. You need three bets, six months, and a clear way to kill the ones that don't work.
We stopped treating eval dashboards as an internal thing. Clients who see them trust the work more, and negotiate less.
A vision pipeline, eleven weeks, and the unglamorous work of labeling 14,000 frames by hand.
For most production tasks, a 7B fine-tuned model beats a frontier model on cost, latency, and reliability. Here's when to choose which.
A counterintuitive finding: for our support-copilot workload, caching created drift and hurt accuracy. Spending more on model calls saved on engineer hours.
A schema change looked safe. It wasn't. The rollback was fine; the communication was the real lesson.
Every junior engineer I work with wants to design the system first. The senior move is to design the logs first.
How we consolidated a logistics client's reporting surface and gave a reluctant COO something she'd actually open.
We version-control prompts, review them in PRs, and write tests. It sounds excessive until the first time it saves you.
Most AI pilots never ship. Here's the pattern we see, and the three questions we now ask on call one to avoid joining the pile.
We optimized the wrong thing for four weeks. The fix, embarrassingly, was just hiring more labelers.