The Drip
The leadership shuffle continued this week — Andrej Karpathy joined Anthropic, and Anthropic picked up Stainless (the tool that turns API docs into MCP servers and CLIs) for a reported $300M. Everything in this space gets bought at a premium right now.
What we keep coming back to: Anthropic and OpenAI are AI-native companies. The old monoliths (Microsoft, Apple, Meta) are "AI-injected." Google is the interesting one. It's the only incumbent with all the ingredients: a frontier model, an application stack, Android, and cloud. The race isn't just about the model anymore. It's about who builds the best experience around it.
Inside The Bottle
We've been heads-down building AI Pathfinder — our guided tool for figuring out where AI actually fits in your work — and this episode is a look under the hood. Specifically: what it takes to make an AI conversation reliable.
The big lesson is one we've made before: the model is never the full picture. You can wire a great model to an API, tell it "here's the five-step process, go," and it will drift, hallucinate, and exhaust your users. Getting to a reliable experience is almost entirely about the stuff that isn't the model.
Here are the seven pieces we leaned on:
| ■ | State machine. Don't make the LLM track where it is in a process. Give it a "boss" that knows the user's place in the journey and tells the model what to do next. |
| ■ | Turn contract. Strict rules baked into every prompt: one question at a time, keep it concise. When AI throws four questions at you at once, people bail. |
| ■ | Write notes to a database. Instead of hoping the model remembers the whole conversation, it writes notes down as it goes — like a consultant. That record is what lets a user pause, leave, and rehydrate the session later. |
| ■ | Quality evaluators. A buffer that QAs the AI's output before it reaches the user. No blank messages, no double questions, no re-asking something already answered. |
| ■ | Second opinion. A separate, narrow AI grades whether there's enough to move on. A neutral check that keeps the main model from cherry-picking its own assumptions. |
| ■ | Trust the data, not the prose. Summaries shown to the user are built from what was captured, not the AI's memory — then confirmed with a simple yes/no. Quiet, fine-grained control the user barely notices. |
| ■ | Escape hatch. LLMs love to go deeper. Sometimes you need a way to say "you've gathered enough, move on" so nobody gets stuck in a loop. |
If you're building anything where a person talks to an AI and you need a dependable result, each of these is worth stealing.
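To make the state machine piece concrete, here is a minimal Python sketch. The stage names, the instruction text, and the `Journey` class are all hypothetical; AI Pathfinder's actual steps aren't described here. The idea it shows: the ordering lives in plain code, and the model only ever receives the one instruction for the current stage.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # Hypothetical journey stages for illustration.
    ROLE = "role"
    TASKS = "tasks"
    PAIN_POINTS = "pain_points"
    SUMMARY = "summary"
    DONE = "done"


# The "boss" owns the order of the journey; the model never tracks it.
ORDER = list(Stage)

# The only guidance the model receives about what to do next.
INSTRUCTIONS = {
    Stage.ROLE: "Ask the user what their role is.",
    Stage.TASKS: "Ask about the tasks that fill most of their week.",
    Stage.PAIN_POINTS: "Ask which of those tasks is the most tedious.",
    Stage.SUMMARY: "Show the summary and ask the user to confirm it.",
    Stage.DONE: "Thank the user and close the session.",
}


@dataclass
class Journey:
    stage: Stage = Stage.ROLE

    def instruction(self) -> str:
        return INSTRUCTIONS[self.stage]

    def advance(self) -> Stage:
        # Move exactly one step forward; DONE is terminal.
        i = ORDER.index(self.stage)
        if i < len(ORDER) - 1:
            self.stage = ORDER[i + 1]
        return self.stage
```

In a real flow, something outside the model decides when `advance()` is called, for example a check that the required notes for the current stage have been captured.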
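The turn contract only holds if something checks it, which is where the quality evaluator comes in. A rough sketch of both, assuming replies are plain text and that already-answered topics are tracked as simple keywords. `TURN_CONTRACT`, `build_prompt`, and `check_reply` are hypothetical names, and the keyword match is a deliberately naive heuristic rather than a production check.

```python
import re

# Hypothetical contract text, prepended to every prompt sent to the model.
TURN_CONTRACT = (
    "Rules for this reply: ask at most one question, "
    "keep it under 80 words, and never re-ask something already answered."
)


def build_prompt(stage_instruction: str) -> str:
    return f"{TURN_CONTRACT}\n\n{stage_instruction}"


def check_reply(reply: str, answered_topics: set, max_words: int = 80) -> list:
    """Return contract violations; an empty list means the reply can be shown."""
    text = reply.strip()
    if not text:
        return ["blank message"]
    problems = []
    # Each question is the run of text ending in "?" since the last sentence break.
    questions = re.findall(r"[^.?!]*\?", text)
    if len(questions) > 1:
        problems.append("more than one question")
    if len(text.split()) > max_words:
        problems.append("too long")
    for topic in sorted(answered_topics):
        # Naive keyword heuristic: a question mentioning an answered topic is flagged.
        if any(topic.lower() in q.lower() for q in questions):
            problems.append(f"re-asks answered topic: {topic}")
    return problems
```

A non-empty result means the reply never reaches the user; one option is to regenerate with the listed violations added to the prompt.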
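Here is one way the write-notes-as-you-go pattern might look, using SQLite from Python's standard library. The schema (one row per session and field) and the helper names are assumptions for illustration. Each answer is written as soon as it's captured, so rehydrating a paused session is just a query.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # One row per (session, field); the latest answer for a field wins.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "session_id TEXT NOT NULL, field TEXT NOT NULL, value TEXT NOT NULL, "
        "PRIMARY KEY (session_id, field))"
    )
    return conn


def write_note(conn: sqlite3.Connection, session_id: str, field: str, value: str) -> None:
    # Called as soon as the conversation captures something, like a consultant's notes.
    conn.execute(
        "INSERT OR REPLACE INTO notes (session_id, field, value) VALUES (?, ?, ?)",
        (session_id, field, value),
    )
    conn.commit()


def rehydrate(conn: sqlite3.Connection, session_id: str) -> dict:
    # Rebuild a paused session from its notes rather than from chat history.
    rows = conn.execute(
        "SELECT field, value FROM notes WHERE session_id = ? ORDER BY field",
        (session_id,),
    ).fetchall()
    return dict(rows)
```

Passing a file path instead of `":memory:"` is what lets a session survive the user leaving and coming back.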
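A sketch of the second-opinion check, under a few stated assumptions: the grader is any callable that takes a prompt string and returns text (a stand-in for a separate, narrowly prompted model call), and `REQUIRED` is a hypothetical per-stage list of fields. The grader sees only the captured notes, never the main conversation, and anything other than a clean READY counts as not ready.

```python
from typing import Callable, Dict

# Hypothetical fields that must exist before a stage is even sent for grading.
REQUIRED = {"tasks": ["tasks", "hours_per_week"]}


def grader_prompt(stage: str, notes: Dict[str, str]) -> str:
    # Context-clean: only the captured notes, never the main chat transcript.
    parts = [
        f"Stage: {stage}",
        "Required fields: " + ", ".join(REQUIRED.get(stage, [])),
        "Captured notes:",
        *[f"- {k}: {v}" for k, v in sorted(notes.items())],
        "Is there enough to move on? Answer with exactly READY or NOT_READY.",
    ]
    return "\n".join(parts)


def second_opinion(stage: str, notes: Dict[str, str], grader: Callable[[str], str]) -> bool:
    # Cheap deterministic gate first: a missing required field means not ready.
    if any(not notes.get(f) for f in REQUIRED.get(stage, [])):
        return False
    verdict = grader(grader_prompt(stage, notes)).strip().upper()
    # Binary by design; any off-format answer keeps the user on the current stage.
    return verdict == "READY"
```

For tests or local runs, the grader can be a stub such as `lambda prompt: "READY"`.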
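For "trust the data, not the prose," the summary can be assembled directly from stored fields instead of asking the model to recall them. In this sketch the field keys and labels are hypothetical, and the yes/no parser accepts only a small, assumed set of answers.

```python
from typing import Dict, Optional

# Hypothetical field keys mapped to the labels the user sees, in display order.
LABELS = {
    "role": "Your role",
    "tasks": "Tasks that fill your week",
    "pain_point": "Biggest pain point",
}


def build_summary(notes: Dict[str, str]) -> str:
    # Assembled only from captured fields; the model never paraphrases them.
    lines = [f"{label}: {notes[key]}" for key, label in LABELS.items() if key in notes]
    return "\n".join(lines) + "\n\nDid we get this right? (yes/no)"


def parse_confirmation(reply: str) -> Optional[bool]:
    answer = reply.strip().lower().rstrip(".!")
    if answer in {"yes", "y", "yep", "correct"}:
        return True
    if answer in {"no", "n", "nope"}:
        return False
    return None  # Unclear answer: re-ask instead of guessing.
```

Returning `None` lets the flow repeat the confirmation rather than guess which way the user meant.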
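An escape hatch can be as simple as a per-stage turn budget plus a user-triggered skip. The budget of four turns and the skip phrases below are assumptions chosen for illustration, and `EscapeHatch` is a hypothetical name.

```python
from dataclasses import dataclass

# Hypothetical phrases that let the user pull the hatch themselves.
SKIP_PHRASES = ("move on", "skip this", "that's enough")


@dataclass
class EscapeHatch:
    max_turns: int = 4  # assumed per-stage budget, chosen for illustration
    turns_in_stage: int = 0

    def should_move_on(self, user_message: str) -> bool:
        """Count this turn; return True when the flow should leave the stage."""
        self.turns_in_stage += 1
        asked_to_skip = any(p in user_message.lower() for p in SKIP_PHRASES)
        return asked_to_skip or self.turns_in_stage >= self.max_turns

    def reset(self) -> None:
        # Call whenever the stage changes, so each stage gets a fresh budget.
        self.turns_in_stage = 0
```

Keeping the budget in code, not in the prompt, means the model can't talk its way past it.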
Lab Notes
| ■ | Justin's note: Build an escape hatch into any chat experience by default. LLMs love to keep drilling — "let's really dig on this problem." Sometimes the right move is the system stepping in to say you've gathered enough, time to move on. |
| ■ | Kellan's note: A targeted, context-clean third-party agent to validate work is one of the most useful patterns I've found. Less cherry-picking, more neutral, more binary. It's the same reason an "/advisor" check works so well. |
What Stopped Our Scroll
| ■ | Introducing Gemini Omni — Create anything from any input and edit it naturally with conversational language. Google keeps showing it has all the ingredients. |