What I've learned as an AI researcher shipping product

Shipping AI products is weird. The model isn’t just a component; it’s the user experience. Your users don’t interact with buttons and forms; they interact with a black box that sometimes works brilliantly and sometimes fails in baffling ways. Over the past few years, I’ve shipped AI features at two startups: code completion and editing tools at Augment Code, and customer service chatbots at Eloquent Labs/Square. Each launch taught me something counterintuitive about how AI products actually work in the real world. ...

August 1, 2025 · 8 min · 1534 words · Arun Tejasvi Chaganty

How to evaluate your product's AI

For the past two decades, benchmarks have been the backbone of AI progress. Capability benchmarks like MMLU, SWE-Bench or HLE have served as proxies for foundation model “IQ.” But they fall short when it comes to evaluating how well AI will perform in your product.[1] I’ve spent much of my career building and critiquing evaluations,[2] and in this post, I’ll share key lessons on designing an evaluation strategy that reflects real-world product impact. ...

April 22, 2025 · 5 min · 865 words · Arun Tejasvi Chaganty