Watched a batch of agent traces a few weeks back. The model kept reaching for the wrong tool. Same wrong one, over and over.
The description wasn't wrong. That was the odd part. It said exactly what the tool did. 'Returns the status of an order.' Accurate. True. Useless.
The model wasn't asking 'what does this do?' It was asking 'which of these eleven things do I pick, right now, for what the user wants?' Nothing in the description answered that.
A person skims a tool name and a one-line summary and fills the rest in from what they already know. The model doesn't have what you already know. It has the words you gave it. And the words you gave it were written for someone who'd understood the product first.
So we rewrote that one description. Same tool, but it now says when to use it, not just what it returns. 'Use this when the user asks where their order is. Returns status, carrier and ETA for one order by ID.'
The wrong-tool loop mostly went away. Same code underneath. We changed about forty words.
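Roughly what that looks like as a tool definition. This is a sketch, not our actual code: the tool name get_order_status, the order_id parameter and the JSON-schema-style layout are all assumptions made for illustration. Only the two description strings come from the story above.

```python
# Hypothetical tool definition. The name, the parameter and the overall
# shape are illustrative assumptions, not taken from a real codebase.
BEFORE = {
    "name": "get_order_status",
    "description": "Returns the status of an order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Same tool, same schema. Only the description changes: it now leads
# with when to pick the tool, then says what comes back.
AFTER = {
    **BEFORE,
    "description": (
        "Use this when the user asks where their order is. "
        "Returns status, carrier and ETA for one order by ID."
    ),
}
```

Nothing the code executes is different between the two. The only thing that moved is the text the model reads when it chooses.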
This kind of failure stays invisible until you line up the traces and look at what the model did instead of what you'd hoped. One bad pick looks like the model being thick. A thousand bad picks is a pattern, and the pattern points back at a sentence you wrote months ago and forgot.
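Lining up the traces can be as simple as counting. A minimal sketch, assuming each trace records the tool you expected and the tool the model chose. The field names, the tool names and the sample records are all made up for illustration, not real trace data.

```python
from collections import Counter

# Made-up sample records, purely illustrative. The "expected" and "chosen"
# fields are an assumed trace format, not a real one.
traces = [
    {"expected": "get_order_status", "chosen": "search_orders"},
    {"expected": "get_order_status", "chosen": "search_orders"},
    {"expected": "get_order_status", "chosen": "get_order_status"},
    {"expected": "cancel_order", "chosen": "get_order_status"},
]


def wrong_picks(records):
    """Count (expected, chosen) pairs where the model picked a different tool."""
    return Counter(
        (r["expected"], r["chosen"])
        for r in records
        if r["expected"] != r["chosen"]
    )


# Most frequent confusions first.
for (expected, chosen), count in wrong_picks(traces).most_common():
    print(f"{count}x expected {expected}, got {chosen}")
```

The pair at the top of that list is the one whose descriptions are worth rereading first.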
Tool descriptions are an API surface now. Something reads them literally and has no idea what you meant. Worth writing them like you know that.
If you build a surface that agents use, and you want to know how they actually use it, Vesta is in early access.