Skills are a leaky abstraction

Quick preface: I don’t think skills are bad or useless. I just think there are certain environments where you can achieve what skills aim for far more reliably with different tools. Skills are the hammer that makes everything look like a nail, and before long you don’t realise you’re pounding on screws.

The goal of skills

We should establish what skills are actually trying to achieve first, so we can judge the proposed alternative fairly.

Progressive disclosure

This term gets thrown around a lot and is probably the main selling point of skills. The idea is that skills allow a model to conditionally pull in useful instructions without having everything crammed into the main system prompt, avoiding what the field has decided to call “context rot” (we really need to pick a lane on whether we want serious or unserious sounding lingo for this AI thing).

Progressive disclosure itself isn’t a crazy breakthrough. If you’ve ever had to interrupt your agent after it tried every single system Python install, just to tell it “please use uv, thank you!” - congratulations, you’ve done a progressive disclosure.

What skills offer is automating that disclosure. You write markdown files containing instructions, paired with a description of when each one should be invoked. Those descriptions get inserted into the system prompt, and the agent can call your skill files much like it would call tools, enriching its own context on demand.
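To make the mechanism concrete, here is a minimal sketch of that flow. It is not how any particular agent harness implements skills; the `Skill` class, the two example skills, and the `load_skill` function are all invented for illustration. The point is only that descriptions are always in the prompt while bodies are loaded on request.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short, always visible in the system prompt
    body: str         # long, only loaded when the agent asks for it


# Hypothetical skills, made up for this example.
SKILLS = {
    "python-env": Skill(
        name="python-env",
        description="Use when creating or running Python environments.",
        body="Always use uv. Never call a system Python install directly.",
    ),
    "vcs": Skill(
        name="vcs",
        description="Use before running any version control command.",
        body="This repository uses jj, not git. Use jj for all history operations.",
    ),
}


def build_system_prompt(base: str, skills: dict[str, Skill]) -> str:
    """Disclose only names and descriptions up front."""
    lines = [f"- {s.name}: {s.description}" for s in skills.values()]
    return base + "\n\nAvailable skills:\n" + "\n".join(lines)


def load_skill(name: str, skills: dict[str, Skill]) -> str:
    """Stand-in for the tool call the agent makes to pull a skill into context."""
    return skills[name].body


if __name__ == "__main__":
    print(build_system_prompt("You are a coding agent.", SKILLS))
    # The body only enters the context if the agent chooses to make this call.
    print(load_skill("vcs", SKILLS))
```

Note that nothing in this sketch forces `load_skill` to be called. Whether the body ever reaches the context is entirely the agent’s decision.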

The actual goal of skills

The real goal of skills isn’t progressive disclosure. It’s automating progressive disclosure. That distinction matters, because it’s where the cracks start to show.

The aforementioned cracks

The myth of “human readable”

One frequently praised upside of skills is that they’re just plain markdown files, human readable by design. I think this is only half true. In practice they’re human writable and LLM readable.

Here’s why. If you’ve ever looked at production skill files, you’ll quickly find that they’re long and explicit. They have to be. They’re written as catch-alls for a wide set of problems, and for them to fulfil the “progressive” part of progressive disclosure, their content must be substantially longer than their description. Otherwise you’d just put everything in the system prompt, and you’d be back to square one.

That length is fine for LLM agents. With better context compaction and larger context windows, you can load in a huge amount of instructions and the model handles it well. But a human reading the same file will likely miss things, or just give up somewhere around line 300.

The myth of disclosure

The second problem is that you’re entirely at the mercy of your agent deciding to pull in your context. The whole system depends on the agent reaching out for extra context, not on you pushing it in as you did in the uv example. If the agent decides not to read your skill, you’ve lost the “disclosure” part of progressive disclosure.

In my experience this tends to happen when you’re trying to override behaviour the agent feels confident about. I use jj instead of git, and getting agents to reach for my jj skill file before firing off git commands is genuinely difficult. It’s such a common, foundational tool that the agent rarely feels uncertain enough to go looking for guidance.

The myth of progressive

Have you ever seen your agent think: “all these skill files seem somewhat related to what I’m doing. Let me just read all of them”? If yes, congratulations, you’ve now lost the “progressive” part too.

To be fair, this is becoming less common as models improve. But I wouldn’t call it a solved problem.

The solution

This isn’t politics, so I’m not going to complain without offering an alternative. Here it is (drumroll): linters.

If your skills are primarily about code patterns and conventions, linters are probably the better tool. They’re truly progressive: they only report findings that are actually present in your code, and they fire deterministically without you or the AI second-guessing whether a diagnostic is probably irrelevant. They also help non-AI readers. Instead of dumping 2000 lines of markdown at you, they annotate bad patterns directly in context, giving you specific and actionable feedback.
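As a rough illustration of how small a custom rule can be, here is a sketch of a single lint check built on Python’s standard `ast` module. The convention it enforces (prefer `pathlib` over `os.path.join`) is a made-up house rule, and `OsPathJoinChecker` and `lint` are hypothetical names. A real project would more likely write this as a plugin for an existing linter.

```python
import ast

MESSAGE = "avoid os.path.join; build paths with pathlib.Path and the / operator"


class OsPathJoinChecker(ast.NodeVisitor):
    """Flags calls of the exact form os.path.join(...)."""

    def __init__(self) -> None:
        self.findings: list[tuple[int, int, str]] = []

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        # Match the attribute chain os -> path -> join, and nothing else.
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "join"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "path"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "os"
        ):
            self.findings.append((node.lineno, node.col_offset, MESSAGE))
        # Keep walking so nested calls are checked too.
        self.generic_visit(node)


def lint(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, message) for every violation in the source."""
    checker = OsPathJoinChecker()
    checker.visit(ast.parse(source))
    return checker.findings


SAMPLE = (
    "import os\n"
    'config = os.path.join("etc", "app.toml")\n'
    'names = ", ".join(["a", "b"])\n'
)

if __name__ == "__main__":
    for line, col, message in lint(SAMPLE):
        print(f"{line}:{col}: {message}")
```

The `str.join` call on the third line of the sample is not reported, because the rule matches the syntax tree rather than the text. The finding also only appears when the pattern is actually in the file, which is the “progressive” property a skill file can only hope for.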

And if you expose the linter through a language server (LSP), you get even more superpowers. Many agent harnesses support LSP natively and receive diagnostics after every file write. That means that every single time your agent writes a bad code pattern, it gets alerted about its mistake, probably with a suggestion on how to do better. Automatic progressive disclosure!
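For a sense of what that feedback looks like on the wire, here is a sketch that turns linter findings into an LSP `textDocument/publishDiagnostics` notification. The message shape and the `Content-Length` framing follow the LSP specification; the `(line, column, message)` finding format, the `house-style` source name, and the file URI are assumptions made for this example. A real server would also handle initialization and document sync, which is omitted here.

```python
import json


def to_diagnostic(line: int, col: int, message: str) -> dict:
    """Convert a finding with a 1-based line into an LSP Diagnostic (0-based lines)."""
    return {
        "range": {
            "start": {"line": line - 1, "character": col},
            # One-character range for simplicity; a real server would use
            # the end position of the offending expression.
            "end": {"line": line - 1, "character": col + 1},
        },
        "severity": 2,  # 2 means Warning in the LSP spec
        "source": "house-style",  # hypothetical linter name
        "message": message,
    }


def publish_diagnostics(uri: str, findings: list[tuple[int, int, str]]) -> bytes:
    """Build the framed JSON-RPC notification a server would write to the client."""
    payload = {
        "jsonrpc": "2.0",
        "method": "textDocument/publishDiagnostics",
        "params": {
            "uri": uri,
            "diagnostics": [to_diagnostic(*finding) for finding in findings],
        },
    }
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


if __name__ == "__main__":
    example = [(2, 9, "avoid os.path.join; build paths with pathlib.Path")]
    print(publish_diagnostics("file:///tmp/example.py", example).decode("utf-8"))
```

The harness, not the agent, decides when this message arrives. That is the difference from a skill: the context is pushed in at the moment the mistake is made.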

Yes, writing a linter is harder than writing a markdown file. But for common languages and style guides, something useful probably already exists. And if you’re building a large library and want to improve the experience for both agents and human contributors, writing a custom linter might genuinely be worth the investment.

TLDR

I put this at the bottom so it’s more of a “too long, did read” - but that’s fine.

Skills are a leaky abstraction. That’s acceptable in areas where natural language is the only tool available. Not everything can be caught by static analysis. But a lot of things can be, and for those, there are probably better tools for the job.