Leaderboard
Public ranking of how well documentation explains itself to LLMs. Editions ship quarterly. The first edition seeds 50 placeholder rows while live evals queue up — once Specshift Cloud’s writeback lands, every row pins to a reproducible methodology version.
First edition — seeded
Rows render once the seed corpus eval completes. Until then, the grid below is a structural placeholder. Past editions stay frozen at the methodology version they ran against — see the dispute policy for what does and does not get amended.
| Rank | Target | Top suite | Methodology | Score |
|---|---|---|---|---|
| 01 | — | retrieval | v1 | — |
| 02 | — | agent | v1 | — |
| 03 | — | structure | v1 | — |
| 04 | — | drift | v1 | — |
Disputes are public. Submit one and the whole record (submission, ruling, reasoning, and chain hash) gets posted at /leaderboard/disputes/[id]. No silent corrections. No silent rejections. Hash anchors land in S3 with Object Lock, which blocks overwrites and deletions for the retention period, so a reviewer can verify the public record matches the original.
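As a rough illustration of how a reviewer might check a chain hash, here is a minimal sketch. It assumes each dispute record is hashed as canonical JSON together with the previous hash using SHA-256. The field names, the `GENESIS` value, and the functions `chain_hash`, `build_chain`, and `verify_chain` are hypothetical and are not taken from the Specshift implementation.

```python
import hashlib
import json

# Hypothetical starting value for the chain; the real anchor format is not documented here.
GENESIS = "0" * 64


def chain_hash(prev_hash: str, record: dict) -> str:
    """Hash one record together with the hash that precedes it."""
    # Canonical JSON (sorted keys, fixed separators) so equal records hash identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records: list) -> list:
    """Return the running chain hash after each record, in order."""
    hashes = []
    prev = GENESIS
    for record in records:
        prev = chain_hash(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records: list, anchored_hashes: list) -> bool:
    """Recompute the chain from the public records and compare it to the anchors."""
    return build_chain(records) == list(anchored_hashes)


if __name__ == "__main__":
    # Illustrative records only; the field names are assumptions.
    public_records = [
        {"id": "d-1", "submission": "score disputed", "ruling": "upheld"},
        {"id": "d-2", "submission": "wrong suite", "ruling": "rejected"},
    ]
    anchors = build_chain(public_records)
    print(verify_chain(public_records, anchors))
```

Because each hash depends on the one before it, editing any earlier record changes every later hash, so a single mismatch against the locked anchors is enough to show that the public record was altered.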