
Extraction-accuracy benchmark

Syncanix turns your API into agent capabilities by reading your source code. That only works if extraction is accurate — so we measure it, publish the results, and test-pin every number on this page to the committed eval harness. If an extractor regresses, the build fails before this page can overstate anything.

Headline results

100.0% macro recall across the graded real repos
100.0% macro precision across the graded real repos
12 real public repos graded against hand-derived ground truth
29 frameworks scanned end-to-end on real repos

Last verified: 2026-06-11 · 9c09e59f2
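For readers unfamiliar with the term, "macro" here is read in its usual sense: each repo's score is computed on its own and the scores are then averaged without weighting, so a small repo counts as much as a large one. The sketch below assumes that reading. The function name and the sample values are hypothetical and are not harness output.

```python
from statistics import mean


def macro_average(per_repo_scores: dict[str, float]) -> float:
    """Unweighted mean of per-repo scores: every repo counts equally."""
    if not per_repo_scores:
        raise ValueError("need at least one graded repo")
    return mean(per_repo_scores.values())


# Hypothetical per-repo recall values, for illustration only.
example_recall = {"repo-a": 1.0, "repo-b": 0.5, "repo-c": 0.75}
print(f"macro recall: {macro_average(example_recall):.3f}")
```

A macro figure of 100.0% can only occur when every graded repo individually scores 100%, which is why it is a stricter headline than a pooled (micro) average.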

Methodology

  • Hand-derived ground truth

    For each graded repo we read its route definitions by hand and list every real endpoint as a method + path pair. The extractor is scored against that list: recall is the share of real endpoints it found; precision is the share of extracted endpoints that are real.

  • Structural matching

    A match means the HTTP method plus the full composed path — mounted prefixes included. Path parameters are compared by structure, not by name: :id and {slug} in the same position are equal.

  • Static analysis only

    Discovery reads source code; it never executes your app, inspects your traffic, or calls your endpoints. Every number here was produced by the same detect-and-extract path that npx syncanix init runs.

  • Tested in CI, pinned by tests

    The deterministic fixture suite gates CI on every extractor change, and the numbers on this page are pinned to the harness output by tests — a regression fails the build before it can ship.
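The matching and scoring rules above can be sketched in a few lines. This is a minimal illustration, not the harness itself: the function names are hypothetical, only the two parameter styles named above (`:id` and `{slug}`) are handled, and the value returned for an empty set is a convention chosen for this sketch.

```python
import re

# A path parameter is either "/:name" or "{name}"; both collapse to "{}".
# The lookbehind keeps a colon inside a segment (e.g. "/items:batch") intact.
_PARAM = re.compile(r"(?<=/):[A-Za-z_]\w*|\{[^/{}]+\}")


def endpoint_key(method: str, path: str) -> tuple[str, str]:
    """Normalize an endpoint so parameters compare by position, not name."""
    return method.upper(), _PARAM.sub("{}", path)


def score(extracted, ground_truth) -> tuple[float, float]:
    """Return (precision, recall) for two iterables of (method, path) pairs."""
    found = {endpoint_key(m, p) for m, p in extracted}
    real = {endpoint_key(m, p) for m, p in ground_truth}
    hits = len(found & real)
    # Sketch convention: an empty denominator scores 1.0.
    precision = hits / len(found) if found else 1.0
    recall = hits / len(real) if real else 1.0
    return precision, recall


# Hypothetical data: one spurious extraction, parameter names differ.
extracted = [("get", "/api/articles/:slug"), ("POST", "/api/articles"),
             ("GET", "/api/health")]
truth = [("GET", "/api/articles/{id}"), ("POST", "/api/articles")]
precision, recall = score(extracted, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this example both real endpoints are found, so recall is 1.0, while the extra `/api/health` entry lowers precision to two thirds.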

Real public repos, graded

Twelve reference implementations — mostly RealWorld apps — with hand-derived ground truth. The repos are public: you can read the same route files we did.

Deterministic fixture suite

Committed fixture projects per framework, scored on F1 against per-framework thresholds. This is the suite that gates CI on every extractor change.

Framework    Fixtures  F1     Gate threshold
nestjs       5         1.000  0.92
express      3         1.000  0.85
fastapi      8         1.000  0.92
nextjs       8         1.000  0.92
graphql      5         1.000  1.00
grpc         1         1.000  0.92
trpc         2         1.000  0.92
websocket    2         1.000  0.85
springboot   1         1.000  0.85
phoenix      1         1.000  0.85
gin          2         1.000  0.85
actix        1         1.000  0.85
axum         2         1.000  0.85
laravel      1         1.000  0.85
aspnet       1         1.000  0.85
vapor        1         1.000  0.75
play         1         1.000  0.75
compojure    1         1.000  0.75
dream        1         1.000  0.75
servant      1         1.000  0.75
cowboy       1         1.000  0.75
plumber      1         1.000  0.75
lapis        1         1.000  0.75

Frameworks covered by the graded real repos rather than synthetic fixtures: django, flask, rails
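To make the gate concrete, here is a minimal sketch of an F1 check against per-framework thresholds. The thresholds are copied from a few rows of the table above. The function names are hypothetical, and the inclusive comparison (F1 at or above the threshold passes) is an assumption, inferred from graphql sitting exactly on its 1.00 threshold.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Thresholds taken from the fixture-suite table (subset only).
GATE_THRESHOLDS = {"nestjs": 0.92, "express": 0.85, "graphql": 1.00, "vapor": 0.75}


def failing_frameworks(f1_by_framework: dict[str, float]) -> list[str]:
    """Return, sorted by name, the frameworks whose F1 is below their gate."""
    return sorted(
        name
        for name, value in f1_by_framework.items()
        if value < GATE_THRESHOLDS[name]  # assumption: equal to threshold passes
    )


# Hypothetical run in which express has regressed to 0.80.
print(failing_frameworks({"nestjs": 1.0, "express": 0.80, "graphql": 1.0}))
```

In a CI job, a non-empty result would translate into a non-zero exit code, which is what stops a regressed extractor from shipping.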

Vendored real-world fixtures

Complete real repositories vendored at a pinned commit and scored against a spec-derived or hand-labelled oracle. The gated metric is structural F1 (method + path).

Framework  Fixture                        Structural F1  Ground truth
FastAPI    fastapi-realworld@029eb77      1.000          openapi
Hono       hono-open-api-starter@0d5f3bf  1.000          hand-labelled
Symfony    symfony-realworld@5ad39de      1.000          openapi
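For the rows whose ground truth is "openapi", a spec-derived oracle can be pictured as the set of method + path pairs listed under the `paths` object of the repo's OpenAPI document. The sketch below shows one way to collect them from an already-parsed spec. How the harness actually builds its oracle is not described here, so the function name and the sample spec are hypothetical, and server base paths are deliberately ignored.

```python
# Operation keys defined on an OpenAPI Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoints_from_openapi(spec: dict) -> set[tuple[str, str]]:
    """Collect (METHOD, path) pairs from the `paths` object of a parsed spec."""
    pairs = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key in HTTP_METHODS:
                pairs.add((key.upper(), path))
    return pairs


# Hypothetical, heavily trimmed spec for illustration.
sample_spec = {
    "paths": {
        "/articles": {"get": {}, "post": {}},
        "/articles/{slug}": {"parameters": [], "delete": {}},
    }
}
for method, path in sorted(endpoints_from_openapi(sample_spec)):
    print(method, path)
```

The resulting pairs are in the same method + path shape that the structural F1 compares against.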

What these numbers do not claim

  • The graded set is curated reference apps, not a random sample of all codebases. Your repo can differ — which is why the CLI writes a reviewable catalog instead of asking for trust.
  • This benchmark measures structural extraction (methods and paths). The quality of the LLM-written capability descriptions is evaluated separately and is not part of these figures.
  • Auth-requirement labelling is intentionally conservative in some extractors and can disagree with a spec-derived oracle on public endpoints; the gated metric is structural accuracy.

Verify it yourself

The most meaningful check is your own codebase: run the discovery CLI and review the catalog it writes — every extracted capability cites its source location, so you can diff the catalog against your route files in minutes. For transparency, these are the internal harness commands behind the numbers above (the extractors are source-available; the eval corpus is not public):

pnpm --filter syncanix test:f1
pnpm --filter syncanix scan:real

The Framework support page lists every framework and language the CLI reads, and what to do if yours is not covered.