Extraction-accuracy benchmark

Syncanix turns your API into agent capabilities by reading your source code. That only works if extraction is accurate, so we measure it, publish the results, and pin every number on this page to the committed eval harness with tests. If an extractor regresses, the build fails before this page can overstate anything.

100.0% macro recall across the graded real repos

100.0% macro precision across the graded real repos

12 real public repos graded against hand-derived ground truth

29 frameworks scanned end-to-end on real repos

Last verified: 2026-06-11 · 9c09e59f2

Methodology

Hand-derived ground truth
For each graded repo we read its route definitions by hand and list every real endpoint as a method + path pair. The extractor is scored against that list: recall is the share of real endpoints it found; precision is the share of extracted endpoints that are real.
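The scoring rule above fits in a few lines of TypeScript. This is a minimal illustration, not the harness's code: the `Endpoint` type, the `score` function, and the sample endpoints are hypothetical, and paths are compared verbatim here, whereas the structural-matching rule would normalize them first.

```typescript
// Hypothetical types and helpers for illustration; not Syncanix internals.
type Endpoint = { method: string; path: string };

// Identify an endpoint by HTTP method + path (paths compared verbatim here).
const key = (e: Endpoint): string => `${e.method.toUpperCase()} ${e.path}`;

// Assumes both lists are non-empty (otherwise a division by zero yields NaN).
function score(truth: Endpoint[], extracted: Endpoint[]): { recall: number; precision: number } {
  const truthKeys = new Set(truth.map(key));
  const extractedKeys = new Set(extracted.map(key));
  // Endpoints that were extracted AND appear in the ground truth.
  const hits = Array.from(extractedKeys).filter((k) => truthKeys.has(k)).length;
  return {
    // Share of real endpoints the extractor found.
    recall: hits / truthKeys.size,
    // Share of extracted endpoints that are real.
    precision: hits / extractedKeys.size,
  };
}

const truth: Endpoint[] = [
  { method: "GET", path: "/api/articles" },
  { method: "POST", path: "/api/articles" },
  { method: "GET", path: "/api/tags" },
  { method: "DELETE", path: "/api/articles/:slug" }, // missed by the extractor
];
const extracted: Endpoint[] = [
  { method: "get", path: "/api/articles" },
  { method: "POST", path: "/api/articles" },
  { method: "GET", path: "/api/tags" },
  { method: "GET", path: "/health" }, // not in the ground truth
];
console.log(score(truth, extracted)); // { recall: 0.75, precision: 0.75 }
```

One missed endpoint lowers recall and one spurious endpoint lowers precision, which is why the page reports both.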
Structural matching
A match requires the same HTTP method and the same full composed path, including any mounted prefixes. Path parameters are compared by structure, not by name: :id and {slug} in the same position count as equal.
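A minimal sketch of structural matching, assuming a simple string normalization: `composePath` joins a mount prefix onto a route, and `structuralKey` replaces each parameter segment with a placeholder so only its position matters. Both helpers are hypothetical, and recognizing Flask/Django-style `<slug>` or `<int:pk>` segments alongside `:id` and `{slug}` is an added assumption.

```typescript
// Hypothetical helpers sketching structural matching; not Syncanix's API.

// Join a mount prefix and a route into one path, collapsing duplicate slashes.
function composePath(prefix: string, route: string): string {
  const segments = `${prefix}/${route}`.split("/").filter((s) => s.length > 0);
  return "/" + segments.join("/");
}

// A segment is a parameter if it looks like :id, {slug}, or <slug> / <int:pk>.
const PARAM_SEGMENT = /^(:\w+|\{[^}]+\}|<[^>]+>)$/;

// Build a comparison key where every parameter becomes "{}", so only its
// position matters, never its name.
function structuralKey(method: string, path: string): string {
  const segments = path
    .split("/")
    .filter((s) => s.length > 0)
    .map((s) => (PARAM_SEGMENT.test(s) ? "{}" : s));
  return `${method.toUpperCase()} /${segments.join("/")}`;
}

// A route mounted under /api versus the same route declared with a full path.
const mounted = structuralKey("GET", composePath("/api", "/articles/:id"));
const declared = structuralKey("get", "/api/articles/{slug}");
console.log(mounted, mounted === declared); // GET /api/articles/{} true
```

Dropping the prefix would turn this pair into a miss, which is why the composed path, not the route string as written, is what gets compared.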
Static analysis only
Discovery reads source code; it never executes your app, inspects your traffic, or calls your endpoints. Every number here was produced by the same detect-and-extract path that npx syncanix init runs.
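To make "static analysis" concrete, here is a deliberately tiny, hypothetical extractor that finds Express-style route registrations by scanning source text with a regular expression. It never imports or runs the code it reads. It is a toy and not a description of Syncanix's extractors: it ignores router mounting, variables, and everything else a real extractor has to resolve.

```typescript
// Toy static extractor (hypothetical): finds app.get("/x", ...)-style calls by
// scanning source text. Nothing is imported, compiled, or executed.
function extractExpressRoutes(source: string): string[] {
  // Groups: 1 = method name, 2 = opening quote, 3 = path (closed by the same quote).
  const routeCall = /\b(?:app|router)\.(get|post|put|patch|delete)\(\s*(['"`])([^'"`]+)\2/g;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = routeCall.exec(source)) !== null) {
    found.push(`${m[1].toUpperCase()} ${m[3]}`);
  }
  return found;
}

// Source is treated purely as text; app.listen is never called.
const source = `
const router = express.Router();
router.get('/articles', listArticles);
router.post("/articles", auth.required, createArticle);
app.listen(3000);
`;
console.log(extractExpressRoutes(source).join(", ")); // GET /articles, POST /articles
```

Because the toy sees `router.get('/articles', ...)` but not where that router is mounted, it reports `/articles` rather than a composed path such as `/api/articles`, exactly the kind of miss the structural-matching rule would count against it.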
Tested in CI, pinned by tests
The deterministic fixture suite gates CI on every extractor change, and the numbers on this page are pinned to the harness output by tests — a regression fails the build before it can ship.

Real public repos, graded

Twelve reference implementations — mostly RealWorld apps — with hand-derived ground truth. The repos are public: you can read the same route files we did.

Framework	Repository	True endpoints	Recall	Precision
Express	gothinkster/node-express-realworld-example-app	20	100%	100%
NestJS	lujakob/nestjs-realworld-example-app	21	100%	100%
FastAPI	nsidnev/fastapi-realworld-example-app	19	100%	100%
Flask	gothinkster/flask-realworld-example-app	19	100%	100%
Django	gothinkster/django-realworld-example-app	23	100%	100%
Rails	gothinkster/rails-realworld-example-app	20	100%	100%
Laravel	f1amy/laravel-realworld-example-app	19	100%	100%
Gin	gothinkster/golang-gin-realworld-example-app	20	100%	100%
Spring Boot	gothinkster/spring-boot-realworld-example-app	19	100%	100%
Actix	snamiki1212/realworld-v1-rust-actix-web-diesel	20	100%	100%
ASP.NET Core	gothinkster/aspnetcore-realworld-example-app	19	100%	100%
GraphQL	howtographql/graphql-js	8	100%	100%
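The headline figures are macro averages. Assuming "macro" carries its usual meaning (score each repo, then take the unweighted mean), every repo counts equally regardless of size. The sketch below contrasts that with a micro average, which weights each repo by its endpoint count; the repo names and scores are invented for illustration and are not benchmark results.

```typescript
// Invented per-repo scores for illustration; NOT results from this benchmark.
type RepoScore = { repo: string; trueEndpoints: number; recall: number };

// Macro average: unweighted mean of per-repo scores (each repo counts once).
function macroAverage(scores: RepoScore[]): number {
  const sum = scores.reduce((acc, s) => acc + s.recall, 0);
  return sum / scores.length;
}

// Micro average, for contrast: pool endpoints, so larger repos weigh more.
function microAverage(scores: RepoScore[]): number {
  const found = scores.reduce((acc, s) => acc + s.recall * s.trueEndpoints, 0);
  const total = scores.reduce((acc, s) => acc + s.trueEndpoints, 0);
  return found / total;
}

const hypothetical: RepoScore[] = [
  { repo: "small-app", trueEndpoints: 8, recall: 0.5 },
  { repo: "large-app", trueEndpoints: 24, recall: 1.0 },
];
console.log(macroAverage(hypothetical)); // 0.75
console.log(microAverage(hypothetical)); // 0.875
```

When every repo scores 100%, as in the table above, both averages are 100%; the choice only matters once some repo falls short.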

Deterministic fixture suite

Committed fixture projects per framework, scored on F1 against per-framework thresholds. This is the suite that gates CI on every extractor change.

Framework	Fixtures	F1	Gate threshold
nestjs	5	1.000	0.92
express	3	1.000	0.85
fastapi	8	1.000	0.92
nextjs	8	1.000	0.92
graphql	5	1.000	1.00
grpc	1	1.000	0.92
trpc	2	1.000	0.92
websocket	2	1.000	0.85
springboot	1	1.000	0.85
phoenix	1	1.000	0.85
gin	2	1.000	0.85
actix	1	1.000	0.85
axum	2	1.000	0.85
laravel	1	1.000	0.85
aspnet	1	1.000	0.85
vapor	1	1.000	0.75
play	1	1.000	0.75
compojure	1	1.000	0.75
dream	1	1.000	0.75
servant	1	1.000	0.75
cowboy	1	1.000	0.75
plumber	1	1.000	0.75
lapis	1	1.000	0.75

Frameworks covered by the graded real repos rather than synthetic fixtures: django, flask, rails
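F1 is the harmonic mean of precision and recall, so it stays low unless both are high. The sketch below computes F1 and checks it against thresholds copied from the table above; the `passesGate` function and the rule that a score must meet or exceed its threshold are assumptions about how the gate works.

```typescript
// F1: harmonic mean of precision and recall (defined as 0 when both are 0).
function f1(precision: number, recall: number): number {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

// A few gate thresholds copied from the fixture table.
const GATES: Record<string, number> = { nestjs: 0.92, express: 0.85, vapor: 0.75 };

// Hypothetical gate check: assumes a framework passes when F1 >= its threshold.
function passesGate(framework: string, precision: number, recall: number): boolean {
  const threshold = GATES[framework];
  if (threshold === undefined) throw new Error(`no gate configured for ${framework}`);
  return f1(precision, recall) >= threshold;
}

console.log(f1(0.9, 0.8).toFixed(3)); // 0.847
console.log(passesGate("express", 0.9, 0.8)); // false
console.log(passesGate("vapor", 0.9, 0.8)); // true
```

With precision 0.9 and recall 0.8, F1 is about 0.847: enough to clear a 0.75 gate but not the 0.85 gate used for express.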

Vendored real-world fixtures

Complete real repositories vendored at a pinned commit and scored against a spec-derived or hand-labelled oracle. The gated metric is structural F1 (method + path).

Framework	Fixture	Structural F1	Ground truth
FastAPI	fastapi-realworld@029eb77	1.000	openapi
Hono	hono-open-api-starter@0d5f3bf	1.000	hand-labelled
Symfony	symfony-realworld@5ad39de	1.000	openapi
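The `openapi` rows use a spec-derived oracle. A minimal sketch of turning an OpenAPI document into method + path pairs, assuming OpenAPI 3 structure (operations keyed by lowercase HTTP method inside each Path Item); the `OpenApiDoc` type and `oracleFromOpenApi` helper are hypothetical. A real oracle would also need to prepend any base path from `servers` so the pairs match fully composed routes.

```typescript
// Only the part of an OpenAPI document this sketch reads (hypothetical type).
type OpenApiDoc = { paths: Record<string, Record<string, unknown>> };

// Operation keys allowed in an OpenAPI 3 Path Item Object.
const HTTP_METHODS = new Set(["get", "put", "post", "delete", "options", "head", "patch", "trace"]);

// Turn the spec's paths into sorted "METHOD /path" ground-truth entries.
function oracleFromOpenApi(doc: OpenApiDoc): string[] {
  const endpoints: string[] = [];
  for (const [path, item] of Object.entries(doc.paths)) {
    for (const field of Object.keys(item)) {
      // Skip non-operation fields such as "parameters" or "summary".
      if (HTTP_METHODS.has(field)) endpoints.push(`${field.toUpperCase()} ${path}`);
    }
  }
  return endpoints.sort();
}

const spec: OpenApiDoc = {
  paths: {
    "/articles": { summary: "Articles", get: {}, post: {} },
    "/articles/{slug}": { parameters: [], get: {}, delete: {} },
  },
};
console.log(oracleFromOpenApi(spec).join(", "));
// DELETE /articles/{slug}, GET /articles, GET /articles/{slug}, POST /articles
```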

What these numbers do not claim

The graded set is curated reference apps, not a random sample of all codebases. Your repo can differ — which is why the CLI writes a reviewable catalog instead of asking for trust.
This benchmark measures structural extraction (methods and paths). The quality of the LLM-written capability descriptions is evaluated separately and is not part of these figures.
Auth-requirement labelling is intentionally conservative in some extractors and can disagree with a spec-derived oracle on public endpoints; the gated metric is structural accuracy.

Verify it yourself

The most meaningful check is your own codebase: run the discovery CLI and review the catalog it writes — every extracted capability cites its source location, so you can diff the catalog against your route files in minutes. For transparency, these are the internal harness commands behind the numbers above (the extractors are source-available; the eval corpus is not public):

pnpm --filter syncanix test:f1
pnpm --filter syncanix scan:real

Framework support lists every framework and language the CLI reads — and what to do if yours is not covered.