Performance

Anvil is designed to remove wiring work without becoming the slow part of your backend. Performance testing is split into two layers: engine benchmarks that isolate Anvil itself, and sample-backend benchmarks that exercise generated code under a realistic application shape.

For current HTTP driver measurements, see Benchmarks.

What To Measure

Use separate numbers for separate questions.

Question	Benchmark Layer	What It Measures
How much does Anvil’s edge dispatcher add?	`core/edge`	Protocol classification before a driver sees the request.
How expensive is boot-time wiring?	Sample backend wiring benchmark	DI provider registration and generated component resolution before traffic starts.
How expensive is generated HTTP glue?	Sample backend direct benchmarks	Middleware, binding, validation, error mapping, and handler calls without network noise.
How does the full stack behave under concurrency?	Sample backend live HTTP/2 benchmarks	Anvil edge, driver, generated routes, Go HTTP/2, and client/server scheduling.
What does error handling cost?	Error pipeline and sample error benchmarks	Expected failures, internal errors, panic recovery, and global error observers.

Do not compare numbers across layers: a direct handler benchmark and a live HTTP/2 benchmark answer different questions.

Core Engine Benchmarks

Run the core Anvil benchmarks from the anvil repository:

go test ./core/... ./cmd/anvil/... ./testbed \
  -run '^$' \
  -bench . \
  -benchmem \
  -count 10 \
  -benchtime 3s

Useful focused runs:

go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s
go test ./core/di -run '^$' -bench BenchmarkInjector -benchmem -count 10 -benchtime 3s
go test ./core/errors -run '^$' -bench BenchmarkPipeline -benchmem -count 10 -benchtime 3s
go test ./core/events -run '^$' -bench BenchmarkBusPublish -benchmem -count 10 -benchtime 3s
go test ./core/bind ./core/validate -run '^$' -bench . -benchmem -count 10 -benchtime 3s

BenchmarkEdgeDispatch includes these cases:

Direct handler call
HTTP/1 fallback dispatch
HTTP/2 REST dispatch
GraphQL path dispatch
Dispatch by gRPC content type
Parallel HTTP/2, GraphQL, and gRPC dispatch

The edge benchmark uses an in-memory response writer and no network I/O. Its job is to expose Anvil’s classification overhead, not total server throughput.

Golden Backend Benchmarks

Run the realistic generated-backend benchmarks from the sample backend:

go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackend \
  -benchmem \
  -count 10 \
  -benchtime 3s

The sample backend benchmarks cover:

BenchmarkGeneratedBackendWiring: Boot-time generated wiring.
BenchmarkGeneratedBackendDirect/health: A small generated HTTP route.
BenchmarkGeneratedBackendDirect/project_get_with_middleware: Route params, locals, middleware, DI, and domain lookup.
BenchmarkGeneratedBackendDirect/project_create_duplicate_slug_error: JSON decode, generated validation, handler call, domain error mapping, and JSON error response.
BenchmarkGeneratedBackendDirect/validation_failure: Generated validation and expected error response mapping.
BenchmarkGeneratedBackendDirect/domain_error_with_plugin_mapper: Domain error mapping through a plugin-provided mapper.
BenchmarkGeneratedBackendDirect/panic_recovery_error_pipeline: Panic recovery, internal failure mapping, and global error observers.
BenchmarkGeneratedBackendLiveHTTP2Parallel: Real HTTP/2 clients against one Anvil listener, including REST, GraphQL, and gRPC.

The live benchmark reports req/s as an extra metric. Treat it as a local machine capacity signal, not as a universal promise.

Saturation Runs

Benchmarks are good for regressions. They are not enough for release claims. Before publishing alpha performance numbers, run the sample backend as a normal process and use the stress runner from a second process:

go run ./cmd/server

go run ./cmd/stress \
  -url http://127.0.0.1:8080 \
  -duration 10s \
  -mode steady \
  -start 8 \
  -max 512 \
  -stop-error-rate 0.01 \
  -json stress-report.json

The runner seeds one project, then ramps concurrency for every selected protocol at the same time. By default it runs REST, GraphQL, gRPC, and WebSocket workers together. The -duration flag is the total wall-clock budget for the whole run, which the runner splits across the concurrency rounds between -start and -max.

Use -mode steady for normal throughput testing. It warms workers before each measured round, keeps reusable HTTP/gRPC clients alive, and keeps one WebSocket open per WebSocket worker.

Use -mode connect-storm when you want to measure fresh connection pressure. In that mode REST and GraphQL disable HTTP keep-alives, gRPC creates a fresh client connection per call, and WebSocket workers reconnect for every message. Those numbers answer a different question than steady-state throughput.

Connect-storm mode is bounded by -connect-rate 250 by default. The limit is global across all selected protocols. It exists so the client does not exhaust its own ephemeral port range before the server is under useful pressure. Use -connect-rate 0 for an intentionally unbounded connection flood, or raise the value gradually when you are looking for the server or OS limit.

Each round reports:

Concurrency per protocol
Total request/message rate
Per-protocol request/message rate
Error count and error rate
Error breakdown by dial, deadline, status, assertion, protocol, and other
Latency percentiles: p50, p95, p99, and max
Server-side connection counters when the sample backend exposes them

In steady mode, WebSocket workers keep one socket open per worker and measure message round trips. REST and GraphQL numbers are HTTP requests. gRPC numbers are unary calls.

The server connection line reports accepted, closed, and hijacked connection deltas plus the final active, idle, non-hijacked open, peak active, and peak open counts observed through http.Server.ConnState. WebSocket upgrades are reported as hijacked because Go hands those sockets to the WebSocket driver.

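Use -protocols when you want an isolated ceiling for one protocol or a subset of protocols. The last command below keeps every protocol selected and instead runs a connect storm with a higher connection-rate limit: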

go run ./cmd/stress -url http://127.0.0.1:8080 -protocols grpc -duration 10s -start 16 -max 1024
go run ./cmd/stress -url http://127.0.0.1:8080 -protocols rest,graphql -duration 10s -start 16 -max 1024
go run ./cmd/stress -url http://127.0.0.1:8080 -mode connect-storm -duration 10s -start 8 -max 256 -connect-rate 500

Use the default mixed run, with all protocols on one listener, when you want to test Anvil’s same-port dispatcher under realistic pressure.

Live Profiles

Use live profiles when a stress run shows a real limit and you need to know where the time, allocations, or blocking are coming from. The golden sample backend keeps this tooling outside core Anvil. Anvil does not expose pprof or profiling symbols in the SDK.

Start the sample backend with its local pprof server enabled:

PPROF_ADDR=127.0.0.1:6060 go run ./cmd/server

Then run the profiling harness from another shell:

go run ./cmd/profile \
  -url http://127.0.0.1:8080 \
  -pprof-url http://127.0.0.1:6060 \
  -duration 30s \
  -start 8 \
  -max 512 \
  -out profiles/local-run

The harness runs the same mixed-protocol stress workload and captures:

CPU profile during the primary load run
Heap profile after the primary load run
Allocs profile after the primary load run
Goroutine profile after the primary load run
Mutex profile after the primary load run
Block profile after the primary load run
Runtime trace during a short follow-up load run
Stress JSON for both the CPU-profiled run and the trace run
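Each artifact corresponds to a standard net/http/pprof endpoint, so the capture step can be sketched as plain HTTP downloads. The endpoint paths and the seconds query parameter are standard pprof behavior. The file names other than cpu.pprof, allocs.pprof, mutex.pprof, block.pprof, and trace.trace, and the helper names, are assumptions about the harness rather than its real code. The CPU profile and trace requests block for the requested seconds, so they must be issued while load is running.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sort"
)

// captureTargets maps an output file name to the pprof endpoint producing
// it. cpu and trace sample for the given seconds and need concurrent load;
// the others are point-in-time snapshots taken after the load run.
func captureTargets(pprofURL string, cpuSeconds, traceSeconds int) map[string]string {
	base := pprofURL + "/debug/pprof/"
	return map[string]string{
		"cpu.pprof":       fmt.Sprintf("%sprofile?seconds=%d", base, cpuSeconds),
		"heap.pprof":      base + "heap",
		"allocs.pprof":    base + "allocs",
		"goroutine.pprof": base + "goroutine",
		"mutex.pprof":     base + "mutex",
		"block.pprof":     base + "block",
		"trace.trace":     fmt.Sprintf("%strace?seconds=%d", base, traceSeconds),
	}
}

// download saves one endpoint's response body to dir/name.
func download(url, dir, name string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s: %s", url, resp.Status)
	}
	f, err := os.Create(filepath.Join(dir, name))
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	targets := captureTargets("http://127.0.0.1:6060", 30, 5)
	names := make([]string, 0, len(targets))
	for n := range targets {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%-16s %s\n", n, targets[n])
	}
	// With the pprof server up and load running, call
	// download(targets[n], "profiles/local-run", n) for each entry.
}
```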

Open the artifacts with the standard Go tools:

go tool pprof profiles/local-run/cpu.pprof
go tool pprof -alloc_objects profiles/local-run/allocs.pprof
go tool pprof profiles/local-run/mutex.pprof
go tool pprof profiles/local-run/block.pprof
go tool trace profiles/local-run/trace.trace

On Windows PowerShell, set the pprof address like this:

$env:PPROF_ADDR = "127.0.0.1:6060"
go run ./cmd/server

Comparing Changes

Use benchstat (golang.org/x/perf/cmd/benchstat) when comparing two revisions:

go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s > old.txt

# Change code, then run again.
go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s > new.txt

benchstat old.txt new.txt

Only claim an improvement when benchstat reports a statistically significant change; it prints ~ instead of a percentage when the difference is not significant. Small benchmark movement can come from CPU scheduling, turbo behavior, background processes, or thermal state.

Profiling

Use profiles when a benchmark exposes a hot path.

CPU profile:

go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackendDirect/project_get_with_middleware \
  -benchtime 10s \
  -cpuprofile cpu.prof

go tool pprof cpu.prof

Allocation profile:

go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackendDirect/project_get_with_middleware \
  -benchtime 10s \
  -memprofile mem.prof

go tool pprof -alloc_objects mem.prof

Trace:

go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackendLiveHTTP2Parallel/rest_project_get \
  -benchtime 10s \
  -trace trace.out

go tool trace trace.out

Use profiles to decide what to optimize. Guesswork is how performance work gets expensive without improving the system.

Release Numbers

Produce public performance numbers from a clean run:

Fixed Go version
Fixed commit hashes for Anvil, drivers, and the sample backend
Fresh process with no dev server or browser noise
-count 10 benchmark output stored with the release notes
Hardware, operating system, CPU, and memory listed beside the results
benchstat output when comparing against a previous release

The documentation can show representative alpha numbers once they come from a repeatable release machine. Until then, local benchmark output is useful for engineering decisions but is not a public guarantee.