Performance
Anvil removes wiring work without becoming the slow part of your backend. Performance testing is split into two layers: engine benchmarks that isolate Anvil itself, and sample-backend benchmarks that exercise generated code under a real application shape.
For current HTTP driver measurements, see Benchmarks.
What To Measure
Use separate numbers for separate questions.
| Question | Benchmark Layer | Why It Matters |
|---|---|---|
| How much does Anvil’s edge dispatcher add? | core/edge | Measures protocol classification before a driver sees the request. |
| How expensive is boot-time wiring? | Sample backend wiring benchmark | Measures DI provider registration and generated component resolution before traffic starts. |
| How expensive is generated HTTP glue? | Sample backend direct benchmarks | Measures middleware, binding, validation, error mapping, and handler calls without network noise. |
| How does the full stack behave under concurrency? | Sample backend live HTTP/2 benchmarks | Measures Anvil edge, driver, generated routes, Go HTTP/2, and client/server scheduling. |
| What does error handling cost? | Error pipeline and sample error benchmarks | Measures expected failures, internal errors, panic recovery, and global error observers. |
Keep these numbers separate. A direct handler benchmark and a live HTTP/2 benchmark answer different questions.
Core Engine Benchmarks
Run the core Anvil benchmarks from the anvil repository:

    go test ./core/... ./cmd/anvil/... ./testbed \
      -run '^$' \
      -bench . \
      -benchmem \
      -count 10 \
      -benchtime 3s

Useful focused runs:

    go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s
    go test ./core/di -run '^$' -bench BenchmarkInjector -benchmem -count 10 -benchtime 3s
    go test ./core/errors -run '^$' -bench BenchmarkPipeline -benchmem -count 10 -benchtime 3s
    go test ./core/events -run '^$' -bench BenchmarkBusPublish -benchmem -count 10 -benchtime 3s
    go test ./core/bind ./core/validate -run '^$' -bench . -benchmem -count 10 -benchtime 3s

BenchmarkEdgeDispatch includes these cases:
- Direct handler call
- HTTP/1 fallback dispatch
- HTTP/2 REST dispatch
- GraphQL path dispatch
- Dispatch by gRPC content type
- Parallel HTTP/2, GraphQL, and gRPC dispatch
The edge benchmark uses an in-memory response writer and no network I/O. Its job is to expose Anvil’s classification overhead, not total server throughput.
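As an illustration of what "classification overhead with no network I/O" means, the sketch below benchmarks a request classifier against a pre-built in-memory request. The `classify` function and its rules (gRPC by content type, GraphQL by path, HTTP/2 versus HTTP/1 by protocol version) are assumptions that mirror the cases listed above. This is not Anvil's dispatcher.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

type protocol int

const (
	protoHTTP1 protocol = iota
	protoHTTP2REST
	protoGraphQL
	protoGRPC
)

// classify is a hypothetical classifier written for this example only.
func classify(r *http.Request) protocol {
	switch {
	case r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc"):
		return protoGRPC
	case r.URL.Path == "/graphql":
		return protoGraphQL
	case r.ProtoMajor == 2:
		return protoHTTP2REST
	default:
		return protoHTTP1
	}
}

// newRequest builds an in-memory request; nothing touches a socket.
func newRequest(method, path string, major int, contentType string) *http.Request {
	r := httptest.NewRequest(method, path, nil)
	r.ProtoMajor = major
	if contentType != "" {
		r.Header.Set("Content-Type", contentType)
	}
	return r
}

func main() {
	req := newRequest(http.MethodPost, "/projects.v1.Projects/Get", 2, "application/grpc+proto")
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if classify(req) != protoGRPC {
				b.Fatal("request was misclassified")
			}
		}
	})
	fmt.Println("classify gRPC:", res, res.MemString())
}
```

Because the request is built once outside the loop, the reported ns/op and allocs/op belong to the classification step alone.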
Golden Backend Benchmarks
Run the realistic generated-backend benchmarks from the sample backend:

    go test ./internal/smoke \
      -run '^$' \
      -bench BenchmarkGeneratedBackend \
      -benchmem \
      -count 10 \
      -benchtime 3s

The sample backend benchmarks cover:
- BenchmarkGeneratedBackendWiring: Boot-time generated wiring.
- BenchmarkGeneratedBackendDirect/health: A small generated HTTP route.
- BenchmarkGeneratedBackendDirect/project_get_with_middleware: Route params, locals, middleware, DI, and domain lookup.
- BenchmarkGeneratedBackendDirect/project_create_duplicate_slug_error: JSON decode, generated validation, handler call, domain error mapping, and JSON error response.
- BenchmarkGeneratedBackendDirect/validation_failure: Generated validation and expected error response mapping.
- BenchmarkGeneratedBackendDirect/domain_error_with_plugin_mapper: Domain error mapping through a plugin-provided mapper.
- BenchmarkGeneratedBackendDirect/panic_recovery_error_pipeline: Panic recovery, internal failure mapping, and global error observers.
- BenchmarkGeneratedBackendLiveHTTP2Parallel: Real HTTP/2 clients against one Anvil listener, including REST, GraphQL, and gRPC.
The live benchmark reports req/s as an extra metric. Treat it as a local
machine capacity signal, not as a universal promise.
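For readers unfamiliar with extra benchmark metrics, the following sketch shows one way a Go benchmark can report req/s with `b.ReportMetric`. It is an assumption about the general technique, not the sample backend's code. `okHandler` is hypothetical, and the loop calls the handler in-process only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
)

// okHandler is a hypothetical stand-in for a generated route.
func okHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusNoContent)
}

// benchWithRate reports requests per second next to the default ns/op.
func benchWithRate(b *testing.B) {
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		okHandler(httptest.NewRecorder(), req)
	}
	// b.Elapsed requires Go 1.20 or newer.
	b.ReportMetric(float64(b.N)/b.Elapsed().Seconds(), "req/s")
}

func main() {
	res := testing.Benchmark(benchWithRate)
	// The custom metric is printed with the result and is also available in res.Extra.
	fmt.Println(res)
	fmt.Printf("req/s from Extra: %.0f\n", res.Extra["req/s"])
}
```

The metric is derived from the same timer as ns/op, so it moves with machine load in exactly the same way.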
Saturation Runs
Benchmarks are good for regressions. They are not enough for release claims. Before publishing alpha performance numbers, run the sample backend as a normal process and use the stress runner from a second process:

    go run ./cmd/server

    go run ./cmd/stress \
      -url http://127.0.0.1:8080 \
      -duration 10s \
      -mode steady \
      -start 8 \
      -max 512 \
      -stop-error-rate 0.01 \
      -json stress-report.json

The runner seeds one project, then ramps every selected protocol at the same
time. By default it runs REST, GraphQL, gRPC, and WebSocket workers together.
The -duration flag is the total wall-clock budget for the run. The runner
splits that budget across the concurrency rounds between -start and -max.
Use -mode steady for normal throughput testing. It warms workers before each
measured round, keeps reusable HTTP/gRPC clients alive, and keeps one WebSocket
open per WebSocket worker.
Use -mode connect-storm when you want to measure fresh connection pressure.
In that mode REST and GraphQL disable HTTP keep-alives, gRPC creates a fresh
client connection per call, and WebSocket workers reconnect for every message.
Those numbers answer a different question than steady-state throughput.
Connect storm mode is bounded by -connect-rate 250 by default. The limit is
global across all selected protocols and exists so the client does not burn
through its own ephemeral port range before the server is under useful
pressure. Use -connect-rate 0 for an intentionally unbounded connection
flood, or raise the value gradually when you are looking for the server or OS
limit.
Each round reports:
- Concurrency per protocol
- Total request/message rate
- Per-protocol request/message rate
- Error count and error rate
- Error breakdown by dial, deadline, status, assertion, protocol, and other
- Latency percentiles: p50, p95, p99, and max
- Server-side connection counters when the sample backend exposes them
WebSocket workers keep one socket open per worker and measure message round trips. REST and GraphQL numbers are HTTP requests. gRPC numbers are unary calls.
The server connection line reports accepted, closed, and hijacked connection
deltas plus the final active, idle, non-hijacked open, peak active, and peak
open counts observed through http.Server.ConnState. WebSocket upgrades are
reported as hijacked because Go hands those sockets to the WebSocket driver.
Use protocol selection when you want an isolated ceiling:

    go run ./cmd/stress -url http://127.0.0.1:8080 -protocols grpc -duration 10s -start 16 -max 1024
    go run ./cmd/stress -url http://127.0.0.1:8080 -protocols rest,graphql -duration 10s -start 16 -max 1024
    go run ./cmd/stress -url http://127.0.0.1:8080 -mode connect-storm -duration 10s -start 8 -max 256 -connect-rate 500

Use the mixed run when you want to test Anvil’s same-port dispatcher under realistic pressure.
Live Profiles
Use live profiles when a stress run shows a real limit and you need to know where the time, allocations, or blocking are coming from. The golden sample backend keeps this tooling outside core Anvil. Anvil does not expose pprof or profiling symbols in the SDK.
Start the sample backend with its local pprof server enabled:

    PPROF_ADDR=127.0.0.1:6060 go run ./cmd/server

Then run the profiling harness from another shell:

    go run ./cmd/profile \
      -url http://127.0.0.1:8080 \
      -pprof-url http://127.0.0.1:6060 \
      -duration 30s \
      -start 8 \
      -max 512 \
      -out profiles/local-run

The harness runs the same mixed-protocol stress workload and captures:
- CPU profile during the primary load run
- Heap profile after the primary load run
- Allocs profile after the primary load run
- Goroutine profile after the primary load run
- Mutex profile after the primary load run
- Block profile after the primary load run
- Runtime trace during a short follow-up load run
- Stress JSON for both the CPU-profiled run and the trace run
Open the artifacts with the standard Go tools:

    go tool pprof profiles/local-run/cpu.pprof
    go tool pprof -alloc_objects profiles/local-run/allocs.pprof
    go tool pprof profiles/local-run/mutex.pprof
    go tool pprof profiles/local-run/block.pprof
    go tool trace profiles/local-run/trace.trace

On Windows PowerShell, set the pprof address like this:

    $env:PPROF_ADDR = "127.0.0.1:6060"
    go run ./cmd/server

Comparing Changes
Use benchstat when comparing two revisions:

    go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s > old.txt

    # Change code, then run again.
    go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s > new.txt

    benchstat old.txt new.txt

Only claim an improvement when benchstat shows a statistically meaningful
change. Small benchmark movement can come from CPU scheduling, turbo behavior,
background processes, or thermal state.
Profiling
Use profiles when a benchmark exposes a hot path.

CPU profile:

    go test ./internal/smoke \
      -run '^$' \
      -bench BenchmarkGeneratedBackendDirect/project_get_with_middleware \
      -benchtime 10s \
      -cpuprofile cpu.prof

    go tool pprof cpu.prof

Allocation profile:

    go test ./internal/smoke \
      -run '^$' \
      -bench BenchmarkGeneratedBackendDirect/project_get_with_middleware \
      -benchtime 10s \
      -memprofile mem.prof

    go tool pprof -alloc_objects mem.prof

Trace:

    go test ./internal/smoke \
      -run '^$' \
      -bench BenchmarkGeneratedBackendLiveHTTP2Parallel/rest_project_get \
      -benchtime 10s \
      -trace trace.out

    go tool trace trace.out

Use profiles to decide what to optimize. Guesswork is how performance work gets expensive without improving the system.
Release Numbers
Produce public performance numbers from a clean run:
- Fixed Go version
- Fixed commit hashes for Anvil, drivers, and the sample backend
- Fresh process with no dev server or browser noise
- -count 10 benchmark output stored with the release notes
- Hardware, operating system, CPU, and memory listed beside the results
- benchstat output when comparing against a previous release
The documentation can show representative alpha numbers once they come from a repeatable release machine. Until then, local benchmark output is useful for engineering decisions but is not a public guarantee.