
Performance

Anvil removes wiring work without becoming the slow part of your backend. Performance testing is split into two layers: engine benchmarks that isolate Anvil itself, and sample-backend benchmarks that exercise generated code under a real application shape.

For current HTTP driver measurements, see Benchmarks.

Use separate numbers for separate questions.

| Question | Benchmark Layer | Why It Matters |
| --- | --- | --- |
| How much does Anvil’s edge dispatcher add? | core/edge | Measures protocol classification before a driver sees the request. |
| How expensive is boot-time wiring? | Sample backend wiring benchmark | Measures DI provider registration and generated component resolution before traffic starts. |
| How expensive is generated HTTP glue? | Sample backend direct benchmarks | Measures middleware, binding, validation, error mapping, and handler calls without network noise. |
| How does the full stack behave under concurrency? | Sample backend live HTTP/2 benchmarks | Measures Anvil edge, driver, generated routes, Go HTTP/2, and client/server scheduling. |
| What does error handling cost? | Error pipeline and sample error benchmarks | Measures expected failures, internal errors, panic recovery, and global error observers. |

A direct handler benchmark and a live HTTP/2 benchmark answer different questions, so do not compare their results against each other or combine them into one figure.

Run the core Anvil benchmarks from the anvil repository:

Terminal window
go test ./core/... ./cmd/anvil/... ./testbed \
  -run '^$' \
  -bench . \
  -benchmem \
  -count 10 \
  -benchtime 3s

Useful focused runs:

Terminal window
go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s
go test ./core/di -run '^$' -bench BenchmarkInjector -benchmem -count 10 -benchtime 3s
go test ./core/errors -run '^$' -bench BenchmarkPipeline -benchmem -count 10 -benchtime 3s
go test ./core/events -run '^$' -bench BenchmarkBusPublish -benchmem -count 10 -benchtime 3s
go test ./core/bind ./core/validate -run '^$' -bench . -benchmem -count 10 -benchtime 3s

BenchmarkEdgeDispatch includes these cases:

  • Direct handler call
  • HTTP/1 fallback dispatch
  • HTTP/2 REST dispatch
  • GraphQL path dispatch
  • Dispatch by gRPC content type
  • Parallel HTTP/2, GraphQL, and gRPC dispatch

The edge benchmark uses an in-memory response writer and no network I/O. Its job is to expose Anvil’s classification overhead, not total server throughput.
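As a rough illustration of that setup, the sketch below benchmarks a classification step against an in-memory recorder from net/http/httptest. The classify function, its rules, and the route names are hypothetical stand-ins chosen to mirror the cases listed above; they are not Anvil's dispatcher.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

// classify is a hypothetical stand-in for an edge dispatcher's protocol
// classification. It is not Anvil's implementation.
func classify(r *http.Request) string {
	switch {
	case strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc"):
		return "grpc"
	case r.URL.Path == "/graphql":
		return "graphql"
	case r.ProtoMajor == 2:
		return "http2-rest"
	default:
		return "http1-fallback"
	}
}

// newRequest builds a request entirely in memory; no socket is opened.
func newRequest(path, contentType string, protoMajor int) *http.Request {
	r := httptest.NewRequest(http.MethodPost, path, nil)
	r.ProtoMajor = protoMajor
	if contentType != "" {
		r.Header.Set("Content-Type", contentType)
	}
	return r
}

// BenchmarkClassify measures classification plus a trivial response write.
func BenchmarkClassify(b *testing.B) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_ = classify(r)
		w.WriteHeader(http.StatusNoContent)
	})
	req := newRequest("/graphql", "application/json", 2)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// The recorder is an in-memory http.ResponseWriter.
		handler.ServeHTTP(httptest.NewRecorder(), req)
	}
}
```

Because the recorder is allocated inside the loop, its cost shows up in the allocation counts; a benchmark that wants to isolate the classifier alone could call classify directly.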

Run the realistic generated-backend benchmarks from the sample backend:

Terminal window
go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackend \
  -benchmem \
  -count 10 \
  -benchtime 3s

The sample backend benchmarks cover:

  • BenchmarkGeneratedBackendWiring: Boot-time generated wiring.
  • BenchmarkGeneratedBackendDirect/health: A small generated HTTP route.
  • BenchmarkGeneratedBackendDirect/project_get_with_middleware: Route params, locals, middleware, DI, and domain lookup.
  • BenchmarkGeneratedBackendDirect/project_create_duplicate_slug_error: JSON decode, generated validation, handler call, domain error mapping, and JSON error response.
  • BenchmarkGeneratedBackendDirect/validation_failure: Generated validation and expected error response mapping.
  • BenchmarkGeneratedBackendDirect/domain_error_with_plugin_mapper: Domain error mapping through a plugin-provided mapper.
  • BenchmarkGeneratedBackendDirect/panic_recovery_error_pipeline: Panic recovery, internal failure mapping, and global error observers.
  • BenchmarkGeneratedBackendLiveHTTP2Parallel: Real HTTP/2 clients against one Anvil listener, including REST, GraphQL, and gRPC.

The live benchmark reports req/s as an extra metric. Treat it as a local machine capacity signal, not as a universal promise.
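One way a Go benchmark can attach such an extra metric is testing.B.ReportMetric. The sketch below is an assumption about the general shape, not the sample backend's code: the handler is a placeholder, the server is an httptest TLS server with HTTP/2 enabled, and b.Elapsed requires Go 1.20 or newer.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

// reqPerSec converts a completed request count and elapsed time to a rate.
func reqPerSec(n int, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(n) / elapsed.Seconds()
}

// BenchmarkLiveHTTP2Parallel drives a real listener from parallel clients.
func BenchmarkLiveHTTP2Parallel(b *testing.B) {
	// Placeholder handler standing in for a generated route.
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = io.WriteString(w, "ok")
	}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	defer srv.Close()
	client := srv.Client() // trusts the test certificate and speaks HTTP/2

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			resp, err := client.Get(srv.URL)
			if err != nil {
				b.Error(err) // Error is safe to call from worker goroutines
				return
			}
			_, _ = io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
		}
	})
	// Appears next to ns/op in the benchmark output.
	b.ReportMetric(reqPerSec(b.N, b.Elapsed()), "req/s")
}
```

The reported rate includes client-side scheduling on the same machine, which is one reason it reads as a local capacity signal.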

Benchmarks are good for regressions. They are not enough for release claims. Before publishing alpha performance numbers, run the sample backend as a normal process and use the stress runner from a second process:

Terminal window
go run ./cmd/server

Terminal window
go run ./cmd/stress \
  -url http://127.0.0.1:8080 \
  -duration 10s \
  -mode steady \
  -start 8 \
  -max 512 \
  -stop-error-rate 0.01 \
  -json stress-report.json

The runner seeds one project, then ramps every selected protocol at the same time. By default it runs REST, GraphQL, gRPC, and WebSocket workers together. The -duration flag is the total wall-clock budget for the run. The runner splits that budget across the concurrency rounds between -start and -max.
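To get a feel for how a total budget divides across rounds, the sketch below works through the arithmetic under two assumptions that this page does not confirm: that concurrency doubles from -start to -max, and that the budget is split evenly. With -start 8 and -max 512 that would give seven rounds, so a 10s budget leaves roughly 1.43s per round.

```go
package main

import "time"

// rounds lists concurrency levels from start up to limit, doubling each
// step. Doubling is an assumption; the runner's real schedule may differ.
func rounds(start, limit int) []int {
	var out []int
	for c := start; c > 0 && c <= limit; c *= 2 {
		out = append(out, c)
	}
	return out
}

// perRound splits a total wall-clock budget evenly across n rounds.
// An even split is also an assumption made for illustration.
func perRound(total time.Duration, n int) time.Duration {
	if n <= 0 {
		return 0
	}
	return total / time.Duration(n)
}
```

Whatever the real schedule is, the practical point holds: a short -duration combined with a wide -start to -max range leaves little measured time per round, so raise -duration when widening the range.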

Use -mode steady for normal throughput testing. It warms workers before each measured round, keeps reusable HTTP/gRPC clients alive, and keeps one WebSocket open per WebSocket worker.

Use -mode connect-storm when you want to measure fresh connection pressure. In that mode REST and GraphQL disable HTTP keep-alives, gRPC creates a fresh client connection per call, and WebSocket workers reconnect for every message. Those numbers answer a different question than steady-state throughput.
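For the REST and GraphQL workers, the difference between the two modes comes down to one field on Go's http.Transport. The sketch below is a hypothetical client factory, not the stress runner's code; the pool size and timeouts are placeholder values.

```go
package main

import (
	"net/http"
	"time"
)

// newTransport returns an HTTP transport for the given mode. The mode names
// mirror the runner's -mode values; the function itself is hypothetical.
func newTransport(mode string) *http.Transport {
	t := &http.Transport{
		MaxIdleConnsPerHost: 64,               // placeholder pool size
		IdleConnTimeout:     30 * time.Second, // placeholder timeout
	}
	if mode == "connect-storm" {
		// Each request dials a new connection and closes it afterwards.
		t.DisableKeepAlives = true
	}
	return t
}

// newClient wraps the transport in a client with a per-request timeout.
func newClient(mode string) *http.Client {
	return &http.Client{Transport: newTransport(mode), Timeout: 5 * time.Second}
}
```

With keep-alives disabled, every request pays for a TCP handshake (and a TLS handshake where applicable), which is why the resulting numbers measure connection setup rather than steady-state request handling.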

Connect storm mode is bounded by -connect-rate 250 by default. The limit is global across all selected protocols and exists so the client does not burn through its own ephemeral port range before the server is under useful pressure. Use -connect-rate 0 for an intentionally unbounded connection flood, or raise the value gradually when you are looking for the server or OS limit.
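A global dial limit of this kind can be sketched with a single shared time.Ticker: every worker, regardless of protocol, waits for a tick before dialing. The dialGate type below is a hypothetical illustration of the idea, not the runner's implementation. At the default of 250 it would space dials 4ms apart.

```go
package main

import (
	"context"
	"time"
)

// connectInterval returns the spacing between dials for a rate given in
// connections per second. A rate of 0 or less means unbounded.
func connectInterval(rate int) time.Duration {
	if rate <= 0 {
		return 0
	}
	return time.Second / time.Duration(rate)
}

// dialGate hands out dial permits shared by all workers and protocols.
type dialGate struct {
	tick *time.Ticker // nil when the gate is unbounded
}

func newDialGate(rate int) *dialGate {
	if d := connectInterval(rate); d > 0 {
		return &dialGate{tick: time.NewTicker(d)}
	}
	return &dialGate{}
}

// Wait blocks until the next permit is available or ctx is done.
func (g *dialGate) Wait(ctx context.Context) error {
	if g.tick == nil {
		return ctx.Err() // unbounded: only honor cancellation
	}
	select {
	case <-g.tick.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Stop releases the ticker when the run ends.
func (g *dialGate) Stop() {
	if g.tick != nil {
		g.tick.Stop()
	}
}
```

A ticker drops ticks when no worker is waiting, so this design caps the dial rate without letting unused permits accumulate into a later burst.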

Each round reports:

  • Concurrency per protocol
  • Total request/message rate
  • Per-protocol request/message rate
  • Error count and error rate
  • Error breakdown by dial, deadline, status, assertion, protocol, and other
  • Latency percentiles: p50, p95, p99, and max
  • Server-side connection counters when the sample backend exposes them
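When reading the latency percentiles, it helps to know how such figures are typically derived. The sketch below uses the nearest-rank method over recorded durations; this is one common definition and an assumption here, since the page does not say which method the runner uses. The fromMillis helper exists only to build sample sets.

```go
package main

import (
	"sort"
	"time"
)

// percentile returns the nearest-rank percentile p (1..100) of samples.
// It sorts a copy, so the caller's slice is left untouched.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 || p <= 0 {
		return 0
	}
	if p > 100 {
		p = 100
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Integer form of ceil(p/100 * n); avoids floating-point rounding.
	rank := (p*len(sorted) + 99) / 100
	return sorted[rank-1]
}

// fromMillis builds a sample set from millisecond values.
func fromMillis(ms ...int) []time.Duration {
	out := make([]time.Duration, len(ms))
	for i, v := range ms {
		out[i] = time.Duration(v) * time.Millisecond
	}
	return out
}
```

With only a handful of samples, p95, p99, and max collapse to the same value, so tail percentiles from very short rounds deserve less weight than those from longer ones.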

WebSocket numbers are message round trips; in steady mode each worker keeps one socket open for all of them. REST and GraphQL numbers are HTTP requests. gRPC numbers are unary calls.

The server connection line reports accepted, closed, and hijacked connection deltas plus the final active, idle, non-hijacked open, peak active, and peak open counts observed through http.Server.ConnState. WebSocket upgrades are reported as hijacked because Go hands those sockets to the WebSocket driver.
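The http.Server.ConnState hook receives a callback on every connection state change, which is enough to derive counters like these. The sketch below is a simplified, hypothetical tracker that covers accepted, closed, hijacked, open, and peak open; it omits the active and idle counts, and it is not the sample backend's code.

```go
package main

import (
	"net"
	"net/http"
	"sync"
)

// connStats is a hypothetical counter set fed by http.Server.ConnState.
type connStats struct {
	mu       sync.Mutex
	accepted int64
	closed   int64
	hijacked int64
	open     int64
	peakOpen int64
}

// observe matches the signature of the http.Server.ConnState field.
func (s *connStats) observe(_ net.Conn, state http.ConnState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch state {
	case http.StateNew:
		s.accepted++
		s.open++
		if s.open > s.peakOpen {
			s.peakOpen = s.open
		}
	case http.StateHijacked:
		// Terminal state: the server stops tracking the socket, so no
		// StateClosed follows. WebSocket upgrades end up here.
		s.hijacked++
		s.open--
	case http.StateClosed:
		s.closed++
		s.open--
	}
}

// newServer wires the tracker into a server.
func newServer(addr string, h http.Handler, s *connStats) *http.Server {
	return &http.Server{Addr: addr, Handler: h, ConnState: s.observe}
}
```

Because a hijacked connection never reports StateClosed, the open count here covers only sockets the HTTP server still owns, matching the "non-hijacked open" wording above.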

Use protocol selection when you want an isolated ceiling:

Terminal window
go run ./cmd/stress -url http://127.0.0.1:8080 -protocols grpc -duration 10s -start 16 -max 1024
go run ./cmd/stress -url http://127.0.0.1:8080 -protocols rest,graphql -duration 10s -start 16 -max 1024
go run ./cmd/stress -url http://127.0.0.1:8080 -mode connect-storm -duration 10s -start 8 -max 256 -connect-rate 500

Use the mixed run when you want to test Anvil’s same-port dispatcher under realistic pressure.

Use live profiles when a stress run shows a real limit and you need to know where the time, allocations, or blocking are coming from. The golden sample backend keeps this tooling outside core Anvil. Anvil does not expose pprof or profiling symbols in the SDK.

Start the sample backend with its local pprof server enabled:

Terminal window
PPROF_ADDR=127.0.0.1:6060 go run ./cmd/server
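A backend that wants this kind of opt-in profiling can serve net/http/pprof on its own listener, gated by the environment variable. The sketch below is an assumption about how such a switch could look, not the sample backend's code; the function names and the sampling rates are placeholders.

```go
package main

import (
	"log"
	"net/http"
	"net/http/pprof"
	"os"
	"runtime"
)

// newPprofMux registers the pprof handlers on a private mux so they never
// share a listener with application routes.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Index also serves named profiles such as heap, allocs, goroutine,
	// mutex, and block under this prefix.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// startPprof serves profiles only when PPROF_ADDR is set and reports
// whether it started the listener.
func startPprof() bool {
	addr := os.Getenv("PPROF_ADDR")
	if addr == "" {
		return false
	}
	// Mutex and block profiles stay empty unless sampling is switched on.
	// Both rates are placeholder values.
	runtime.SetMutexProfileFraction(5)
	runtime.SetBlockProfileRate(10_000)
	go func() {
		if err := http.ListenAndServe(addr, newPprofMux()); err != nil {
			log.Printf("pprof server: %v", err)
		}
	}()
	return true
}
```

Binding to 127.0.0.1 keeps the profiling endpoints off external interfaces, which matters because they expose goroutine stacks and command-line arguments.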

Then run the profiling harness from another shell:

Terminal window
go run ./cmd/profile \
  -url http://127.0.0.1:8080 \
  -pprof-url http://127.0.0.1:6060 \
  -duration 30s \
  -start 8 \
  -max 512 \
  -out profiles/local-run

The harness runs the same mixed-protocol stress workload and captures:

  • CPU profile during the primary load run
  • Heap profile after the primary load run
  • Allocs profile after the primary load run
  • Goroutine profile after the primary load run
  • Mutex profile after the primary load run
  • Block profile after the primary load run
  • Runtime trace during a short follow-up load run
  • Stress JSON for both the CPU-profiled run and the trace run
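The artifacts in that list correspond to standard net/http/pprof endpoints, so they can also be fetched by hand. The sketch below shows one way to download a single profile; the helper names are hypothetical and this is not the harness's code. The seconds parameter applies to the timed profile (CPU) and trace endpoints.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds a net/http/pprof endpoint URL. Pass seconds > 0 for the
// timed "profile" (CPU) and "trace" endpoints, and 0 for snapshot profiles
// such as heap, allocs, goroutine, mutex, and block.
func profileURL(base, name string, seconds int) string {
	u := base + "/debug/pprof/" + name
	if seconds > 0 {
		u += fmt.Sprintf("?seconds=%d", seconds)
	}
	return u
}

// fetchProfile downloads one profile and writes it to outPath.
func fetchProfile(base, name string, seconds int, outPath string) error {
	// Timed endpoints block for the whole window, so allow extra headroom.
	client := &http.Client{Timeout: time.Duration(seconds+30) * time.Second}
	resp, err := client.Get(profileURL(base, name, seconds))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s: unexpected status %s", name, resp.Status)
	}
	f, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}
```

A CPU or trace fetch only covers the window during which the request is open, so it has to run concurrently with the load, while the snapshot profiles can be taken once the load has finished.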

Open the artifacts with the standard Go tools:

Terminal window
go tool pprof profiles/local-run/cpu.pprof
go tool pprof -alloc_objects profiles/local-run/allocs.pprof
go tool pprof profiles/local-run/mutex.pprof
go tool pprof profiles/local-run/block.pprof
go tool trace profiles/local-run/trace.trace

On Windows PowerShell, set the pprof address like this:

Terminal window
$env:PPROF_ADDR = "127.0.0.1:6060"
go run ./cmd/server

Use benchstat when comparing two revisions:

Terminal window
go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s > old.txt
# Change code, then run again.
go test ./core/edge -run '^$' -bench BenchmarkEdgeDispatch -benchmem -count 10 -benchtime 3s > new.txt
benchstat old.txt new.txt

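Only claim an improvement when benchstat reports a statistically significant delta. When the difference between the two sample sets is within noise, benchstat prints ~ in place of a percentage. Small benchmark movement can come from CPU scheduling, turbo behavior, background processes, or thermal state.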

Use profiles when a benchmark exposes a hot path.

CPU profile:

Terminal window
go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackendDirect/project_get_with_middleware \
  -benchtime 10s \
  -cpuprofile cpu.prof
go tool pprof cpu.prof

Allocation profile:

Terminal window
go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackendDirect/project_get_with_middleware \
  -benchtime 10s \
  -memprofile mem.prof
go tool pprof -alloc_objects mem.prof

Trace:

Terminal window
go test ./internal/smoke \
  -run '^$' \
  -bench BenchmarkGeneratedBackendLiveHTTP2Parallel/rest_project_get \
  -benchtime 10s \
  -trace trace.out
go tool trace trace.out

Use profiles to decide what to optimize. Guesswork is how performance work gets expensive without improving the system.

Produce public performance numbers from a clean run:

  • Fixed Go version
  • Fixed commit hashes for Anvil, drivers, and the sample backend
  • Fresh process with no dev server or browser noise
  • -count 10 benchmark output stored with the release notes
  • Hardware, operating system, CPU, and memory listed beside the results
  • benchstat output when comparing against a previous release

The documentation can show representative alpha numbers once they come from a repeatable release machine. Until then, local benchmark output is useful for engineering decisions but is not a public guarantee.