Verifying AWS Neuron
NKI Kernels with ESBMC
A reproducible proof-of-concept by Cyber-Reasoning Consultancy (CRC) showing that Neuron Kernel Interface (NKI) kernels — the Python-level kernel API for AWS Trainium and Inferentia NeuronCores — can be statically verified by ESBMC's Python frontend, without ever touching Neuron hardware. Sources, harness and run instructions are published in lucasccordeiro/AWS-Neuron.
19
NKI kernels ported and verified by ESBMC across the AWS-Neuron PoC
~26 min
End-to-end wall-clock for the two-phase verification sweep (96 runs)
11 / 13
Symbolic-shape targets carrying k-induction completeness certificates
~2 s
Time for ESBMC to flag the nki-samples PR #74 matmul shape bug on a laptop
Context
NKI is the Python-level kernel API that AWS exposes for writing high-performance
tensor kernels on Trainium and Inferentia NeuronCores. NKI kernels look like
ordinary Python but are subject to hardware-specific shape, dtype, and partition
constraints — for example, the partition dimension PMAX is fixed at 128, and the
matmul unit has separate stationary/moving free-dimension limits
(GEMM_STATIONARY_FMAX = 128,
GEMM_MOVING_FMAX = 512).
Violating any of these constraints produces a hardware-level error that is
painful to diagnose at runtime.
The question we wanted to answer is simple: can these constraints be checked formally and exhaustively, before deployment, on a developer laptop and without a NeuronCore? If so, NKI authors get a fast, deterministic safety net that complements on-device testing rather than competing with it.
Approach
We wrote a single ~865-line
stubs.py
library modelling NKI tile shapes, dtypes and hardware constraints, so that
upstream NKI kernels could run unchanged on top of the stubs. Ported kernels are
byte-for-byte faithful to the upstream sources for indexing — no source rewriting
is required. The harness is pinned to the
aws-neuron/nki-samples
snapshot at commit a87aaa44,
covering 11 source files (6 tutorials and 5 community kernels) and 19 kernel
functions.
Verification splits into two phases. Phase 1 checks shapes and
bounds via the contract assertions in stubs.py:
partition-dimension limits, in-bounds slicing, DMA / elementwise / matmul
shape-equality, and matmul-unit hardware bounds. Phase 2 turns on
ESBMC's --overflow-check
(with default division-by-zero detection) to catch CWE-190 (signed integer
overflow) and CWE-369 (division by zero) in the host-side index arithmetic that
surrounds the kernels. Both phases use ESBMC 8.3.0 or later with the default
Bitwuzla SMT solver.
For symbolic-shape targets, where input dimensions are non-deterministic, we used
k-induction to obtain completeness certificates: 11 of 13 symbolic targets
establish that the chosen --unwind
bound is sufficient to cover every reachable state, so the verdict is exhaustive,
not merely bounded. The two exceptions (the Mamba-v3 and attention-forward-v3
symbolic targets) timed out under k-induction and remain heuristic; we report this
openly in the repository.
Results
Two-phase sweep completes in ~26 minutes end-to-end on a stock laptop (Phase 1: 58 runs in ~14 min; Phase 2: 38 runs in ~11 min), with a 100% pass rate against expected verdicts. Concrete-shape targets finish in 1–3 seconds each; symbolic targets in ~5–90 seconds.
Real upstream bug rediscovered in ~2 seconds. The
matmul_hoist_load
kernel that aws-neuron/nki-samples
PR #74
corrected — pre-fix, lhsT_tiles
was allocated with free-dim TILE_N (=512)
where it should have been TILE_M (=128) —
is pinned in our harness as a regression target. ESBMC reports the shape
mismatch in ~2 s on a laptop, without a Neuron device, demonstrating that the
technique catches bugs of the class that have shipped in real NKI code.
(This is a retroactive reproduction of an already-fixed upstream issue, not a
new discovery.)
Independent rediscovery of an input-validation gap. The Phase-2
CWE-369 check trips a ZeroDivisionError
in interpolate_bilinear_fwd
and interpolate_trilinear_fwd
at chunk_size = 1,
found without relying on any port-time precondition.
~20 ESBMC Python-frontend issues filed and resolved upstream
over the course of the PoC — every issue paired with its merging PR in the
repository's RETROSPECTIVE.md.
Net effect: the ported kernels are now byte-for-byte faithful to the upstream
NKI source for indexing, and a class of frontend gaps (named-local-binding
workarounds, source-rewrite shims) has been retired in ESBMC itself.
What this means
NKI shape and bounds checking is amenable to bounded model checking with k-induction completeness, at developer-laptop speeds, with no hardware in the loop. The same technique extends to integer-overflow and division-by-zero defects in the host-side glue around kernels. For teams writing or reviewing NKI code, this turns a class of hardware-level failures into compile-time CI gates — and the cost of adoption is a stub library plus a handful of minutes per kernel.
References
- lucasccordeiro/AWS-Neuron — public PoC repository (harness, stubs, ported kernels, two-phase verification sweep, retrospective). github.com/lucasccordeiro/AWS-Neuron
-
aws-neuron/nki-samples — upstream NKI sample kernels (pinned at commit
a87aaa44in the PoC). github.com/aws-neuron/nki-samples - aws-neuron/nki-samples PR #74 — matmul SBUF allocation dimension fix (rediscovered retroactively by the PoC). github.com/aws-neuron/nki-samples/pull/74
- ESBMC — open-source bounded model checker, with the Python frontend used in this PoC. github.com/esbmc/esbmc
Note. This page describes an independent, reproducible proof-of-concept published by CRC's founder and ESBMC creator and lead developer, Prof. Lucas Cordeiro (University of Manchester). CRC Ltd is not in a commercial partnership with AWS or the AWS Neuron team, and the PoC was not commissioned by AWS. The harness is open source; all results, including the retroactive rediscovery of nki-samples PR #74 and the limits of the symbolic-shape coverage, are reproducible from the public repository.