Verifying AWS Neuron
NKI Kernels with ESBMC
A reproducible proof-of-concept by Cyber-Reasoning Consultancy (CRC) showing that Neuron Kernel Interface (NKI) kernels, the Python-level kernel API for AWS Trainium and Inferentia NeuronCores, can be statically verified by ESBMC's Python frontend, without ever touching Neuron hardware. Sources, harness and run instructions are published in lucasccordeiro/AWS-Neuron. Along the way the PoC filed two new division-by-zero bugs in upstream NKI kernels that the AWS Neuron team confirmed and fixed.
19
NKI kernels ported and verified by ESBMC across the AWS-Neuron PoC
~26 min
End-to-end wall-clock for the two-phase verification sweep (96 runs)
11 / 13
Symbolic-shape targets carrying k-induction completeness certificates
2
New nki-samples division-by-zero bugs filed by the PoC and fixed upstream by AWS
Context
NKI is the Python-level kernel API that AWS exposes for writing high-performance
tensor kernels on Trainium and Inferentia NeuronCores. NKI kernels look like
ordinary Python but are subject to hardware-specific shape, dtype, and partition
constraints, for example, the partition dimension PMAX is fixed at 128, and the
matmul unit has separate stationary/moving free-dimension limits
(GEMM_STATIONARY_FMAX = 128,
GEMM_MOVING_FMAX = 512).
Violating any of these constraints produces a hardware-level error that is
painful to diagnose at runtime.
The question we wanted to answer is simple: can these constraints be checked formally and exhaustively, before deployment, on a developer laptop and without a NeuronCore? If so, NKI authors get a fast, deterministic safety net that complements on-device testing rather than competing with it.
Approach
We wrote a single ~865-line
stubs.py
library modelling NKI tile shapes, dtypes and hardware constraints, so that
upstream NKI kernels could run unchanged on top of the stubs. Ported kernels are
byte-for-byte faithful to the upstream sources for indexing; no source rewriting
is required. The harness is pinned to the
aws-neuron/nki-samples
snapshot at commit a87aaa44,
covering 11 source files (6 tutorials and 5 community kernels) and 19 kernel
functions.
Verification splits into two phases. Phase 1 checks shapes and
bounds via the contract assertions in stubs.py:
partition-dimension limits, in-bounds slicing, DMA / elementwise / matmul
shape-equality, and matmul-unit hardware bounds. Phase 2 turns on
ESBMC's --overflow-check
(with default division-by-zero detection) to catch CWE-190 (signed integer
overflow) and CWE-369 (division by zero) in the host-side index arithmetic that
surrounds the kernels. Both phases use ESBMC 8.3.0 or later with the default
Bitwuzla SMT solver.
For symbolic-shape targets, where input dimensions are non-deterministic, we used
k-induction to obtain completeness certificates: 11 of 13 symbolic targets
establish that the chosen --unwind
bound is sufficient to cover every reachable state, so the verdict is exhaustive,
not merely bounded. The two exceptions (the Mamba-v3 and attention-forward-v3
symbolic targets) timed out under k-induction and remain heuristic; we report this
openly in the repository.
Results
Two-phase sweep completes in ~26 minutes end-to-end on a stock laptop (Phase 1: 58 runs in ~14 min; Phase 2: 38 runs in ~11 min), with a 100% pass rate against expected verdicts. Concrete-shape targets finish in 1–3 seconds each; symbolic targets in ~5–90 seconds.
Real upstream bug rediscovered in ~2 seconds. The
matmul_hoist_load
kernel that aws-neuron/nki-samples
PR #74
corrected, pre-fix, lhsT_tiles
was allocated with free-dim TILE_N (=512)
where it should have been TILE_M (=128),
is pinned in our harness as a regression target. ESBMC reports the shape
mismatch in ~2 s on a laptop, without a Neuron device, demonstrating that the
technique catches bugs of the class that have shipped in real NKI code.
(This is a retroactive reproduction of an already-fixed upstream issue, not a
new discovery.)
New division-by-zero bug filed and fixed upstream. The Phase-2
CWE-369 check trips a ZeroDivisionError
at JIT-trace time in
interpolate_bilinear_2x_fwd
and interpolate_trilinear_2x_fwd
when chunk_size == 1,
found without relying on any port-time precondition. We reported it as
nki-samples issue #125;
AWS adopted the proposed preconditions
(chunk_size >= 2,
chunked dim >= chunk_size)
and merged the fix in
PR #126.
Second division-by-zero bug filed and fixed upstream. The Phase-2
multi-property sweep also surfaced a
ZeroDivisionError
in tensor_avgpool_kernel
when pool_size == 0.
We reported it as
nki-samples issue #127;
AWS added the proposed
assert pool_size >= 1
guard upstream in commit
bb513ac.
~20 ESBMC Python-frontend issues filed and resolved upstream
over the course of the PoC, every issue paired with its merging PR in the
repository's RETROSPECTIVE.md.
Net effect: the ported kernels are now byte-for-byte faithful to the upstream
NKI source for indexing, and a class of frontend gaps (named-local-binding
workarounds, source-rewrite shims) has been retired in ESBMC itself.
What this means
NKI shape and bounds checking is amenable to bounded model checking with k-induction completeness, at developer-laptop speeds, with no hardware in the loop. The same technique extends to integer-overflow and division-by-zero defects in the host-side glue around kernels. For teams writing or reviewing NKI code, this turns a class of hardware-level failures into compile-time CI gates, and the cost of adoption is a stub library plus a handful of minutes per kernel.
References
- lucasccordeiro/AWS-Neuron, public PoC repository (harness, stubs, ported kernels, two-phase verification sweep, retrospective). github.com/lucasccordeiro/AWS-Neuron
-
aws-neuron/nki-samples, upstream NKI sample kernels (pinned at commit
a87aaa44in the PoC). github.com/aws-neuron/nki-samples - aws-neuron/nki-samples PR #74, matmul SBUF allocation dimension fix (rediscovered retroactively by the PoC). github.com/aws-neuron/nki-samples/pull/74
-
aws-neuron/nki-samples issue #125 → PR #126, division-by-zero in
interpolate_{bi,tri}linear_2x_fwdatchunk_size == 1, filed by the PoC and fixed upstream. github.com/aws-neuron/nki-samples/pull/126 -
aws-neuron/nki-samples issue #127 → commit
bb513ac, division-by-zero intensor_avgpool_kernelatpool_size == 0, filed by the PoC and fixed upstream. github.com/aws-neuron/nki-samples/commit/bb513ac - ESBMC, open-source bounded model checker, with the Python frontend used in this PoC. github.com/esbmc/esbmc
Note. This page describes an independent, reproducible
proof-of-concept published by CRC's founder and ESBMC creator and lead developer,
Prof. Lucas Cordeiro (University of Manchester). CRC Ltd is not in a commercial
partnership with AWS or the AWS Neuron team, and the PoC was not commissioned by
AWS. The two new division-by-zero fixes (nki-samples PR #126 and commit
bb513ac) went through the project's public
issue tracker. The harness is open source; all results, including the retroactive
rediscovery of nki-samples PR #74 and the limits of the symbolic-shape coverage,
are reproducible from the public repository.