AI accelerators · Python kernels · 2026

Verifying AWS Neuron
NKI Kernels with ESBMC

A reproducible proof-of-concept by Cyber-Reasoning Consultancy (CRC) showing that Neuron Kernel Interface (NKI) kernels — the Python-level kernel API for AWS Trainium and Inferentia NeuronCores — can be statically verified by ESBMC's Python frontend, without ever touching Neuron hardware. Sources, harness and run instructions are published in lucasccordeiro/AWS-Neuron.

NKI kernels ported and verified by ESBMC across the AWS-Neuron PoC

~26 min

End-to-end wall-clock for the two-phase verification sweep (96 runs)

11 / 13

Symbolic-shape targets carrying k-induction completeness certificates

~2 s

Time for ESBMC to flag the nki-samples PR #74 matmul shape bug on a laptop

Context

NKI is the Python-level kernel API that AWS exposes for writing high-performance tensor kernels on Trainium and Inferentia NeuronCores. NKI kernels look like ordinary Python but are subject to hardware-specific shape, dtype, and partition constraints — for example, the partition dimension PMAX is fixed at 128, and the matmul unit has separate stationary/moving free-dimension limits (GEMM_STATIONARY_FMAX = 128, GEMM_MOVING_FMAX = 512). Violating any of these constraints produces a hardware-level error that is painful to diagnose at runtime.

The question we wanted to answer is simple: can these constraints be checked formally and exhaustively, before deployment, on a developer laptop and without a NeuronCore? If so, NKI authors get a fast, deterministic safety net that complements on-device testing rather than competing with it.

Approach

We wrote a single ~865-line stubs.py library modelling NKI tile shapes, dtypes and hardware constraints, so that upstream NKI kernels could run unchanged on top of the stubs. Ported kernels are byte-for-byte faithful to the upstream sources for indexing — no source rewriting is required. The harness is pinned to the aws-neuron/nki-samples snapshot at commit a87aaa44, covering 11 source files (6 tutorials and 5 community kernels) and 19 kernel functions.

Verification splits into two phases. Phase 1 checks shapes and bounds via the contract assertions in stubs.py: partition-dimension limits, in-bounds slicing, DMA / elementwise / matmul shape-equality, and matmul-unit hardware bounds. Phase 2 turns on ESBMC's --overflow-check (with default division-by-zero detection) to catch CWE-190 (signed integer overflow) and CWE-369 (division by zero) in the host-side index arithmetic that surrounds the kernels. Both phases use ESBMC 8.3.0 or later with the default Bitwuzla SMT solver.

For symbolic-shape targets, where input dimensions are non-deterministic, we used k-induction to obtain completeness certificates: 11 of 13 symbolic targets establish that the chosen --unwind bound is sufficient to cover every reachable state, so the verdict is exhaustive, not merely bounded. The two exceptions (the Mamba-v3 and attention-forward-v3 symbolic targets) timed out under k-induction and remain heuristic; we report this openly in the repository.

Results

Two-phase sweep completes in ~26 minutes end-to-end on a stock laptop (Phase 1: 58 runs in ~14 min; Phase 2: 38 runs in ~11 min), with a 100% pass rate against expected verdicts. Concrete-shape targets finish in 1–3 seconds each; symbolic targets in ~5–90 seconds.

Real upstream bug rediscovered in ~2 seconds. The matmul_hoist_load kernel that aws-neuron/nki-samples PR #74 corrected — pre-fix, lhsT_tiles was allocated with free-dim TILE_N (=512) where it should have been TILE_M (=128) — is pinned in our harness as a regression target. ESBMC reports the shape mismatch in ~2 s on a laptop, without a Neuron device, demonstrating that the technique catches bugs of the class that have shipped in real NKI code. (This is a retroactive reproduction of an already-fixed upstream issue, not a new discovery.)

Independent rediscovery of an input-validation gap. The Phase-2 CWE-369 check trips a ZeroDivisionError in interpolate_bilinear_fwd and interpolate_trilinear_fwd at chunk_size = 1, found without relying on any port-time precondition.

~20 ESBMC Python-frontend issues filed and resolved upstream over the course of the PoC — every issue paired with its merging PR in the repository's RETROSPECTIVE.md. Net effect: the ported kernels are now byte-for-byte faithful to the upstream NKI source for indexing, and a class of frontend gaps (named-local-binding workarounds, source-rewrite shims) has been retired in ESBMC itself.

What this means

NKI shape and bounds checking is amenable to bounded model checking with k-induction completeness, at developer-laptop speeds, with no hardware in the loop. The same technique extends to integer-overflow and division-by-zero defects in the host-side glue around kernels. For teams writing or reviewing NKI code, this turns a class of hardware-level failures into compile-time CI gates — and the cost of adoption is a stub library plus a handful of minutes per kernel.

References

lucasccordeiro/AWS-Neuron — public PoC repository (harness, stubs, ported kernels, two-phase verification sweep, retrospective). github.com/lucasccordeiro/AWS-Neuron
aws-neuron/nki-samples — upstream NKI sample kernels (pinned at commit a87aaa44 in the PoC). github.com/aws-neuron/nki-samples
aws-neuron/nki-samples PR #74 — matmul SBUF allocation dimension fix (rediscovered retroactively by the PoC). github.com/aws-neuron/nki-samples/pull/74
ESBMC — open-source bounded model checker, with the Python frontend used in this PoC. github.com/esbmc/esbmc

Note. This page describes an independent, reproducible proof-of-concept published by CRC's founder and ESBMC creator and lead developer, Prof. Lucas Cordeiro (University of Manchester). CRC Ltd is not in a commercial partnership with AWS or the AWS Neuron team, and the PoC was not commissioned by AWS. The harness is open source; all results, including the retroactive rediscovery of nki-samples PR #74 and the limits of the symbolic-shape coverage, are reproducible from the public repository.