Skip to main content
All case studies
AI accelerators · Python kernels · 2026

Verifying AWS Neuron
NKI Kernels with ESBMC

A reproducible proof-of-concept by Cyber-Reasoning Consultancy (CRC) showing that Neuron Kernel Interface (NKI) kernels, the Python-level kernel API for AWS Trainium and Inferentia NeuronCores, can be statically verified by ESBMC's Python frontend, without ever touching Neuron hardware. Sources, harness and run instructions are published in lucasccordeiro/AWS-Neuron. Along the way the PoC filed two new division-by-zero bugs in upstream NKI kernels that the AWS Neuron team confirmed and fixed.

19

NKI kernels ported and verified by ESBMC across the AWS-Neuron PoC

~26 min

End-to-end wall-clock for the two-phase verification sweep (96 runs)

11 / 13

Symbolic-shape targets carrying k-induction completeness certificates

2

New nki-samples division-by-zero bugs filed by the PoC and fixed upstream by AWS

Context

NKI is the Python-level kernel API that AWS exposes for writing high-performance tensor kernels on Trainium and Inferentia NeuronCores. NKI kernels look like ordinary Python but are subject to hardware-specific shape, dtype, and partition constraints, for example, the partition dimension PMAX is fixed at 128, and the matmul unit has separate stationary/moving free-dimension limits (GEMM_STATIONARY_FMAX = 128, GEMM_MOVING_FMAX = 512). Violating any of these constraints produces a hardware-level error that is painful to diagnose at runtime.

The question we wanted to answer is simple: can these constraints be checked formally and exhaustively, before deployment, on a developer laptop and without a NeuronCore? If so, NKI authors get a fast, deterministic safety net that complements on-device testing rather than competing with it.

Approach

We wrote a single ~865-line stubs.py library modelling NKI tile shapes, dtypes and hardware constraints, so that upstream NKI kernels could run unchanged on top of the stubs. Ported kernels are byte-for-byte faithful to the upstream sources for indexing; no source rewriting is required. The harness is pinned to the aws-neuron/nki-samples snapshot at commit a87aaa44, covering 11 source files (6 tutorials and 5 community kernels) and 19 kernel functions.

Verification splits into two phases. Phase 1 checks shapes and bounds via the contract assertions in stubs.py: partition-dimension limits, in-bounds slicing, DMA / elementwise / matmul shape-equality, and matmul-unit hardware bounds. Phase 2 turns on ESBMC's --overflow-check (with default division-by-zero detection) to catch CWE-190 (signed integer overflow) and CWE-369 (division by zero) in the host-side index arithmetic that surrounds the kernels. Both phases use ESBMC 8.3.0 or later with the default Bitwuzla SMT solver.

For symbolic-shape targets, where input dimensions are non-deterministic, we used k-induction to obtain completeness certificates: 11 of 13 symbolic targets establish that the chosen --unwind bound is sufficient to cover every reachable state, so the verdict is exhaustive, not merely bounded. The two exceptions (the Mamba-v3 and attention-forward-v3 symbolic targets) timed out under k-induction and remain heuristic; we report this openly in the repository.

Results

Two-phase sweep completes in ~26 minutes end-to-end on a stock laptop (Phase 1: 58 runs in ~14 min; Phase 2: 38 runs in ~11 min), with a 100% pass rate against expected verdicts. Concrete-shape targets finish in 1–3 seconds each; symbolic targets in ~5–90 seconds.

Real upstream bug rediscovered in ~2 seconds. The matmul_hoist_load kernel that aws-neuron/nki-samples PR #74 corrected, pre-fix, lhsT_tiles was allocated with free-dim TILE_N (=512) where it should have been TILE_M (=128), is pinned in our harness as a regression target. ESBMC reports the shape mismatch in ~2 s on a laptop, without a Neuron device, demonstrating that the technique catches bugs of the class that have shipped in real NKI code. (This is a retroactive reproduction of an already-fixed upstream issue, not a new discovery.)

New division-by-zero bug filed and fixed upstream. The Phase-2 CWE-369 check trips a ZeroDivisionError at JIT-trace time in interpolate_bilinear_2x_fwd and interpolate_trilinear_2x_fwd when chunk_size == 1, found without relying on any port-time precondition. We reported it as nki-samples issue #125; AWS adopted the proposed preconditions (chunk_size >= 2, chunked dim >= chunk_size) and merged the fix in PR #126.

Second division-by-zero bug filed and fixed upstream. The Phase-2 multi-property sweep also surfaced a ZeroDivisionError in tensor_avgpool_kernel when pool_size == 0. We reported it as nki-samples issue #127; AWS added the proposed assert pool_size >= 1 guard upstream in commit bb513ac.

~20 ESBMC Python-frontend issues filed and resolved upstream over the course of the PoC, every issue paired with its merging PR in the repository's RETROSPECTIVE.md. Net effect: the ported kernels are now byte-for-byte faithful to the upstream NKI source for indexing, and a class of frontend gaps (named-local-binding workarounds, source-rewrite shims) has been retired in ESBMC itself.

What this means

NKI shape and bounds checking is amenable to bounded model checking with k-induction completeness, at developer-laptop speeds, with no hardware in the loop. The same technique extends to integer-overflow and division-by-zero defects in the host-side glue around kernels. For teams writing or reviewing NKI code, this turns a class of hardware-level failures into compile-time CI gates, and the cost of adoption is a stub library plus a handful of minutes per kernel.

References

  1. lucasccordeiro/AWS-Neuron, public PoC repository (harness, stubs, ported kernels, two-phase verification sweep, retrospective). github.com/lucasccordeiro/AWS-Neuron
  2. aws-neuron/nki-samples, upstream NKI sample kernels (pinned at commit a87aaa44 in the PoC). github.com/aws-neuron/nki-samples
  3. aws-neuron/nki-samples PR #74, matmul SBUF allocation dimension fix (rediscovered retroactively by the PoC). github.com/aws-neuron/nki-samples/pull/74
  4. aws-neuron/nki-samples issue #125 → PR #126, division-by-zero in interpolate_{bi,tri}linear_2x_fwd at chunk_size == 1, filed by the PoC and fixed upstream. github.com/aws-neuron/nki-samples/pull/126
  5. aws-neuron/nki-samples issue #127 → commit bb513ac, division-by-zero in tensor_avgpool_kernel at pool_size == 0, filed by the PoC and fixed upstream. github.com/aws-neuron/nki-samples/commit/bb513ac
  6. ESBMC, open-source bounded model checker, with the Python frontend used in this PoC. github.com/esbmc/esbmc

Note. This page describes an independent, reproducible proof-of-concept published by CRC's founder and ESBMC creator and lead developer, Prof. Lucas Cordeiro (University of Manchester). CRC Ltd is not in a commercial partnership with AWS or the AWS Neuron team, and the PoC was not commissioned by AWS. The two new division-by-zero fixes (nki-samples PR #126 and commit bb513ac) went through the project's public issue tracker. The harness is open source; all results, including the retroactive rediscovery of nki-samples PR #74 and the limits of the symbolic-shape coverage, are reproducible from the public repository.