
ONNX Runtime

AI-ASSISTANTS
Velocity 2.0

Cross-platform inference and training engine for ONNX-format machine-learning models.

ONNX Runtime is doing the unglamorous platform work: a C++20 and CUDA 12 floor, free-threaded Python builds, and a new EP Plugin API.

platform-modernization, ep-plugin-api, security-patches, webgpu, free-threaded-python, model-coverage
Current state
ONNX Runtime is in the middle of a platform modernization. v1.25.0 raised the floor to C++20 (for source builds) and CUDA 12.0, removed the ArmNN execution provider, and bumped ONNX to 1.21. v1.24.1 had already made the parallel move on the Python side: it dropped 3.10, added 3.14 and free-threaded (PEP 703) variants, and introduced the EP Plugin API for dynamically loaded execution providers. Between those structural releases, the 1.24.x patch line has been heavily security-focused, with multiple heap out-of-bounds fixes (GatherCopyData, RoiAlign, LoRA adapter loading, Resize, ArrayFeatureExtractor). New model and operator support continues: v1.25.1 adds the operators Qwen3.5 needs (LinearAttention, CausalConvState, RMSNorm, RotaryEmbedding) and enables Qwen3.5 on the WebGPU execution provider.
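Most of the 1.24.x security fixes share one bug class: a kernel trusts an index or size taken from the model or its inputs and copies memory without checking it. The sketch below is illustrative only and is not the actual ONNX Runtime patch; it applies the ONNX Gather rule that indices must lie in [-s, s-1] (negative values count from the end) and rejects anything else before touching data. Python lists would raise on a bad index anyway; the explicit check mirrors what a C++ kernel has to do itself.

```python
# Illustrative bounds check for a gather-style copy (not ONNX Runtime's code).
# Follows the ONNX Gather rule: valid indices lie in [-s, s-1] for an axis of size s.

def normalize_gather_indices(indices: list[int], axis_size: int) -> list[int]:
    """Map indices into [0, axis_size), raising instead of reading out of bounds."""
    normalized = []
    for idx in indices:
        if not -axis_size <= idx < axis_size:
            raise IndexError(f"index {idx} out of range for axis of size {axis_size}")
        # Negative indices count from the end of the axis.
        normalized.append(idx + axis_size if idx < 0 else idx)
    return normalized


def gather_rows(data: list[list[float]], indices: list[int]) -> list[list[float]]:
    """Gather along axis 0, validating every index before any copy happens."""
    return [list(data[i]) for i in normalize_gather_indices(indices, len(data))]


table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(gather_rows(table, [2, -1, 0]))  # [[5.0, 6.0], [5.0, 6.0], [1.0, 2.0]]
try:
    gather_rows(table, [3])
except IndexError as err:
    print(err)  # index 3 out of range for axis of size 3
```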
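RMSNorm, one of the Qwen3.5 operators listed above, differs from LayerNorm in that it does not subtract the mean; it only rescales by the root-mean-square. A plain-Python sketch of the standard formulation, y_i = x_i / sqrt(mean(x^2) + eps) * gamma_i; this is reference math rather than ONNX Runtime's kernel, and the eps default is an assumption.

```python
# Reference RMSNorm over a single vector (standard formulation, not ORT's kernel).
import math


def rms_norm(x: list[float], gamma: list[float], eps: float = 1e-6) -> list[float]:
    """Scale x by the reciprocal of its root-mean-square, then by per-channel gamma."""
    if len(x) != len(gamma):
        raise ValueError("x and gamma must have the same length")
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)  # no mean subtraction, unlike LayerNorm
    return [v * inv_rms * g for v, g in zip(x, gamma)]


print(rms_norm([2.0, 2.0], [0.5, 3.0], eps=0.0))  # [0.5, 3.0]
```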
Where it's heading
The runtime is repositioning for the next wave: free-threaded Python lets ML workloads finally escape the GIL on CPU paths, the EP Plugin API decouples hardware-vendor execution providers from the runtime release cycle, and the WebGPU EP keeps adding frontier-model coverage. The cost is sharp deprecation — C++20, CUDA 12, no more Python 3.10, no more x86_64 macOS — but this is the pattern of a project clearing technical debt to support the next two years of GPU-vendor diversity and edge inference.
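Teams deciding whether they can leave the 1.24.x line could encode the v1.25.0 floors as a pre-upgrade check. A hypothetical sketch: the function names and inputs are assumptions rather than an ONNX Runtime API, `ArmNNExecutionProvider` is assumed to be the provider name of the removed EP, and the C++20 requirement is left out because it only affects source builds.

```python
from __future__ import annotations

# Hypothetical pre-upgrade check for the v1.25.0 floors (not an ONNX Runtime API).
REMOVED_IN_1_25 = {"ArmNNExecutionProvider"}  # assumed name of the removed ArmNN EP
MIN_CUDA_1_25 = (12, 0)                       # CUDA 11.x dropped in v1.25.0


def major_minor(version: str) -> tuple[int, int]:
    """Parse '12', '12.4' or '11.8.0' into a comparable (major, minor) pair."""
    parts = [int(part) for part in version.strip().split(".")] + [0]
    return parts[0], parts[1]


def upgrade_blockers(cuda_version: str | None, providers: list[str]) -> list[str]:
    """List reasons a deployment cannot move to 1.25 without changes."""
    blockers = []
    if "CUDAExecutionProvider" in providers and cuda_version is not None:
        if major_minor(cuda_version) < MIN_CUDA_1_25:
            blockers.append(f"CUDA {cuda_version} is below the 12.0 minimum")
    for name in providers:
        if name in REMOVED_IN_1_25:
            blockers.append(f"{name} was removed in 1.25.0")
    return blockers


print(upgrade_blockers("11.8", ["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDA 11.8 is below the 12.0 minimum']
print(upgrade_blockers("12.4", ["ArmNNExecutionProvider", "CPUExecutionProvider"]))
# ['ArmNNExecutionProvider was removed in 1.25.0']
```

Padding the parsed version with a trailing zero keeps "12" from comparing as older than "12.0".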
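The Python-side deprecations can be checked the same way. A hypothetical helper that encodes only the v1.24.1 wheel matrix as summarized on this page (CPython 3.11 through 3.14, free-threaded 3.13t/3.14t on Linux only, no x86_64 macOS); the macOS 14.0 minimum is not encoded, anything the summary does not mention is assumed available, and the actual file list on PyPI is authoritative. `sysconfig.get_config_var("Py_GIL_DISABLED")` is one documented way to detect a PEP 703 free-threaded interpreter.

```python
import platform
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """True when the interpreter was built with the GIL disabled (PEP 703)."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def wheel_expected(minor: int, free_threaded: bool, os_name: str, machine: str) -> bool:
    """Whether CPython 3.<minor> should find a 1.24.1 wheel (hypothetical matrix)."""
    if not 11 <= minor <= 14:        # 3.10 dropped, 3.14 added
        return False
    if os_name == "darwin" and machine != "arm64":
        return False                 # x86_64 macOS binaries dropped
    if free_threaded:
        # Free-threaded builds listed only for 3.13t/3.14t on Linux.
        return minor in (13, 14) and os_name.startswith("linux")
    return True


ft = is_free_threaded_build()
print(f"cp3{sys.version_info.minor}{'t' if ft else ''}",
      wheel_expected(sys.version_info.minor, ft, sys.platform, platform.machine()))
```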
Prediction
Expect more vendor execution providers (Qualcomm QNN, Apple CoreML, Intel OpenVINO) to migrate onto the new EP Plugin API in the next two releases, and a continued security-patch cadence on 1.24.x for users who can't move to 1.25 yet. WebGPU EP coverage will keep tracking new model architectures: Qwen3.5 today, the next frontier MoE class tomorrow.

Recent moves

  1. 22d ago

    ONNX Runtime v1.25.1

    v1.25.1 adds operators required for Qwen3.5 support (LinearAttention, CausalConvState, RotaryEmbedding, RMSNorm), enables Qwen3.5 on the WebGPU execution provider, and bumps Reshape and Transpose to newer opset versions. Routine new-model integration on top of the 1.25.0 platform reset.

  2. 29d ago

    ONNX Runtime v1.25.0

    ⚡ SPARK

    v1.25.0 raises the floor: C++20 required to build from source, CUDA minimum bumped to 12.0 (11.x dropped), ArmNN EP removed, ONNX upgraded to 1.21.0. The biggest platform-compatibility break in recent releases.

  3. 2mo ago

    ONNX Runtime v1.24.4

    v1.24.4 patch: PCI-bus GPU fallback for containerized Linux environments where nvidia-drm isn't loaded, plus Plugin EP null-deref and MetaDef-ID conflict fixes. Routine bug-fix maintenance on the 1.24 line.

  4. 2mo ago

    ONNX Runtime v1.24.3

    v1.24.3 ships a batch of security fixes: heap out-of-bounds reads/writes in GatherCopyData, RoiAlign, LoRA adapter loading, and Resize, plus a GatherND division-by-zero fix and hardened external-data-path validation. A maintenance-line release, but with real CVE-class surface.

  5. 3mo ago

    ONNX Runtime v1.24.2

    v1.24.2 patches NuGet native-library loading on Linux/macOS, Java/Jar testing on macOS ARM64, an ArrayFeatureExtractor OOB read, and adds boundary checks for SparseTensorProto conversion. Cross-platform stability work.

  6. 3mo ago

    ONNX Runtime v1.24.1

    ⚡ SPARK

    v1.24.1 drops Python 3.10 wheels, adds Python 3.14 plus free-threaded (PEP 703) builds for 3.13t/3.14t on Linux, drops x86_64 macOS/iOS binaries (min macOS 14.0), and ships the Execution Provider Plugin API with kernel-based EPs, weight pre-packing, and EP Context model support. Major infrastructure shift on the Python side.

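The v1.24.4 PCI-bus fallback targets containers where GPU discovery through nvidia-drm fails. The hypothetical sketch below only illustrates the idea and is not ONNX Runtime's implementation: on Linux, each PCI device exposes `vendor` and `class` files under `/sys/bus/pci/devices`, NVIDIA's PCI vendor ID is `0x10de`, and GPUs report as VGA-compatible (`0x0300xx`) or 3D (`0x0302xx`) controllers.

```python
# Hypothetical PCI-bus GPU probe via Linux sysfs (illustration, not ORT's code).
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"
DISPLAY_CLASS_PREFIXES = ("0x0300", "0x0302")  # VGA-compatible, 3D controller


def find_nvidia_pci_gpus(sysfs_root: str = "/sys/bus/pci/devices") -> list[str]:
    """Return PCI addresses (e.g. '0000:01:00.0') of NVIDIA display/3D devices."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for device in sorted(root.iterdir()):
        try:
            vendor = (device / "vendor").read_text().strip().lower()
            pci_class = (device / "class").read_text().strip().lower()
        except OSError:
            continue  # skip entries without readable vendor/class files
        if vendor == NVIDIA_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIXES):
            found.append(device.name)
    return found


print(find_nvidia_pci_gpus())
```

Filtering on the class code matters because NVIDIA cards also expose non-GPU functions, such as an HDMI audio device, under the same vendor ID.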
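The v1.24.2 SparseTensorProto boundary checks guard the step where sparse indices are scattered into a dense buffer. A hypothetical validator for the 1-D layout the ONNX spec allows, in which each index is a linearized position that must fall inside the dense shape and indices must be strictly ascending; the 2-D `[NNZ, rank]` layout is not handled, and this is not ONNX Runtime's code.

```python
# Hypothetical check for linearized SparseTensorProto-style indices (not ORT's code).
from math import prod


def check_linear_sparse_indices(indices: list[int], num_values: int, dims: list[int]) -> None:
    """Raise ValueError if the indices cannot be safely scattered into a dense buffer."""
    if any(d < 0 for d in dims):
        raise ValueError(f"negative dimension in {dims}")
    if len(indices) != num_values:
        raise ValueError(f"{len(indices)} indices for {num_values} values")
    dense_size = prod(dims)
    previous = -1
    for idx in indices:
        if not 0 <= idx < dense_size:
            raise ValueError(f"index {idx} outside dense size {dense_size}")
        if idx <= previous:
            raise ValueError("indices must be strictly ascending")
        previous = idx


def to_dense(values: list[float], indices: list[int], dims: list[int]) -> list[float]:
    """Validate, then scatter values into a flat zero-filled buffer of prod(dims) elements."""
    check_linear_sparse_indices(indices, len(values), dims)
    dense = [0.0] * prod(dims)
    for value, idx in zip(values, indices):
        dense[idx] = value
    return dense


print(to_dense([7.0, 9.0], [1, 5], [2, 3]))  # [0.0, 7.0, 0.0, 0.0, 0.0, 9.0]
```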