ONNX Runtime

AI-ASSISTANTS

Velocity: 3.8

Cross-platform inference and training engine for ONNX-format machine-learning models.

onnxruntime.ai

ONNX Runtime is unbundling its execution providers into independently shippable plugins.

Tags: execution-providers, plugin-architecture, llm-inference, quantization, security-hardening, webgpu

◆Current state

ONNX Runtime is mid-transition to a plugin-based execution-provider architecture: EPs that were once compiled into the core binary now ship as separately versioned libraries that register at runtime. Recent releases pair heavy LLM-oriented kernel work (attention, quantized MatMul/MoE, KV-cache) with deep security hardening across operators.
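As a rough illustration of what "register at runtime" can look like from Python, the sketch below uses the plugin-EP entry points of the onnxruntime Python package (`register_execution_provider_library`, `get_ep_devices`, `SessionOptions.add_provider_for_devices`). The function name, registration name, library path, and EP name are hypothetical placeholders, and the empty options dict assumes the EP needs no configuration; check the API reference for the ORT version you have installed before relying on it.

```python
def create_session_with_plugin_ep(model_path, registration_name, library_path, ep_name):
    """Register a plugin EP library and build a session that runs on its devices.

    All four arguments are placeholders chosen for this sketch.
    """
    # Imported lazily so the sketch can be loaded without ORT installed.
    import onnxruntime as ort

    # Load the separately shipped EP library into the running ORT process.
    ort.register_execution_provider_library(registration_name, library_path)

    # Keep only the devices reported by the EP we just registered.
    devices = [d for d in ort.get_ep_devices() if d.ep_name == ep_name]
    if not devices:
        # Undo the registration so a failed attempt leaves no library loaded.
        ort.unregister_execution_provider_library(registration_name)
        raise RuntimeError(f"no devices reported for EP {ep_name!r}")

    options = ort.SessionOptions()
    # Assumption: no EP-specific options are needed, so pass an empty dict.
    options.add_provider_for_devices(devices, {})
    return ort.InferenceSession(model_path, sess_options=options)
```

A call might look like `create_session_with_plugin_ep("model.onnx", "my_ep", "/path/to/libmy_ep.so", "MyEp")`, where every argument is a stand-in for values supplied by the plugin's own documentation.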

◆Where it's heading

The direction is decoupling: the CUDA Plugin EP landed in 1.25, and the WebGPU EP has now shipped as a standalone plugin that registers against any compatible ORT install. This lets EPs iterate on their own cadence and lets third parties deliver hardware backends without rebuilding ORT, while the core concentrates on LLM inference primitives and takes breaking platform-baseline raises (C++20, CUDA 12->13).
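Because a standalone plugin targets a range of core versions rather than one build, a loader may want to check compatibility before registering it. The sketch below is one minimal way to do that in plain Python; `parse_version` and `meets_minimum` are hypothetical helpers, the 1.24.4 default is the floor quoted for the WebGPU Plugin EP v0.1.0 entry under Recent moves, and ignoring pre-release suffixes is a simplification rather than anything ORT prescribes.

```python
import re


def parse_version(text):
    """Return the leading numeric release components, e.g. '1.24.4' -> (1, 24, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum="1.24.4"):
    """True if `installed` is at least `minimum`; suffixes such as '.dev1' are ignored."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '1.25' compares as '1.25.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

In practice the installed version could come from `onnxruntime.__version__`, for example `meets_minimum(onnxruntime.__version__)`.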

◆Prediction

Expect more first-party EPs (TensorRT, QNN, CoreML) to migrate to the plugin model and a published, stable plugin-EP API surface as the default integration path.

◆Recent moves

15d ago
ONNX Runtime v1.27.0
v1.27.0 deepens the plugin EP API (zero-copy I/O, CUDA plugin options), adds LLM kernel work across CUDA/WebGPU/CPU, new datatypes (FLOAT8E8M0), and an extensive batch of security and correctness fixes.
1mo ago
ONNX Runtime WebGPU Plugin EP v0.1.0
First standalone release of the WebGPU EP as a plugin, distributed and versioned independently of the core onnxruntime binary and registered at runtime against ORT 1.24.4+.
2mo ago
ONNX Runtime v1.25.1
Patch release adding newer opset versions and LinearAttention/CausalConvState/RotaryEmbedding/RMSNorm operators for Qwen3.5 support, plus WebGPU decode-path optimizations.
2mo ago
ONNX Runtime v1.25.0
v1.25.0 introduces the CUDA Plugin EP, the first core implementation letting third-party CUDA-backed EPs load as dynamic plugins without rebuilding ORT, alongside breaking baseline raises (C++20, CUDA 12 minimum) and broad LLM op coverage.
3mo ago
ONNX Runtime v1.24.4
Patch release for the 1.24 line with null-pointer and MetaDef ID fixes in core and plugin-EP code, QNN EP tweaks, and a version bump tied to dropping Python 3.10.
4mo ago
ONNX Runtime v1.24.3
Patch release for the 1.24 line with several security and out-of-bounds fixes, a 4x CPU speedup for 4-bit QMoE, and assorted EP and build fixes.