ONNX Runtime
Cross-platform inference and training engine for ONNX-format machine-learning models.
ONNX Runtime is unbundling its execution providers into independently shippable plugins.
◆ Recent moves
- 15d ago
ONNX Runtime v1.27.0
v1.27.0 extends the plugin EP API (zero-copy I/O, CUDA plugin options) and adds LLM kernel work across CUDA, WebGPU, and CPU, new datatypes (FLOAT8E8M0), and an extensive batch of security and correctness fixes.
- 1mo ago
ONNX Runtime WebGPU Plugin EP v0.1.0
⚡ SPARK: First standalone release of the WebGPU EP as a plugin, distributed and versioned independently of the core onnxruntime binary and registered at runtime against ORT 1.24.4+.
- 2mo ago
ONNX Runtime v1.25.1
Patch release adding newer opset versions and LinearAttention/CausalConvState/RotaryEmbedding/RMSNorm operators for Qwen3.5 support, plus WebGPU decode-path optimizations.
- 2mo ago
ONNX Runtime v1.25.0
⚡ SPARK: v1.25.0 introduces the CUDA Plugin EP, the first core implementation that lets third-party CUDA-backed EPs load as dynamic plugins without rebuilding ORT. It also raises the build baseline in breaking ways (C++20 and CUDA 12 minimum) and adds broad LLM op coverage.
- 3mo ago
ONNX Runtime v1.24.4
Patch release for the 1.24 line with null-pointer and MetaDef ID fixes in core and plugin EPs, QNN EP tweaks, and a version bump associated with dropping Python 3.10.
- 4mo ago
ONNX Runtime v1.24.3
Patch release for the 1.24 line with several security and out-of-bounds (OOB) fixes, a 4x CPU speedup for 4-bit QMoE, and assorted EP and build fixes.
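The WebGPU Plugin EP entry above states that the plugin registers at runtime against ORT 1.24.4+. A minimal sketch of what a version gate for that requirement might look like follows. The helper names `parse_version` and `supports_webgpu_plugin` are hypothetical and not part of any ONNX Runtime API, and the check assumes plain major.minor.patch version strings.

```python
import re

# Minimum core version taken from the release note ("ORT 1.24.4+").
MIN_ORT_VERSION = (1, 24, 4)


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract the leading major.minor.patch numbers from a version string.

    A leading "v" and any trailing suffix (for example "-dev") are ignored;
    a missing patch component counts as 0.
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def supports_webgpu_plugin(ort_version: str) -> bool:
    """Hypothetical helper: True if ort_version is at least MIN_ORT_VERSION."""
    # Tuples compare element by element, so (1, 25, 0) >= (1, 24, 4) holds.
    return parse_version(ort_version) >= MIN_ORT_VERSION
```

In practice the installed version string could be read from `onnxruntime.__version__` and passed to `supports_webgpu_plugin` before attempting to load the plugin. Comparing integer tuples rather than raw strings avoids ranking "1.9.0" above "1.24.4".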