ONNX Runtime vs Together AI
Side-by-side trajectory, velocity, and editorial themes.
ONNX Runtime is doing the unglamorous work: C++20, CUDA 12, free-threaded Python, EP plugin API.
ONNX Runtime is mid-platform-modernization. v1.25.0 raised the build floor to C++20 and CUDA 12.0, removed the ArmNN execution provider, and bumped ONNX to 1.21. v1.24.1 made the parallel move on the Python side — dropped 3.10, added 3.14 and free-threaded (PEP 703) variants, and introduced the EP Plugin API for dynamically loaded execution providers. Between those structural releases, the 1.24.x patch line has been heavily security-focused: multiple heap out-of-bounds fixes (GatherCopyData, RoiAlign, Lora Adapters, ArrayFeatureExtractor). New model and operator support continues — Qwen3.5 across LinearAttention/CausalConvState/RMSNorm/RotEMB, including WebGPU.
The runtime is repositioning for the next wave: free-threaded Python lets ML workloads finally escape the GIL on CPU paths, the EP Plugin API decouples hardware-vendor execution providers from the runtime release cycle, and the WebGPU EP keeps adding frontier-model coverage. The cost is sharp deprecation — C++20, CUDA 12, no more Python 3.10, no more x86_64 macOS — but this is the pattern of a project clearing technical debt to support the next two years of GPU-vendor diversity and edge inference.
Expect more vendor execution providers (Qualcomm QNN, Apple Neural Engine, Intel) to migrate onto the new Plugin EP API in the next two releases, and continued security-patch cadence on 1.24.x for users who can't move to 1.25 yet. WebGPU EP coverage will keep tracking new model architectures — Qwen 3.5 today, the next frontier MoE class tomorrow.
Together AI is pricing itself as the open-stack alternative to frontier coding-agent APIs.
Together is hammering on two things: (a) inference economics, with a benchmark claiming 76% lower cost than Claude Opus 4.6 on coding-agent workloads, and (b) breadth of model surface, evidenced by day-0 Nemotron 3 Nano Omni, DeepSeek-V4 Pro at 512K context, and Goose-driven 'deploy any HuggingFace model' tooling. Side outputs — a voice finder, the Violin video-translation tool, and a Pearl Research Labs crypto-inference partnership — broaden the developer surface without changing the core narrative.
Together is positioning to be the default API for teams running coding agents on open models, with explicit price/perf comparisons against closed labs. The pattern of day-0 launches plus dedicated container offerings makes the strategy clear: any open frontier model should be one click away on Together. Crypto-adjacent and partnership work (Pearl, Adaption) reads as experimentation rather than core roadmap.
Expect more cost-comparison content against named frontier APIs and a tighter coding-agent SKU (likely a benchmark-grounded preset for Cursor/Aider-style workloads). Day-0 launch cadence will continue as the differentiator versus AWS Bedrock and other neoclouds.
See more alternatives to ONNX Runtime →
See more alternatives to Together AI →