ONNX Runtime vs Lambda Labs
Side-by-side trajectory, velocity, and editorial themes.
ONNX Runtime is doing the unglamorous work: C++20, CUDA 12, free-threaded Python, EP Plugin API.
ONNX Runtime is in the middle of a platform modernization. v1.25.0 raised the build floor to C++20 and CUDA 12.0, removed the ArmNN execution provider, and bumped ONNX to 1.21. v1.24.1 made the parallel move on the Python side: it dropped 3.10, added 3.14 and free-threaded (PEP 703) variants, and introduced the EP Plugin API for dynamically loaded execution providers. Between those structural releases, the 1.24.x patch line has been heavily security-focused, with multiple heap out-of-bounds fixes (GatherCopyData, RoiAlign, LoRA adapters, ArrayFeatureExtractor). New model and operator support continues, with Qwen3.5 coverage across LinearAttention/CausalConvState/RMSNorm/RotEMB, including on WebGPU.
The runtime is repositioning for the next wave: free-threaded Python lets ML workloads finally escape the GIL on CPU paths, the EP Plugin API decouples hardware-vendor execution providers from the runtime release cycle, and the WebGPU EP keeps adding frontier-model coverage. The cost is sharp deprecation: C++20 and CUDA 12 as minimums, no more Python 3.10, no more x86_64 macOS. This is the pattern of a project clearing technical debt to support the next two years of GPU-vendor diversity and edge inference.
Expect more vendor execution providers (Qualcomm QNN, Apple Neural Engine, Intel) to migrate onto the new EP Plugin API in the next two releases, and a continued security-patch cadence on 1.24.x for users who can't move to 1.25 yet. WebGPU EP coverage will keep tracking new model architectures: Qwen3.5 today, the next frontier MoE class tomorrow.
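For readers who want to check how an existing environment lines up with the Python-side changes described above, here is a minimal sketch using only the standard library. It assumes that "dropped 3.10" implies a floor of Python 3.11; that floor, the `MIN_PYTHON` constant, and the helper names are illustrative assumptions, not part of ONNX Runtime. It does not import or inspect ONNX Runtime itself.

```python
import sys
import sysconfig

# Assumption: dropping Python 3.10 implies a minimum of 3.11.
MIN_PYTHON = (3, 11)


def is_free_threaded_build() -> bool:
    """True if this interpreter was built with PEP 703 (free-threaded) support."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_is_active() -> bool:
    """True if the GIL is currently enabled in this process."""
    # sys._is_gil_enabled() exists on Python 3.13+; earlier versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


def describe_environment(version=None) -> dict:
    """Summarize the interpreter against the assumed Python floor."""
    major, minor = tuple(version or sys.version_info[:2])
    return {
        "python": f"{major}.{minor}",
        "meets_python_floor": (major, minor) >= MIN_PYTHON,
        "free_threaded_build": is_free_threaded_build(),
        "gil_active": gil_is_active(),
    }


if __name__ == "__main__":
    print(describe_environment())
```

A free-threaded build can still run with the GIL re-enabled at runtime, which is why the sketch reports the build flag and the live GIL state separately.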
Lambda is restructuring as a gigawatt-scale telco-style infrastructure operator, not an AI startup.
Lambda is simultaneously upgrading its capital structure ($1B senior secured credit facility, on top of its August 2025 financing), its leadership (telco veteran Michel Combes as CEO, a former AT&T CEO as Chairman, co-founder Balaban moving to CTO), and its technical credibility (an audited STAC-AI LANG6 result on NVIDIA HGX 8xB200, plus MLPerf Inference v6.0 results). The published content alternates between deep technical work (FlashAttention-4 on Blackwell, ICLR papers, distilled tool-calling datasets) and infrastructure-positioning pieces; "compute is not a commodity" reads as a direct pitch against hyperscaler abstraction.
The arc is unambiguous: Lambda is becoming a vertically integrated AI infrastructure operator at gigawatt scale, positioned to absorb large training-cluster demand that's currently flowing to CoreWeave, Crusoe, and the hyperscalers. Bringing in a CEO whose background is in telco operations (SFR, Vodafone), plus a former AT&T CEO as chairman, signals the company is preparing to operate like a power and network utility, not a startup. Research output (papers, tool-calling datasets, kernel optimizations) ladders into the same story by establishing technical depth.
Expect specific gigawatt-scale site announcements (likely funded by the new credit facility) within the next quarter, and at least one major training-cluster customer announcement to validate the capital structure. Also expect continued benchmark publishing in regulated verticals (after FSI/STAC-AI, likely healthcare or government) to differentiate from CoreWeave on compliance credibility.