AssemblyAI
COMMS
Velocity 6.3
AssemblyAI ships a full voice-agent pipeline and a multi-LLM gateway, moving past speech-to-text.
voice ai, speech to text, voice agents, llm gateway, model fallbacks, vertical accuracy
◆Current state
AssemblyAI's recent shipping is dominated by two themes. The first is the new Voice Agent API — a complete pipeline (speech understanding, LLM reasoning, voice generation) over a single WebSocket at a flat $4.50/hour, running on Universal-3 Pro Streaming. The second is the LLM Gateway maturing as a hosted multi-LLM proxy: Claude Opus 4.7 is now available through it, and automatic model fallbacks landed in public beta. Around these, smaller releases include same-request unredacted transcripts on PII Redaction, Universal-2 accuracy improvements for Hebrew and Swedish, and a Medical Mode add-on for streaming transcription.
◆Where it's heading
AssemblyAI is repositioning from 'best-in-class speech-to-text' to 'end-to-end voice AI platform'. The Voice Agent API directly takes on Vapi, Retell, and the OpenAI Realtime API, and the bundled flat-rate pricing is a deliberate simplification — customers no longer need to track three meters across STT, LLM, and TTS providers. The LLM Gateway evolution rounds out the same idea: AssemblyAI as the vendor you call, with model variety and resilience handled inside.
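To make the "one meter instead of three" point concrete, here is a small arithmetic sketch. The $4.50/hour figure is the flat rate quoted above; per-minute proration is an assumption, and the three per-component rates passed to `metered_cost` are hypothetical inputs supplied by the caller, not real vendor prices.

```python
from decimal import Decimal, ROUND_HALF_UP

FLAT_RATE_PER_HOUR = Decimal("4.50")  # flat Voice Agent API rate quoted in the text


def _to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount to two decimal places."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def flat_cost(minutes: int) -> Decimal:
    """Cost of a call under the flat rate, assuming per-minute proration."""
    return _to_cents(FLAT_RATE_PER_HOUR * Decimal(minutes) / Decimal(60))


def metered_cost(
    minutes: int,
    stt_per_hour: Decimal,
    llm_per_hour: Decimal,
    tts_per_hour: Decimal,
) -> Decimal:
    """Cost of the same call when STT, LLM, and TTS are billed separately.

    All three hourly rates are hypothetical and must be supplied by the caller.
    """
    hourly = stt_per_hour + llm_per_hour + tts_per_hour
    return _to_cents(hourly * Decimal(minutes) / Decimal(60))


if __name__ == "__main__":
    print(flat_cost(10))  # 0.75
    print(flat_cost(60))  # 4.50
```

Under the flat model a cost forecast needs only call duration; under the metered model it needs three rates that can each change independently.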
◆Prediction
Expect the Voice Agent API to gain richer agent-tooling primitives (function calling, knowledge-base retrieval, latency tuning) and the LLM Gateway to add more frontier models and policy-based routing. Vertical-specialized modes like Medical Mode are likely to expand to legal, finance, and customer support.
◆Recent moves
- 1mo ago
LLM Gateway: JSON Repair Post-Processing for Structured Output
- 1mo ago
Streaming Speaker Diarization: Major Accuracy Upgrade with Per-Word Labels
- 1mo ago
Introducing the Voice Agent API
⚡ SPARK: The Voice Agent API delivers a full speech-to-LLM-to-speech pipeline behind a single WebSocket at a flat $4.50/hour. This is AssemblyAI moving from STT vendor to end-to-end voice AI platform.
- 2mo ago
Voice Agent API (republish)
Republished version of the Voice Agent API announcement — same content as the dated entry.
- 2mo ago
PII Redaction: Return Unredacted Transcripts in the Same Request
A new redact_pii_return_unredacted flag lets a single PII Redaction request return both versions in one response — sensible for compliance pipelines that need both the redacted output and the original audit copy without double-billing the call.
- 2mo ago
PII Redaction unredacted output (republish)
Republished PII Redaction announcement — same content as the dated entry.
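As an illustration of the PII Redaction entry above, the following sketch builds a request body that asks for both transcript versions in one call. Only `redact_pii_return_unredacted` comes from the announcement. The other field names, the policy names, and the placement of the flag at the top level of the body are assumptions to check against the current API reference, and `build_pii_request` is a hypothetical helper. No network call is made.

```python
import json


def build_pii_request(audio_url: str, policies: list[str]) -> dict:
    """Assemble a transcription request body with PII redaction enabled.

    Field names other than `redact_pii_return_unredacted` are assumed,
    not confirmed by the announcement summarized above.
    """
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": policies,
        # Flag named in the announcement: return the unredacted
        # transcript alongside the redacted one in the same response.
        "redact_pii_return_unredacted": True,
    }


if __name__ == "__main__":
    body = build_pii_request(
        "https://example.com/call.mp3",  # placeholder URL
        ["person_name", "phone_number"],  # assumed policy names
    )
    print(json.dumps(body, indent=2))
```

A compliance pipeline would send this body once and store the two returned transcripts separately, for example the redacted copy in the analytics store and the original in a restricted audit archive.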