AssemblyAI

COMMS
Velocity 6.3

AssemblyAI ships a full voice-agent pipeline and a multi-LLM gateway, moving past speech-to-text.

voice ai, speech to text, voice agents, llm gateway, model fallbacks, vertical accuracy
Current state
AssemblyAI's recent shipping is dominated by two themes. The first is the new Voice Agent API — a complete pipeline (speech understanding, LLM reasoning, voice generation) over a single WebSocket at a flat $4.50/hour, running on Universal-3 Pro Streaming. The second is the LLM Gateway maturing as a hosted multi-LLM proxy: Claude Opus 4.7 is now available through it, and automatic model fallbacks landed in public beta. Around these, smaller releases include same-request unredacted transcripts on PII Redaction, Universal-2 accuracy improvements for Hebrew and Swedish, and a Medical Mode add-on for streaming transcription.
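To make "automatic model fallbacks" concrete, here is a minimal client-side sketch of the general pattern: try a preferred model, and on failure move to the next one in an ordered list. It is not AssemblyAI's implementation, since the LLM Gateway is described as handling this on the server side. The model names, the `call_model` callable, and the stub `fake_call` are all hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple


class AllModelsFailed(Exception):
    """Raised when every model in the fallback chain has failed."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success."""
    errors: List[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a network call; the primary model always times out."""
    if model == "primary-model":
        raise TimeoutError("upstream timed out")
    return f"[{model}] reply to: {prompt}"


used, reply = complete_with_fallback(
    "Summarize this call.",
    ["primary-model", "backup-model"],  # placeholder names, not real model IDs
    fake_call,
)
print(used, "->", reply)
```

With a hosted gateway, this loop runs inside the vendor's service, so the caller sends one request and does not need to maintain the retry logic.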
Where it's heading
AssemblyAI is repositioning from 'best-in-class speech-to-text' to 'end-to-end voice AI platform'. The Voice Agent API directly takes on Vapi, Retell, and the OpenAI Realtime API, and the bundled flat-rate pricing is a deliberate simplification — customers no longer need to track three meters across STT, LLM, and TTS providers. The LLM Gateway evolution rounds out the same idea: AssemblyAI as the vendor you call, with model variety and resilience handled inside.
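The pricing point can be illustrated with a little arithmetic. The sketch below compares one flat meter against three separate ones. The only figure taken from the announcement is the $4.50/hour rate. Linear per-minute proration is an assumption, and the per-provider rates in the example are placeholders, not real prices from any vendor.

```python
from typing import Dict

FLAT_RATE_PER_HOUR = 4.50  # USD, as stated in the Voice Agent API announcement


def flat_cost(minutes: float) -> float:
    """Cost under a single flat hourly rate, assuming linear per-minute proration."""
    return round(FLAT_RATE_PER_HOUR * minutes / 60, 4)


def metered_cost(minutes: float, hourly_rates: Dict[str, float]) -> float:
    """Cost when each stage (STT, LLM, TTS) is billed on its own hourly meter."""
    return round(sum(hourly_rates.values()) * minutes / 60, 4)


# PLACEHOLDER rates for illustration only; these are not real vendor prices.
placeholder_rates = {"stt": 1.00, "llm": 1.00, "tts": 1.00}

print("flat, 10 min:   ", flat_cost(10))
print("metered, 10 min:", metered_cost(10, placeholder_rates))
```

Under the proration assumption, a ten-minute call on the flat rate comes to $0.75. The metered version needs three rates kept up to date, and in practice LLM and TTS are often billed per token or per character rather than per hour, which is the tracking burden the flat rate removes.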
Prediction
Expect the Voice Agent API to gain richer agent-tooling primitives (function calling, knowledge-base retrieval, latency tuning) and the LLM Gateway to add more frontier models and policy-based routing. Vertical-specialized modes like Medical Mode are likely to expand to legal, finance, and customer support.

Recent moves

  1. 1mo ago

    LLM Gateway: JSON Repair Post-Processing for Structured Output

  2. 1mo ago

    Streaming Speaker Diarization: Major Accuracy Upgrade with Per-Word Labels

  3. 1mo ago

    Introducing the Voice Agent API

    ⚡ SPARK

    The Voice Agent API delivers a full speech-to-LLM-to-speech pipeline behind a single WebSocket at a flat $4.50/hour. This is AssemblyAI moving from STT vendor to end-to-end voice AI platform.

  4. 2mo ago

    Voice Agent API (republish)

    Republished version of the Voice Agent API announcement — same content as the dated entry.

  5. 2mo ago

    PII Redaction: Return Unredacted Transcripts in the Same Request

    A new redact_pii_return_unredacted flag lets a single PII Redaction request return both versions in one response — sensible for compliance pipelines that need both the redacted output and the original audit copy without double-billing the call.

  6. 2mo ago

    PII Redaction unredacted output (republish)

    Republished PII Redaction announcement — same content as the dated entry.

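As a rough illustration of the PII Redaction entry above, the sketch below builds a transcript request body that sets the `redact_pii_return_unredacted` flag. Only the flag name comes from the announcement. Treating it as a boolean is an assumption, the other field names reflect the existing transcript request as commonly documented and should be checked against the current API reference, and the audio URL is a placeholder. The response shape is not described in the source, so the sketch stops at the request.

```python
import json
from typing import Dict, List


def build_redaction_request(audio_url: str, policies: List[str]) -> Dict[str, object]:
    """Build a request body asking for redacted and unredacted transcripts together."""
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": policies,
        # Flag named in the announcement; the boolean type is an assumption.
        "redact_pii_return_unredacted": True,
    }


body = build_redaction_request(
    "https://example.com/call.mp3",  # placeholder URL
    ["person_name", "phone_number"],
)
print(json.dumps(body, indent=2))
```

The appeal for compliance pipelines is that one request yields both the redacted output and the audit copy, instead of submitting the same audio twice.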