
Deepgram

COMMS
Velocity 6.3

Deepgram pairs a real diarization quality jump with voice-agent platform breadth.

speech-to-text, voice-agents, model-upgrades, multilingual, diarization, self-hosted
Current state
Deepgram is shipping on two tracks at once. The speech-recognition core is getting model-quality work: diarization v2 is the headline, while profanity filtering and numerals support expand across a long tail of languages. In parallel, the Voice Agent API is being built out as a multi-vendor orchestration layer, with managed Gemini, GPT, and Cartesia options sitting next to Deepgram's own Aura-2 TTS and Flux ASR.
Where it's heading
The arc is two products converging: a best-in-class speech stack and an opinionated voice-agent runtime that abstracts the LLM/TTS choice. Diarization v2 — preferred 3.3× over v1 in human eval, with ~80% median CER reduction on contact-center audio — is the kind of underlying model win that pulls call-center workloads onto the platform. Meanwhile, runtime controls like Aura-2 speed and pronunciation, plus managed third-party LLMs, position Deepgram as a single integration target rather than a single component vendor.
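To put the two diarization figures in concrete terms, here is a rough arithmetic sketch. It assumes the 3.3× preference is a ratio of v2 wins to v1 wins with ties excluded, which the announcement does not spell out, and the baseline CER value is purely hypothetical.

```python
def win_rate_from_preference_ratio(ratio: float) -> float:
    """Share of decided pairwise comparisons won by the preferred system.

    Assumes `ratio` means (wins for v2) / (wins for v1), ties excluded.
    """
    return ratio / (ratio + 1.0)


def apply_relative_reduction(baseline: float, reduction: float) -> float:
    """Error rate left after a relative reduction (0.80 means 80% lower)."""
    return baseline * (1.0 - reduction)


# 3.3x preference from the human eval cited above.
v2_win_rate = win_rate_from_preference_ratio(3.3)

# Hypothetical baseline: a v1 CER of 10% on some contact-center call.
# Only the ~80% relative reduction comes from the text; 0.10 is made up.
hypothetical_v1_cer = 0.10
v2_cer = apply_relative_reduction(hypothetical_v1_cer, 0.80)

print(f"v2 wins {v2_win_rate:.1%} of decided comparisons")
print(f"CER {hypothetical_v1_cer:.1%} -> {v2_cer:.1%}")
```

Under that reading, a 3.3× preference works out to v2 winning roughly 77% of decided comparisons, and an 80% relative reduction leaves one fifth of the original error rate.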
Prediction
Expect Diarization v2 to become the default behind diarize=true once the opt-in window closes, and expect the Voice Agent API to keep adding tier-priced managed providers — that's the obvious monetization layer. Multilingual feature parity (numerals, profanity, Flux) will continue to fill in tail languages, narrowing the gap between English-only buyers and global deployments.

Recent moves

  1. 1mo ago

    Profanity Filtering Now Supported for All Multilingual Models; Korean Spacing Improvements

  2. 1mo ago

    Gemini 3.1 Flash Lite Now Available

    Gemini 3.1 Flash Lite graduates from preview to GA as a managed Standard-tier LLM in the Voice Agent API. Fits the broader pattern of Deepgram running a curated, versioned roster of third-party brains behind a single agent contract.

  3. 1mo ago

    Numerals Support Now Available for 3 New Languages: Russian, Romanian, and Hebrew (Monolingual Models)

    Numerals support extends to Russian, Romanian, and Hebrew monolingual models. Incremental, but useful for transcription workloads where spoken-number-to-digit conversion is table stakes for downstream parsing.

  4. 1mo ago

    Profanity Filtering Now Available in 50+ Languages

    Profanity filtering ships across 50+ monolingual languages via a single API parameter. Broad coverage in one move — turns a previously English-centric trust-and-safety feature into something multilingual deployments can rely on uniformly.

  5. 1mo ago

    Self-hosted May release ships Diarization v2 by default

    The May self-hosted bundle (release 260514) brings the new batch diarizer v2 to new on-prem deployments by default. Operationally important — keeps the self-hosted track from drifting behind the cloud capability set, which matters for the contact-center buyers most likely to be self-hosting.

  6. 1mo ago

    Diarization v2: Improved Batch Speaker Diarization

    ⚡ SPARK

    Batch Diarization v2 lands behind an opt-in diarize_model parameter, with human eval preferring it 3.3× over v1 and median CER dropping ~80% on contact-center audio. This is the model-quality story Deepgram has been building toward — the kind of underlying jump that decides procurement when accuracy is the gating metric.
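Several of the moves above are described as single request parameters, so a small sketch of how they might combine in one batch transcription request may help. Only `diarize` and `diarize_model` are named in the items above; the endpoint URL and the `language`, `numerals`, and `profanity_filter` parameter names are assumptions to check against Deepgram's API reference, and the `diarize_model` value shown is a placeholder because the accepted values are not given here.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed pre-recorded transcription endpoint; not stated in the items above.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(
    language: str,
    *,
    diarize: bool = False,
    diarize_model: Optional[str] = None,
    numerals: bool = False,
    profanity_filter: bool = False,
) -> str:
    """Build a request URL that enables only the requested features."""
    params = {"language": language}
    if diarize:
        params["diarize"] = "true"
    if diarize_model is not None:
        # Opt-in selector for the new diarizer, per the item above.
        params["diarize_model"] = diarize_model
    if numerals:
        params["numerals"] = "true"  # assumed parameter name
    if profanity_filter:
        params["profanity_filter"] = "true"  # assumed parameter name
    return f"{BASE_URL}?{urlencode(params)}"


# "PLACEHOLDER" is not a real value; substitute whatever the docs specify.
url = build_listen_url(
    "ru", diarize=True, diarize_model="PLACEHOLDER", numerals=True
)
print(url)
```

If Diarization v2 does become the default behind `diarize=true`, as predicted above, the `diarize_model` argument would simply be dropped from calls like this one.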