Comet

AI-ASSISTANTS
Velocity: 1.3

ML experiment tracking and LLM observability platform, including Opik for evaluating LLM apps.

Comet pushes Opik beyond observability: Test Suites and an auto-fixer turn agent dev into a software discipline

Tags: agent-development, observability, opik, agent-testing, llm-cost, auto-fix
Current state
Comet's Opik platform is shipping product expansions quickly, with three launches in roughly the past month: Agent Playground for iteration, Test Suites for regression testing, and Ollie, an automated agent-codebase fixer. The supporting content (RAG case studies, LLM cost tracking, multimodal evaluation guides) reads as evidence for a single thesis: agent development needs the testing, debugging, and observability disciplines that traditional software engineering already has. Two responses to recent npm supply-chain attacks also signal a security-aware posture.
Where it's heading
Opik is being built into an end-to-end IDE for agent development, covering not just observation but also iteration, testing, and automated repair. Comet is racing other agent-ops vendors (Arize, LangSmith, Helicone) to define what 'shipping agents like software' looks like, and the breadth of recent releases suggests it intends to win on surface area. The cost-tracking content signals the next axis: making agent spend as legible as agent reliability.
Prediction
Expect Ollie to evolve into a CI-integrated auto-remediation product and Test Suites to support model-version comparison out of the box. A unified 'agent SRE' framing is plausible given the cost, security, and reliability content stacking up, and supply-chain attack responses suggest further security-posture content as a differentiator.

Recent moves

  1. 2d ago

    What Held Up at 3 AM: One Engineer’s RAG Case Study

    Opening installment of a new interview series with engineers who shipped AI to production, featuring Michael Maximilien on RAG architecture choices. Content marketing, but high-signal because it positions Comet alongside the practitioners its customers identify with.

  2. 7d ago

    LLM Cost Tracking Solution: How to Monitor and Control AI Spend in Agentic Systems

    Capability-focused content on monitoring multi-step agent spend, framed around the gap between successful demos and surprising invoices. Reinforces LLM FinOps as the next reliability dimension Opik is staking out.

  3. 1mo ago

    Introducing the Opik Agent Playground

    New playground surface inside Opik for iterating on built-but-not-yet-good agents. Targets the last mile of agent development where most teams currently rely on ad-hoc tooling.

  4. 1mo ago

    Introducing Ollie: Auto-Fix Your Agent’s Codebase

    ⚡ SPARK

    Comet launches Ollie, an automated agent-codebase fixer that ports the debug-test-deploy loop from traditional software into agent development. A bold extension of the Opik platform from observability into automated repair.

  5. 1mo ago

    Introducing Opik Test Suites: Straightforward Unit & Regression Testing for AI Agents

    ⚡ SPARK

    Opik adds unit and regression test suites for agents, treating agent quality as something teams should pin down with the same discipline as backend tests. Positions Opik as the testing framework of record for production agents.

  6. 1mo ago

    Multimodal LLM Evaluation: A Developer’s Guide to Multimodal Language Models

    Developer guide for evaluating multimodal language models, leaning on Shopify and Waymo deployment examples. Brand-positioning content that associates Comet with high-volume production deployments.
