Comet
ML experiment tracking and LLM observability platform, including Opik for evaluating LLM apps.
Comet pushes Opik beyond observability — Test Suites and an auto-fixer turn agent dev into a software discipline
◆ Recent moves
- 2d ago
What Held Up at 3 AM: One Engineer’s RAG Case Study
Opening installment of a new interview series with engineers who shipped AI to production — Michael Maximilien on RAG architecture choices. Content marketing, but high-signal because it positions Comet alongside the practitioners customers identify with.
- 7d ago
LLM Cost Tracking Solution: How to Monitor and Control AI Spend in Agentic Systems
Capability-focused content on monitoring multi-step agent spend, framed around the gap between successful demos and surprising invoices. Reinforces LLM FinOps as the next reliability dimension Opik is staking out.
- 1mo ago
Introducing the Opik Agent Playground
New playground surface inside Opik for iterating on built-but-not-yet-good agents. Targets the last mile of agent development where most teams currently rely on ad-hoc tooling.
- 1mo ago
Introducing Ollie: Auto-Fix Your Agent’s Codebase
⚡ SPARK: Comet launches Ollie, an automated agent-codebase fixer that ports the debugger-test-deploy loop from traditional software into agent development. Bold extension of the Opik platform from observability into automated repair.
- 1mo ago
Introducing Opik Test Suites: Straightforward Unit & Regression Testing for AI Agents
⚡ SPARK: Opik adds unit and regression test suites for agents, treating agent quality as something teams should pin down with the same discipline as backend tests. Positions Opik as the testing framework of record for production agents.
- 1mo ago
Multimodal LLM Evaluation: A Developer’s Guide to Multimodal Language Models
Developer guide for evaluating multimodal language models, leaning on Shopify and Waymo deployment examples. Brand-positioning content that places Comet alongside high-volume production references.