Skip to content

PromptCanary vs promptfoo

Both tools help you test LLM outputs. They solve different problems.

Quick comparison table

FeaturePromptCanarypromptfoo
ArchitectureTesting library (npm package)Evaluation platform (CLI + web UI + DB)
Test runnerUses yours (Vitest, Jest, Mocha)Has its own (promptfoo eval)
Config formatTypeScript test files (or optional YAML)YAML config (promptfooconfig.yaml)
Install size~22 KB (13 files)~21 MB (349 files, 84 dependencies)
InfrastructureNone — pure librarySQLite DB, optional web UI server
Provider supportOpenAI, Anthropic, Google Gemini60+ providers built-in
Assertions7 built-in + custom via test runner30+ built-in + model-graded (LLM-as-judge)
Red teamingNoYes — 50+ vulnerability types, OWASP/NIST compliance
Web UINoYes — local viewer + enterprise cloud
Semantic similarityYes (embedding-based)Yes (embedding-based)
CI/CDRuns wherever your tests runDedicated CI integrations
Enterprise featuresNoRBAC, SSO, team sharing, audit logs
PriceFree (MIT)Free OSS + paid enterprise

When to use PromptCanary

  • You already have a test suite (Vitest, Jest, etc.) and want to add LLM coverage
  • You want prompt tests that look and feel like your unit tests
  • You need zero infrastructure — no database, no server, no new tools
  • You test against 1-3 providers and need basic assertions + semantic similarity
  • You value simplicity and small dependency footprint

Example:

typescript
import { testPrompt, assertions } from 'promptcanary';

it('refund policy mentions 30-day window', async () => {
  const result = await testPrompt({
    provider: 'openai',
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'What is the refund policy?' }],
  });
  expect(assertions.contains(result.content, '30 days').passed).toBe(true);
});

When to use promptfoo

  • You need to evaluate prompts across many providers simultaneously (matrix testing)
  • You need model-graded assertions (LLM-as-judge, factuality scoring, RAG metrics)
  • You need red teaming and security compliance scanning (OWASP, NIST, HIPAA)
  • You want a web UI for exploring and sharing results across a team
  • You need enterprise features (RBAC, SSO, audit trails)
  • You're building evaluation datasets or doing systematic prompt optimization

Can you use both?

Yes. PromptCanary handles prompt regression tests in your test suite (run on every PR). promptfoo handles deeper evaluation and red teaming (run periodically or during prompt development). They don't conflict.

Summary

promptfoo is a full evaluation platform for teams doing systematic LLM evaluation, red teaming, and prompt optimization. PromptCanary is a testing library for teams that want prompt regression tests inside their existing test suite. Different tools for different jobs.

Released under the MIT License.