LLM Benchmark

How well do leading LLMs write SpacetimeDB code? We prompt each model to generate code, run that code against live modules, and score the results with automated checks.
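As a rough illustration of the scoring step, the sketch below computes a pass rate per model and sorts models by it. The page does not describe the actual harness, so the `EvalResult` record, its field names, and the choice to treat a model with zero evals as 0% are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical per-model summary; field names are assumptions."""
    model: str
    evals_passed: int
    evals_total: int


def pass_rate(result: EvalResult) -> float:
    """Return the percentage of evals passed (0.0 if no evals were run)."""
    if result.evals_total == 0:
        return 0.0
    return 100.0 * result.evals_passed / result.evals_total


def rank_models(results: list[EvalResult]) -> list[EvalResult]:
    """Sort models from highest to lowest pass rate."""
    return sorted(results, key=pass_rate, reverse=True)
```

A real harness would also need to record the checks, cost, and run time shown in the leaderboard columns; those are left out here to keep the sketch small.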

Leaderboard

Models ranked by eval pass rate.
Model | Eval Pass % | Checks | Cost | Run Time
No results available.