LLM Benchmark

Detailed Eval Results

How well do leading LLMs write SpacetimeDB code? We prompt each model, run the generated code against live modules, and score with automated checks.

Trends

Task pass rate over time, per model.

Click a data point to view that day's leaderboard.
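
As a rough illustration of what the trend chart plots, the sketch below computes a per-model, per-day pass rate from individual task results. It assumes the pass rate is simply the fraction of tasks whose automated checks passed on a given day; the benchmark's actual scoring rules are not stated here, and the function name `pass_rates`, the record layout, and the model names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def pass_rates(results):
    """Aggregate (day, model, passed) records into {model: {day: pass_rate}}.

    Assumption: pass rate = passed tasks / total tasks for that model and day.
    """
    counts = defaultdict(lambda: [0, 0])  # (model, day) -> [passed, total]
    for day, model, passed in results:
        bucket = counts[(model, day)]
        bucket[0] += 1 if passed else 0
        bucket[1] += 1

    trend = defaultdict(dict)
    for (model, day), (n_passed, n_total) in counts.items():
        trend[model][day] = n_passed / n_total

    # Sort each model's series by day so it can be plotted left to right.
    return {model: dict(sorted(series.items())) for model, series in trend.items()}


# Hypothetical sample data: one record per task run.
sample = [
    (date(2025, 1, 1), "model-a", True),
    (date(2025, 1, 1), "model-a", False),
    (date(2025, 1, 2), "model-a", True),
    (date(2025, 1, 1), "model-b", False),
]
rates = pass_rates(sample)
```

Each inner dictionary is one line on the chart: days on the x-axis, pass rate on the y-axis.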