LLM Benchmark

Detailed Eval Results

How well do leading LLMs write SpacetimeDB code? We prompt each model, run the generated code against live modules, and score with automated checks.

Evals

0 tasks · 0 categories