LLM Benchmark
How well do leading LLMs write SpacetimeDB code? We prompt each model, run the generated code against live modules, and score the results with automated checks.
Run the benchmark pipeline to start tracking trends. Each run inserts results into the database, and this page groups them by date.
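
As a rough illustration of the grouping step, the sketch below buckets stored results by UTC calendar date and computes a pass rate per model. The `RunResult` shape, its field names, and the `group_by_date` helper are assumptions made for this example. They do not reflect the benchmark's actual schema or code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RunResult:
    # Hypothetical record shape; the real result schema is not shown here.
    model: str
    finished_at: datetime  # expected to be timezone-aware
    passed: int            # automated checks that passed
    total: int             # automated checks that ran


def group_by_date(results):
    """Return {ISO date: {model: pass rate}}, with dates in ascending order."""
    # totals[day][model] holds [passed, total] summed over that day's runs.
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in results:
        day = r.finished_at.astimezone(timezone.utc).date().isoformat()
        cell = totals[day][r.model]
        cell[0] += r.passed
        cell[1] += r.total
    return {
        day: {m: (p / t if t else 0.0) for m, (p, t) in sorted(models.items())}
        for day, models in sorted(totals.items())
    }
```

Several runs of the same model on one day are pooled before the rate is computed, so each date shows a single figure per model.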