LLM Benchmark

We gave Claude two backends.
One shipped, one debugged.

Same AI. Same prompts. Same real-time chat app. One built on SpacetimeDB, one on a Postgres stack.

Sequential upgrade benchmark
SpacetimeDB vs Postgres
  • 3.9× fewer bugs (3.5 vs 13.5)
  • 31% lower cost ($12.98 vs $18.74)
  • 46% less backend code (777 vs 1,451 lines)
  • 34% faster build (55 min vs 83 min)
The Setup

What we built and how

A real-time chat app, built one feature at a time, by the same AI, against two different backends. Here's what that actually means.

The App
A real-time chat web app

Each feature layers on top of the last. By the final level, the app has to keep presence, threads, private rooms, and drafts all in sync across clients in real time.

The Process
Build, test, fix, repeat

Claude works on one feature at a time and can't move on until it passes. Bugs are tracked, fix iterations are counted, and cost is tallied as it goes.

1. Claude builds the next feature
2. We test it against the spec
3. Claude fixes any bugs found
4. Move to the next feature
The Backends
A database vs. a stack

Both backends have to deliver the same real-time features to the same React client. They just get there differently.

SpacetimeDB: database + reducers
Postgres stack: Postgres + Express + Socket.io + Drizzle
Cost Over Time

Postgres costs compound. SpacetimeDB stays linear.

SpacetimeDB costs scale linearly with features added. Postgres doesn't. The gap widens as features interact.

[Chart: cumulative AI cost ($3 to $19) by feature level (L1 to L12), SpacetimeDB vs Postgres. Run 1 dashed, run 2 solid.]
The Repair Tax

Postgres burns 6× more on fixes

The Postgres stack has more wiring to get right. Every missed emit is a bug. Every bug is another fix loop. Every fix loop eats the budget meant for new features.

Postgres
38% of spend on fixes
$7.09 fixes of $18.74 total
62% building features, 38% fixes
SpacetimeDB
9% of spend on fixes
$1.14 fixes of $12.98 total
91% building features, 9% fixes

For every $1 SpacetimeDB spent on repairs, Postgres spent $6.22.
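
The repair-tax figures follow directly from the fix and total costs reported above. A short check of the arithmetic, where the dictionary layout and the `fix_share` helper are just illustrative names:

```python
# Reported figures, averaged across 2 runs (USD).
postgres = {"fixes": 7.09, "total": 18.74}
spacetimedb = {"fixes": 1.14, "total": 12.98}


def fix_share(costs: dict) -> float:
    """Fraction of total spend that went to fix iterations."""
    return costs["fixes"] / costs["total"]


pg_share = fix_share(postgres)
stdb_share = fix_share(spacetimedb)
repair_ratio = postgres["fixes"] / spacetimedb["fixes"]

print(f"Postgres fix share:    {pg_share:.0%}")
print(f"SpacetimeDB fix share: {stdb_share:.0%}")
print(f"Postgres spent ${repair_ratio:.2f} on repairs per $1 of SpacetimeDB repairs")
```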

Bug Distribution

New features kept breaking Postgres

SpacetimeDB shipped 75% of features with no fix iterations, versus 46% for Postgres. Postgres bugs clustered where features had to interact with each other.

[Chart: bugs per feature level (scale 1 to 3), SpacetimeDB vs Postgres. Levels: L1 Chat, L2 Scheduled, L3 Ephemeral, L4 Reactions, L5 Edit Hist., L6 Perms, L7 Presence, L8 Threads, L9 Private/DM, L10 Activity, L11 Drafts, L12 Migration.]
Quality Analysis

Real-time failures were the largest class of Postgres bugs

One in three Postgres bugs (9 of 27) was state failing to sync across clients. SpacetimeDB's subscription model removes the manual emit step behind that category, and it logged 2 such bugs against Postgres's 9.

SpacetimeDB
7 total bugs across 2 runs (3.5 bugs / run)
  • Real-time state not updating: 2 bugs
  • SDK API misuse: 1 bug
  • Logic / other: 4 bugs
Postgres
27 total bugs across 2 runs (13.5 bugs / run)
  • Real-time state not updating: 9 bugs
  • Missing UI element: 5 bugs
  • Data not persisted: 5 bugs
  • Logic / other: 8 bugs
Backend Code

46% less backend. Zero wiring.

Declarative tables and reducers replace the Express, Socket.io, and Drizzle scaffolding Postgres requires. Fewer moving parts means fewer places for the AI to miss a connection.

Postgres: 1,451 lines
SpacetimeDB: 777 lines
SpacetimeDB backend is 46% smaller
Postgres Backend
~1,451 lines
  • SQL schema migrations (Drizzle)
  • Express REST endpoints per feature
  • Manual Socket.io room management
  • Per-event emit calls; miss one, break real-time
  • Auth middleware wired per endpoint
SpacetimeDB Backend
~777 lines
  • Declare tables as structs
  • Write reducers (functions that mutate state)
  • Clients subscribe to queries; updates are automatic
  • No WebSocket emit boilerplate
  • No SQL query strings
Why this matters for AI: One in three Postgres bugs was real-time state failing to sync across clients. SpacetimeDB's subscription model removes the per-event emit calls where those bugs originate, leaving far less room for this class of error.
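
To make "miss one, break real-time" concrete, here is a toy in-memory model of the two patterns. It is a sketch under heavy assumptions: it uses neither the Socket.io nor the SpacetimeDB API, and `ManualEmitBackend`, `SubscriptionBackend`, and `Client` are hypothetical classes written only to show where the missed update comes from.

```python
from typing import Callable, List

Listener = Callable[[List[str]], None]


class Client:
    """Holds the last snapshot of messages the backend pushed to it."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def on_update(self, rows: List[str]) -> None:
        self.rows = rows


class ManualEmitBackend:
    """Writing and notifying are separate steps; every handler must do both."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.listeners: List[Listener] = []

    def emit(self) -> None:
        for listener in self.listeners:
            listener(list(self.messages))

    def send_message(self, text: str) -> None:
        self.messages.append(text)
        self.emit()

    def edit_message(self, index: int, text: str) -> None:
        self.messages[index] = text
        # Deliberate bug: emit() was forgotten, so clients keep the old text.


class SubscriptionBackend:
    """Every write goes through one commit path that notifies subscribers."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.subscribers: List[Listener] = []

    def _commit(self, mutate: Callable[[List[str]], None]) -> None:
        mutate(self.messages)
        for subscriber in self.subscribers:
            subscriber(list(self.messages))

    def send_message(self, text: str) -> None:
        self._commit(lambda rows: rows.append(text))

    def edit_message(self, index: int, text: str) -> None:
        def mutate(rows: List[str]) -> None:
            rows[index] = text

        self._commit(mutate)


manual, manual_client = ManualEmitBackend(), Client()
manual.listeners.append(manual_client.on_update)
manual.send_message("hello")
manual.edit_message(0, "hello (edited)")

sub, sub_client = SubscriptionBackend(), Client()
sub.subscribers.append(sub_client.on_update)
sub.send_message("hello")
sub.edit_message(0, "hello (edited)")

print("manual emit: ", manual.messages, manual_client.rows)  # stored and shown text differ
print("subscription:", sub.messages, sub_client.rows)        # stored and shown text match
```

In the manual version the data is saved correctly and only the notification is missing, which is why this kind of bug shows up as "real-time state not updating" rather than as a crash.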
Runtime Performance

The AI-generated code runs faster too

Same chat app, same hardware, both pushed to peak throughput. Raw and optimized. SpacetimeDB leads in both.

As-shipped (msgs/sec)
  • SpacetimeDB: 5.3k
  • Postgres: 694
  • Difference: 7.6×
AI-optimized (msgs/sec)
  • SpacetimeDB: 25.3k
  • Postgres: 1.1k
  • Difference: 22×
Why the difference: SpacetimeDB processes each message in a single in-process transaction. The Postgres app serializes multiple sequential network round-trips per message. Optimization helps both, but it can't eliminate network physics.
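
A back-of-the-envelope model shows why optimization narrows one gap but not the other. It assumes closed-loop writers (each waits for its message to finish before sending the next) and ignores the client hop that both backends share. Every number below is made up for illustration and none is a measurement from this benchmark; `closed_loop_throughput` and its parameters are hypothetical.

```python
def closed_loop_throughput(
    writers: int, round_trips: int, rtt_ms: float, work_ms: float
) -> float:
    """Messages/sec when each writer waits for one message before sending the next."""
    latency_ms = round_trips * rtt_ms + work_ms
    return writers * 1000.0 / latency_ms


WRITERS = 20
RTT_MS = 0.5  # assumed cost of one server-to-database round-trip

# Before optimization: 0.5 ms of compute per message in both designs (assumed).
in_process_raw = closed_loop_throughput(WRITERS, round_trips=0, rtt_ms=RTT_MS, work_ms=0.5)
multi_hop_raw = closed_loop_throughput(WRITERS, round_trips=4, rtt_ms=RTT_MS, work_ms=0.5)

# After optimization: compute drops to 0.1 ms, but the round-trips remain.
in_process_opt = closed_loop_throughput(WRITERS, round_trips=0, rtt_ms=RTT_MS, work_ms=0.1)
multi_hop_opt = closed_loop_throughput(WRITERS, round_trips=4, rtt_ms=RTT_MS, work_ms=0.1)

print(f"raw:       {in_process_raw:,.0f} vs {multi_hop_raw:,.0f} msgs/sec")
print(f"optimized: {in_process_opt:,.0f} vs {multi_hop_opt:,.0f} msgs/sec")
```

In this model the multi-hop design can never go faster than `writers / (round_trips * rtt)`, no matter how small the compute term gets, so the gap widens as both sides are optimized.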
Head-to-Head Results

The numbers, side by side

12 feature levels. Same AI model. Same prompts. Same app requirements. Figures averaged across 2 runs. The only variable was the backend.

Metric | Notes | SpacetimeDB | Postgres
Total AI cost to build | Averaged across 2 runs | $12.98 | $18.74
Features working first try | No fix iterations needed | 75% | 46%
Bugs found per run | Averaged across 2 runs | 3.5 | 13.5
Fix iterations per run | Repair loops required | 2.5 | 13.5
Cost spent on fixes | Share of total budget on repairs | $1.14 (9%) | $7.09 (38%)
Total lines of code | AI-generated, client + server, excl. CSS | 2,304 | 3,288
Backend lines of code | AI-generated, server-side only | 777 | 1,451
LLM API calls per run | Total prompts sent | ~395 | ~666
Total build time | Wall-clock, averaged | ~55 min | ~83 min
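
The headline percentages can be re-derived from the rows above. A small sketch, where the helper names `reduction` and `ratio` are ours:

```python
def reduction(spacetimedb: float, postgres: float) -> float:
    """Relative saving of SpacetimeDB against the Postgres baseline."""
    return (postgres - spacetimedb) / postgres


def ratio(spacetimedb: float, postgres: float) -> float:
    """How many times larger the Postgres figure is."""
    return postgres / spacetimedb


bug_ratio = ratio(3.5, 13.5)             # bugs found per run
cost_saving = reduction(12.98, 18.74)    # total AI cost, USD
backend_saving = reduction(777, 1451)    # backend lines of code
time_saving = reduction(55, 83)          # build time, minutes

print(f"{bug_ratio:.1f}x fewer bugs")
print(f"{cost_saving:.0%} lower cost")
print(f"{backend_saving:.0%} less backend code")
print(f"{time_saving:.0%} faster build")
```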
Methodology

How we ran this benchmark

Model: Claude Sonnet 4.6 for both backends, both runs
Task: Build a real-time chat app via sequential upgrade: 12 feature levels added one at a time
Controls: Identical prompts and bug-fix steps. Each backend received tailored setup guidelines.
Measurement: Cost tracked via OpenTelemetry instrumentation of the Claude API. All figures averaged across 2 runs. LOC counts exclude CSS and generated bindings.
Grading: Each level was manually tested and graded after generation. Bugs were identified through functional testing of the running app, then fixed via a structured fix prompt before proceeding to the next level.
Runtime: Peak saturated throughput measured over 30 seconds with 20 concurrent writers on the same dev machine. Writers fire as fast as each backend can accept. Two tiers: as-shipped AI output and one-pass AI optimization applied to both. All features preserved across tiers.
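
The load harness itself is not described here, so the following is only a guess at its shape: a fixed number of writers, each sending as fast as the backend accepts, for a fixed window. `measure_throughput` and `fake_send` are hypothetical. A real run would replace `fake_send` with a call into the chat backend and use the 30-second window described above.

```python
import threading
import time
from typing import Callable


def measure_throughput(
    send: Callable[[int], None], writers: int = 20, duration_s: float = 30.0
) -> float:
    """Run `writers` threads that call `send` back-to-back; return messages/sec."""
    counts = [0] * writers  # one slot per writer, so no lock is needed
    deadline = time.monotonic() + duration_s

    def worker(writer_id: int) -> None:
        while time.monotonic() < deadline:
            send(writer_id)          # blocks until the backend accepts the message
            counts[writer_id] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(writers)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return sum(counts) / elapsed


def fake_send(writer_id: int) -> None:
    time.sleep(0.001)  # stand-in for one message round-trip to a backend


# Short window so the sketch finishes quickly.
rate = measure_throughput(fake_send, writers=20, duration_s=0.2)
print(f"{rate:,.0f} msgs/sec")
```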

The backend you choose determines whether your AI ships features or debugs them.

SpacetimeDB is a database with real-time subscriptions and server-side logic built in. No WebSocket glue, no ORM, no event routing layer.