Who am I? Who are you? Who am I to you? How do I know you wrote this message? Should I allow you to do this? Simple questions, but they're surprisingly deep. In the world of computer systems we need to answer them in a precise and provable way. Identity and authentication systems have a very long and complex history. Too long for a single blog post.
Fortunately, the galaxy brain folks at the OpenID Foundation have devised several genius standardized protocols, which are used by billions of users, for precisely answering these questions in distributed computer systems. The specifications for these 200 IQ protocols are, however, hard to follow because of their formalism, the need for backwards compatibility, and the nuance of the ideas that they outline. In this blog post, I'm going to explain these ideas by way of analogy to concepts and ideas which are familiar to everyone, namely driver's licenses, car keys, cars, and state issued IDs. I'm also going to explain how SpacetimeDB builds on these protocols to implement identities and authentication.
TL;DR: OpenID is like showing someone your driver's license (now they know who you are), and OAuth 2.0 is like giving someone your valet key (they can drive your car but don't know who you are).
Identity
What is an identity? The Cambridge Dictionary defines "identity" as "a person's name and other facts about who they are". This simple definition is an extremely insightful and useful one. The one amendment I would make is that an identity doesn't have to apply to a person, it can also be applied to a group, a thing, or even an idea. Let's use "entity" to refer to anything that can be identified. (Perhaps not coincidentally both "identity" and "entity" are etymologically related to the Latin verb "esse", meaning "to be", which is also the basis of the word "essence".)
A key component of this definition is that an identity is not just an entity's name, but rather a set of facts about the entity, where their name is just one among many possible facts. It's helpful, but not necessary per se, that these are distinguishing facts, meaning that they are facts which are not shared by any other entity. A single entity may have many identities, which is to say multiple sets of facts about it. For example, in my case, I may identify myself as the founder of Clockwork Labs, but I also might identify myself as a 33 year old software engineer. Or as another example, my driver's license makes claims about my date of birth, my appearance, my height, my weight, my eye color, etc. My driver's license represents a set of claims about me. It is one identity of mine.
Identifiers
Identities are not synonymous with identifiers. Identifiers are a single uniquely distinguishing fact about a person, idea, group, or thing. They are a single fact, so they are a component of an identity. If an identifier applies to more than one entity, it loses its capacity to uniquely identify an entity. This uniqueness constraint is what makes them useful and is the defining quality of identifiers.
This uniqueness is achieved by constructing identifiers to be numbers or strings of characters which are associated with a single entity in such a way that the fact will never be applied to any other thing. This can either be done by choosing a sufficiently large random identifier so that the odds of assigning the same identifier twice are effectively zero, or by having a centralized authority assign identifiers while keeping track of all identifiers previously assigned to ensure the same identifier is never issued to two different entities. If you need multiple authorities to assign identifiers, then you can recover the uniqueness property by including the identifier of the issuing authority in the assigned identifier, thereby ensuring that no two authorities issue the same identifier.
For example, in the United States, every driver's license is assigned a unique license number. This license number is unique within the state, but is not necessarily unique across states. No two drivers in the State of Texas are assigned the same license number. Therefore, within Texas we can use a license number to uniquely identify a particular Texas driver. The same does not apply across states; however, we can construct a unique identifier for a driver in the United States by combining an identifier for the State of Texas with the license number, e.g. TX:12345678.
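Here is a minimal Python sketch of the two strategies described above: a large random identifier, and a central authority whose own identifier is used as a prefix. The class and function names are made up for illustration, and the eight-digit zero-padded format is an assumption, not how any real DMV numbers its licenses.

```python
import secrets


def random_identifier(num_bytes: int = 16) -> str:
    """Strategy 1: pick a random value so large that collisions are negligible."""
    return secrets.token_hex(num_bytes)


class Authority:
    """Strategy 2: a central authority that tracks what it has already issued."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._next_number = 1  # a counter guarantees no number is issued twice

    def issue(self) -> str:
        number = self._next_number
        self._next_number += 1
        # Prefixing with the authority's own identifier keeps identifiers
        # unique even when several authorities reuse the same numbers.
        return f"{self.name}:{number:08d}"
```

Two authorities can both issue number 1 without a clash, because the combined identifiers (TX:00000001 and CA:00000001) still differ.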
Trust & Authentication
When someone states a fact about their own identity or someone else's, should you trust them? Well it depends. Are they reputable? Do they have an incentive to lie to you? Could they be trying to masquerade as someone else? Do their claims match their appearance? Can you even see them or are you relying on a document they sent you? Was the document tampered with?
In computer systems all claims of identity come to you in the form of a message sent over the internet. You need some assurances as to who sent the message and that the message has not been tampered with. Even if you have both of those assurances, you need to decide whether or not the sender is trustworthy when it comes to the claims they are making.
A small subset of claims can be proven without trusting anyone; they can be proven mathematically, while others cannot. Fortunately it is possible to mathematically "sign" messages so that we can know that they could only have been written by an entity who has access to a particular secret number in the world of bits (sadly, we can never know for sure that the world-of-bits entity corresponds to a particular real-world person however). Likewise, fortunately, we can use math to know that a message was not tampered with after it was written and signed by the original creator.
Most claims we might find in a message cannot be proven mathematically. For those claims, we're going to have to trust someone. For example, claims about the real world, the world of atoms, require that you trust that a real-world person has verified that those claims are true in the world of atoms. The only way you can know that my hair is brown is by looking at it yourself or by hearing it from someone you trust who has seen it themselves.
Let's go back to the license example. A driver's license represents a set of facts about the driver, including their birth date, name, address, weight, eye color, etc, which are ostensibly verified in the real-world by a trustworthy state prior to being issued to a driver. The driver's license as an identifying document is useful only insofar as you trust that the document is genuine and that the issuing state has accurately recorded the facts. These facts are not infallible, they may not be accurate. You might have bribed someone at the DMV, or as many 20 year old Americans with a fake ID know, identities can be counterfeited or simply fabricated, or the DMV might accidentally not check your eyesight before issuing the license.
If someone hands you a license, before you trust its claims, you have to evaluate:
- A. whether it's from an actual state that you recognize and trust (not the made up State of Tylerland or something)
- B. whether it's been genuinely issued by the state
- C. whether it hasn't been edited after issuance
- D. whether it's not expired
- E. whether the state accurately verified the facts about the driver in the real world
- F. whether the person giving it to you is the person it was issued to (typically the person in possession of a license is the person it was issued to, but licenses can be stolen as with anything else)
If all of that checks out, you might feel confident in relying on those facts. Or as we say in the computer biz, you've authenticated the identity of the license holder.
In the case of a "digital license" in the world of bits, B and C are the only ones we can verify with mathematics. D requires an accurate clock. A and E come down to trusting the issuing state, and F you just gotta hope that the sender of the message didn't steal the message, although we can at least use math to ensure the message is transmitted to the receiver secretly without it being stolen en route.
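To make checks A through D concrete, here is a rough Python sketch of a "digital license" verifier. It is a simplification under loud assumptions: it uses an HMAC with a shared secret as a stand-in for a real signature (real ID tokens are normally signed with an asymmetric key pair so that verifiers never hold the signing secret), and the issuer table, key, and field names are all invented for this example.

```python
import hashlib
import hmac
import json

# Hypothetical table of issuers we recognize, with their verification keys.
TRUSTED_ISSUERS = {"TX": b"texas-demo-secret"}


def issue_license(issuer: str, key: bytes, claims: dict) -> dict:
    """What the issuing state does: serialize the claims and sign them."""
    body = json.dumps({"iss": issuer, **claims}, sort_keys=True)
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def check_license(doc: dict, now: float) -> bool:
    """What the receiver does before relying on any claim."""
    claims = json.loads(doc["body"])
    key = TRUSTED_ISSUERS.get(claims.get("iss"))
    if key is None:
        return False  # A: not an issuer we recognize and trust
    expected = hmac.new(key, doc["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, doc["sig"]):
        return False  # B and C: not genuinely issued, or edited afterwards
    return now < claims["exp"]  # D: not expired (needs an accurate clock)
```

Notice that nothing in this code can check E or F. Those still come down to trust.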
Once you've authenticated someone's identity (e.g. you've established that someone's driver's license is real and trustworthy), you can now decide based on the facts of that identity if you want to allow them to do something. For example, if you're a bouncer at a bar you'd use the birth date to decide if they're over 21 and allowed to enter or if you're a police officer who pulled them over you might check to see if they have a warrant for their arrest.
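The bouncer's decision can be sketched in a few lines of Python. This assumes the birth date has already been authenticated; the function name is just for illustration.

```python
from datetime import date


def is_over_21(birth_date: date, today: date) -> bool:
    """Authorization step: decide based on an already-authenticated claim."""
    years = today.year - birth_date.year
    # Subtract a year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years >= 21
```

The function never asks whether the birth date is true. That question belongs to authentication, and this one belongs to authorization.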
The Problem With Passwords
Most of the time when you ask for a stranger's identity you can just use it without needing a license to prove anything about them. Them telling you their name and information about themselves is probably fine. Either they don't really have a reason to lie to you or it doesn't really matter. If they give you a fake name, then that's what you'll call them. Most websites work this way. Your site talks to someone who claims to be a new user. You let them create a username and a password and then you have a reasonable degree of confidence that whoever is able to provide the password is probably the same person you were talking to before.
For a long time on the internet, this direct method of authentication was good enough. Most services don't need to trust anything about your real-world identity to work. A username was good enough as an identifier, and a password was good enough to authenticate the user. It works, but it's kind of like giving a different name to every restaurant host who asks to put your name down for a reservation. It could get confusing after a while with so many pseudonyms. With more and more services on the web it became clear over time that usernames and passwords have a lot of drawbacks. Namely:
- The user needs a new username/password for every site they sign up to. It's hard to remember 50 username/password combinations.
- If a user reuses a password, then that password is only as secure as the site with the weakest security.
- The user has to do everything themselves. There's no way for one service to talk to another service on a user's behalf without the user exposing their password. For example, as a user, I can't let Calendly make changes to my Google Calendar without giving Calendly my Google password.
OpenID
The original OpenID and OpenID 2.0 protocols and the newer OpenID Connect (OIDC) protocol are internet protocols which are designed to allow a Relying Party (AKA Client, e.g. a web service) to authenticate an End User (e.g. you) via an Identity Provider (e.g. Google) that the Relying Party trusts to authenticate the user on its behalf. The idea being that the user only needs to have one username/password with Google and then it can use that to sign in to any website that supports signing in with Google. These protocols form the basis of the "Sign In With Google" buttons you've probably seen on websites.
Let's translate this into our driver's license analogy. In this analogy, the OpenID Connect protocol is just a specification for what a driver's license should look like and how it should be shown to someone. The user is the driver. Google's role is analogous to the State of Texas' role. They're the DMV. They are the ones who are verifying the information about the driver and then issuing an authentic, signed driver's license to the driver.
Now imagine the driver shows their license to a bouncer at a bar to prove that they are over 21. The bouncer's role is that of the Relying Party or web service. They are going to look at the license and decide whether they trust the issuer, whether they think the license looks legit, and whether they want to let the driver into the bar. The actual process of "showing" the license to the bouncer is complicated in the OIDC flows, but essentially that's really all OpenID Connect does. It solves problems #1 and #2 above. Now the user only needs one username and password and they can choose the service whose security they trust the most. It doesn't solve problem #3 though. For that we need a standardized way of delegating authority on a user's behalf to third parties. This is where OAuth 2.0 comes in.
OAuth 2.0 and Authorization
OAuth is short for Open Authorization (not authentication). Whereas authentication is the process of verifying what someone's identity is, authorization is deciding whether to allow someone to do something. They are often used together, but you can have one without the other. For example, a police officer might want to authenticate someone just for information gathering purposes and not as part of authorizing them to do anything. Likewise you might authorize someone to valet your car by giving them your valet key. You don't have any idea who they are (except ideally that they work for the restaurant), but you are allowing them (and anyone else they might give your key to) to drive your car and only to drive your car.
OAuth was developed specifically to allow users to let services like Calendly put items on their Google Calendar or let FarmVille send invites to all their Facebook friends. The idea was to let users hand out a specific "valet key" which would let a service like Calendly take a specific action on behalf of the user without Calendly being able to impersonate the user or having the user hand over their Google password to Calendly.
Just as OpenID is a standard for driver's licenses, OAuth is a standard for valet keys. It specifies what a valet key is, what shape it needs to be, and how it works. The OAuth and OAuth 2.0 protocols don't say anything at all about identities and authentication. Just as a valet key itself tells you nothing about who the driver is, an OAuth 2.0 token does not include any information about who the user is or even who gave it to you.
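As a small illustration of the valet key idea, here is a Python sketch of a scope check. OAuth 2.0 represents granted scopes as a space-delimited string; the token value and the scope names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    value: str  # opaque string presented to the API; says nothing about the user
    scope: str  # space-delimited list of granted scopes


def allows(token: AccessToken, required_scope: str) -> bool:
    """True only if the token was granted exactly the scope being asked for."""
    return required_scope in token.scope.split()
```

The token carries what the holder may do, and no fields about who the user is. That is the whole point of the valet key.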
OpenID Connect & OAuth 2.0
Interestingly, because the web flows and user experience for OAuth 2.0 were so well developed, even though early versions of OpenID predated early versions of OAuth, developers of third party apps began to use OAuth 2.0 for identification/authentication purposes. How does that work if OAuth is just a valet key? Well although a valet key itself tells you nothing about the driver, you can use a valet key to rummage around in the glovebox of the car looking for clues as to who the owner might be! You might find candy wrappers, notes, or ideally the car registration.
Developers (the valet driver) ended up using Access Tokens (valet keys) to ask the authorization server (the car) to tell them information about the user (the driver). This strategy sort of works, but it's also unstandardized and ad hoc for every authorization server. Given the popularity of OAuth 2.0, the OpenID Foundation decided to develop OpenID Connect, an identity protocol which builds upon the foundations of the OAuth 2.0 protocol and uses the same JWT structure and webflows.
Whereas OAuth 2.0 tokens are not required to provide information about the user (and often do not), OpenID Connect ID Tokens provide a standard set of claims (like a driver's license) about the user. By the way, the OIDC specification also recommends the implementation of a /userinfo route that anyone with the OAuth 2.0 Access Token can access which returns the information in the ID token. This is kind of like recommending a copy of the license be taped to the inside of the glovebox, so anyone with the valet key can find out the identity of the user.
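If you have never looked inside an ID Token, this Python sketch shows how to peek at its claims. An ID Token is a JWT: three base64url segments (header, payload, signature) joined by dots. The helper names are mine, and this deliberately does NOT verify the signature, so it is only good for inspection, never for authentication.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, the form used inside a JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unverified_claims(jwt: str) -> dict:
    """Read the claims out of a JWT WITHOUT checking who signed it."""
    _header, payload, _signature = jwt.split(".")
    return json.loads(b64url_decode(payload))
```

A typical payload includes claims such as iss (the issuer), sub (the subject, i.e. the license number), aud (the intended audience), exp (expiration time), and iat (issued-at time).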
SpacetimeDB
How does this all relate to SpacetimeDB? SpacetimeDB implements the OpenID Connect protocol as a "Relying Party" (AKA "Client"). A "Relying Party" in the OIDC spec corresponds, in our analogy, to either the valet driver or the bouncer looking to identify the person that's trying to enter the bar. SpacetimeDB authenticates the license of everyone who wants to interact with your module (by analogy, enter your bar). SpacetimeDB only requires that a license have two claims: the state that issued the license and the license number. These are verified by SpacetimeDB mathematically and they're also used to compute the SpacetimeDB Identity.
Note
Side note: The SpacetimeDB Identity should not be called an Identity; it should be called an Identifier, because although it is an identity, it's a unique, minimal identity and so is more accurately called an Identifier. We ran out of time to make this change before 1.0. We may eventually deprecate this name and replace it with Identifier or Id in future major versions. I'm sorry, Nat Sakimura, I let you down.
If the "driver's license" includes additional claims, SpacetimeDB ensures that every claim is validated before delivering it to your module. Crucially, once the claims are authenticated, it's up to you to decide what you want to allow someone with that identity to do. For example, you might not allow anyone to interact with your module who doesn't have a Google account with a first and last name. You can write whatever Turing Complete authorization rules you want inside your module.
Note
Side note: Currently SpacetimeDB only exposes the Identity claim to modules, which is calculated from the sub and iss claims of the token/license. We plan to allow modules to access all the other claims on the license in the near future.
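To give a feel for what such an authorization rule could look like once more claims are exposed, here is a language-neutral sketch written in Python. It is not SpacetimeDB module code and does not use any SpacetimeDB API. The claim names given_name and family_name are standard OIDC claims, and the issuer string is my assumption about what Google puts in its tokens.

```python
# Assumption: the issuer string Google uses in its ID Tokens.
GOOGLE_ISSUER = "https://accounts.google.com"


def may_interact(claims: dict) -> bool:
    """Example rule: only Google accounts with a first and last name get in."""
    return (
        claims.get("iss") == GOOGLE_ISSUER
        and bool(claims.get("given_name"))
        and bool(claims.get("family_name"))
    )
```

The rule runs only after the claims have been authenticated. It decides what an identity is allowed to do, not whether the identity is real.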
NOTE! It is important that you do not mistake SpacetimeDB for the "End User" (the driver or user), the "Authorization Server" (the car or Google's APIs), or the "Identity Provider" (the state DMV or Google). SpacetimeDB is the "Relying Party" (sometimes a bouncer if looking at the license, sometimes a valet if accessing your car, always your app). And it is very important that you do not confuse the "Authorization Server" (the car or Google's APIs) with the "Relying Party" (the bouncer or your app). The car is registered with the state and the DMV! In many cases in OIDC the car and the DMV are the same service (if you want to stretch the analogy just imagine the valet driving your car to the DMV to look up your identity I guess). You will eventually be confused by this because the "Authorization Server" (the car) and the "Relying Party" (the bouncer) are both sort of doing authorization. I know because I've confused myself many times. I'm writing this as much for my future self as I am for you all.
SpacetimeDB is NOT authenticating end users directly; it's relying on third parties to have done so, and it is using the authenticated ID Token (the license) as proof that the user has been authenticated by the DMV and "handed" this license. Now here we arrive at a very tricky detail indeed. In practical reality there is no way to "hand" the end user the license directly. In fact, they never see it! When you "Sign In With Google" surely you have never seen an ID Token directly and you certainly never gave it to any service yourself. Instead, the license is given directly from the Identity Provider (DMV) to the Relying Party (the bouncer). This license is specific to the particular Relying Party (the bouncer). In fact, it has the bouncer's name written on it in the form of the audience (aud) claim. When you as an end user check the "Let application X see my email address" box on the Google page, you are authorizing Google to issue an ID Token to X with X's name on it. If this were not the case, then the bouncer, being in possession of the license, could impersonate the user to the bar next door. If any bouncer receives a license that doesn't have their own name on it, it's an invalid license and should not be trusted.
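The "is my name on this license?" check is short enough to sketch in Python. In OIDC the aud claim may be a single string or a list of strings, so the sketch handles both; the function and parameter names are illustrative.

```python
def audience_matches(claims: dict, my_client_id: str) -> bool:
    """Reject any token that was not issued for this particular Relying Party."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == my_client_id
    if isinstance(aud, list):
        return my_client_id in aud
    return False  # a missing or malformed aud claim is never acceptable
```

A bouncer who skips this check will accept licenses that were issued to the bar next door.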
And here we arrive at one more important but confusing detail. With SpacetimeDB and similar serverless/multitenant offerings there are actually two bouncers: both SpacetimeDB and your application or module. Just to make it less confusing, let's call SpacetimeDB the bouncer and your app the bartender. SpacetimeDB is going to check IDs at the door and put a little band on the user's wrist if their ID checks out. As the bartender you have to trust that SpacetimeDB is not lying to you and giving wristbands to 17 year olds. If you are operating open source SpacetimeDB then you know what code you're running, but if you're using a SpacetimeDB provider (e.g. our company), you should only do so if you trust them to not lie to you. The wristband analogy also undersells the trust a bit because in reality the SpacetimeDB provider could just make up any claims it wants about the person entering the bar. But all of that is moot anyway because SpacetimeDB is running your whole application and can see and arbitrarily mess with its data storage and execution, so ehhh... use reputable cloud platforms kids.
Note
Note that you should also choose your DMV wisely because they can impersonate anyone.
Also I'd like to point out that depending on the auth flow, the Identity Provider (DMV) can either hand the license to your app's client or to SpacetimeDB directly. In the case where it's your client app, you would then hand it to SpacetimeDB, which hands it to your module. The verification step is done by SpacetimeDB so your client app can't lie to SpacetimeDB.
Going to the DMV
SpacetimeDB currently doesn't involve itself with any OIDC auth flows (although it does use the standard OIDC routes to check the validity of ID Tokens it receives as a Relying Party). So as of today if you want to use an existing OIDC provider, I'm afraid you'll be going to the DMV yourself.
The good news is that you can use one of many off-the-shelf OIDC client implementations to help implement the flows and allow your users to sign in. This involves either implementing your own server if you want to use the Authorization Code Flow or, if you'd rather not deploy a server, using the Implicit Flow or the Authorization Code + PKCE Flow. The Authorization Code + PKCE Flow is generally recommended over the Implicit Flow, although the Implicit Flow is easier to wrap your head around.
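For a taste of what the PKCE part involves, here is a Python sketch of the one piece of cryptography the client performs, following the S256 method from RFC 7636. In practice an off-the-shelf OIDC client library does this for you; the function names are mine.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """A fresh random secret that the client keeps to itself during the flow."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier: str) -> str:
    """The hash of the verifier, sent along with the authorization request."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge up front and reveals the verifier only when redeeming the authorization code, so someone who intercepts the code alone cannot exchange it for tokens.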
We will be releasing documentation shortly with instructions on how to implement these flows for your SpacetimeDB module. In the future, we also aim to implement better support for these flows directly to simplify the process of setting this up. In principle, we could implement tools to make this easier in both SpacetimeDB and in our client.
Setting Up Your Own DMV
While the above strategy is an option, you may want to issue your own identities rather than relying on a third party Identity Provider like Google. Why would you want to do that? Several reasons:
- You might want to restrict people who interact with your application to users who first signed up on your website
- You might want to verify real-world claims yourself before issuing an identity
- You might not want to trust third party identity providers to verify claims in general
- You might want to associate identities from multiple third party identity providers with a single identity or identifier
Note
NOTE: The first and last ones are the most common use cases. For the last one you can always associate multiple identities with a single user within your module, but that consolidated user identity can only be used within your module, not between modules, since you can't mint new identity tokens from your module.
So if you want to have full control over your authentication process, we've got a simple solution for you. You'll need to register your own domain name and set up a server configured with TLS. No need to involve existing OIDC providers unless you want to allow it. This is sort of like setting up your own state/DMV for issuing your own driver's licenses. You can either give the users their own driving test or you can accept licenses from other states as proof that the user can drive. If the driver passes your tests then you issue your own licenses which SpacetimeDB is capable of verifying via your server.
You can set up your own SpacetimeDB identity provider with the following steps:
- Register a domain name. (e.g. foobar.com).
- Deploy a webserver at that domain name.
- Implement email/password sign in on that server, just like you would for a normal webserver (or if you want to Sign In With Google, etc.).
- Create a JSON Web Token (JWT) with this structure:
{
  "iss": "https://foobar.com",
  "sub": "<user-id>",
  "exp": <expiration-time>
}
where <user-id> is whatever user ID you'd like to assign to your user and <expiration-time> is when you'd like the token to expire (as a Unix timestamp in seconds).
- Sign the token with the private key of a public/private key pair.
- Host an OpenID Connect configuration file at https://foobar.com/.well-known/openid-configuration which points to the public JSON Web Key (JWK) corresponding to the private key you used to sign the token. You can see SpacetimeDB's config file here.
- Issue the tokens to your client app once the user signs in.
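The data involved in these steps can be sketched in Python as follows. This only builds the claims and the configuration document; the actual signing is left out because it needs an asymmetric crypto library, which is outside this sketch. The JWKS path and the RS256 algorithm are assumptions for illustration, and I am not asserting which configuration fields SpacetimeDB actually reads beyond needing to find your public key.

```python
import time
from typing import Optional

ISSUER = "https://foobar.com"


def build_claims(user_id: str, lifetime_seconds: int, now: Optional[float] = None) -> dict:
    """The token payload: who issued it, for which user, and when it expires."""
    issued_at = int(time.time() if now is None else now)
    return {"iss": ISSUER, "sub": user_id, "exp": issued_at + lifetime_seconds}


def build_discovery_document() -> dict:
    """Served at /.well-known/openid-configuration so verifiers can find your keys."""
    return {
        "issuer": ISSUER,  # must match the iss claim in the tokens you issue
        "jwks_uri": f"{ISSUER}/.well-known/jwks.json",  # hypothetical key location
        "id_token_signing_alg_values_supported": ["RS256"],  # assumed algorithm
    }
```

The iss claim in every token and the issuer field in the configuration document need to be the same string, since that string is how a verifier finds your keys.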
SpacetimeDB will then authenticate the user with this token, using your keys to verify that your webserver signed it. The SpacetimeDB Identity of the user will be calculated from the iss and sub claims with the following algorithm (pseudocode):
def identity_from_claims(issuer: str, subject: str) -> [u8; 32]:
    hash1: [u8; 32] = blake3_hash(issuer + "|" + subject)
    id_hash: [u8; 26] = hash1[:26]
    checksum_hash: [u8; 32] = blake3_hash([
        0xC2,
        0x00,
        *id_hash
    ])
    identity_big_endian_bytes: [u8; 32] = [
        0xC2,
        0x00,
        *checksum_hash[:4],
        *id_hash
    ]
    return identity_big_endian_bytes
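If you'd like to experiment with the byte layout, here is a runnable Python rendering of the pseudocode above. Python's standard library has no BLAKE3, so the hash function is passed in as a parameter, and the SHA-256 stand-in below is only there to make the sketch runnable: it will NOT produce real SpacetimeDB identities. Encoding the input string as UTF-8 is also my assumption.

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]  # must return a 32-byte digest


def identity_from_claims(issuer: str, subject: str, hash32: HashFn) -> bytes:
    """Lay out the 32 identity bytes: 2 prefix + 4 checksum + 26 hash bytes."""
    hash1 = hash32(f"{issuer}|{subject}".encode("utf-8"))
    id_hash = hash1[:26]
    prefix = bytes([0xC2, 0x00])
    checksum = hash32(prefix + id_hash)[:4]
    return prefix + checksum + id_hash


def sha256_stand_in(data: bytes) -> bytes:
    """Stand-in only; pass a real BLAKE3 function for actual identities."""
    return hashlib.sha256(data).digest()
```

Because the identity depends only on iss and sub, the same user at the same issuer always maps to the same identity, and the same sub at a different issuer maps to a different one.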
SpacetimeAuth (and Auth0 and FirebaseAuth and Clerk and others)
Now finally we come to auth platforms like Auth0, FirebaseAuth, and Clerk. What the heck do these platforms do? Where do they fit into everything we've learned about above?
Auth platforms are sort of like a DMV creator service. A DMV factory if you will. They will host your DMV inside their big warehouse of DMVs. If you prefer a different analogy, they're more like the US Federal Government or the European Union: you can apply to create your own state (complete with your own DMV) under their umbrella. These platforms will allow you to have your own issuer underneath their domain. So whereas your issuer might have been https://your-website.com if you issued your own licenses directly, the issuer on licenses issued by these platforms will be something like https://your-website.auth0.com.
Auth platforms typically provide other tools like user management, account recovery, and various admin tools as well. For example, if your user forgets their password and needs a password reset, the platform will send an email to the user on your behalf, branded with your brand, so the user can recover their password through the auth provider's website. That way you don't have to build all your own user management, account recovery, etc. tools.
It's a very helpful service!
So helpful in fact that we think it's one of the most critical things missing from our platform that will help new developers get onboarded quickly without having to read OIDC specs or spin up their own servers. That's why we are going to be introducing SpacetimeAuth, our own identity provider service.
The dream of SpacetimeDB is that devs can deploy an application into production without ever needing to think about managing servers. We're so close! SpacetimeAuth is the final step. It'll give you the ability to issue your own identities to users without needing to host anything. We'll host your own identity provider for you.
SpacetimeDB as a (limited) DMV
There is one tiny missing piece to our puzzle however. Many of you have already used SpacetimeDB on your localhost and completed the tutorial without having to deal with anything other than a token that SpacetimeDB gave you. Which raises a few questions. Who created that token? Is it an OIDC token? Where is it valid?
The answer is that your local SpacetimeDB created that token for you. For users connecting to SpacetimeDB anonymously or at your request for a new identity, SpacetimeDB will create and issue a new bearer token for you or your user. The token's iss claim is set to localhost. When SpacetimeDB receives a token with a localhost issuer it is an indication that this token was created by that SpacetimeDB instance for that SpacetimeDB instance. SpacetimeDB will attempt to use its own public key to verify that it was the one who originally issued the token and assume that the bearer of the token is the person it was issued to.
While localhost tokens are convenient, they have several downsides, namely:
- There is no way to recover an identity token if you lose it. SpacetimeDB has no way of verifying that someone requesting reissuance is the original person they issued the token to.
- Because of #1 they do not expire, so they're more like a random password than a token.
- They are not portable. One SpacetimeDB instance has no way of verifying a token issued by a different SpacetimeDB instance. This makes it harder for you to move a database from your local instance to Maincloud and vice versa, since the tokens your users hold will no longer be valid on Maincloud.
For those reasons we discourage their use outside of development and dev environments, except as temporary anonymous session tokens which are used for a single session and then discarded. As a SpacetimeDB app developer you should be deliberately issuing your users their own identities with which to connect to SpacetimeDB.
Future Work
Database identities
At present, database identities (the identity of the databases you deploy) are issued by the hosting cluster as localhost identities. Database identities are not currently used for anything except to allow you to reference a particular database on that instance. In the future, however, we plan to allow databases to message each other via inter-module communication (IMC). IMC is a messaging protocol that will allow databases to be clients of other databases (future blog post planned!).
These databases will need to be able to authenticate to each other in these messages. Because database identities are localhost issued identities, databases can only use these identities to authenticate to other databases within the same cluster. In the future, we plan to issue database identities which will be valid and verifiable inter-cluster, allowing databases to send messages between clusters. That way your local SpacetimeDB databases will be able to communicate to Maincloud databases via IMC with no additional configuration.
Session Tokens
Typically OIDC tokens expire after a relatively short amount of time. This is because once they're issued they cannot be easily revoked. Whoever is in possession of them can present them to your backend and impersonate the user. For this reason, they're primarily useful as a way to establish an initial authentication of the user. This means that unless you issue a longer lived token, your users will keep having to go through the login flow whenever they want to sign into your app.
In the near future, we plan to implement SpacetimeDB session tokens and session management. A session is established with a short lived ID token just as before, but once authentication has been established SpacetimeDB will create a session record and issue a session token which can be longer lived and potentially refreshed. The benefit of storing a session record in the database is that if those session credentials are compromised, the user can authenticate with SpacetimeDB and choose to end the session early, thus revoking all the session tokens for that session.
This is more secure than long lived ID tokens.
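To show why a stored session record makes revocation possible, here is a hypothetical Python sketch. This is not SpacetimeDB's design or API (the feature described above is still planned); every name here is invented, and a real implementation would also handle expiry and refresh.

```python
import secrets
from typing import Optional


class SessionStore:
    """Toy session table: tokens point at a session, sessions at an identity."""

    def __init__(self) -> None:
        self._sessions: dict = {}  # session_id -> identity
        self._tokens: dict = {}  # session token -> session_id

    def start_session(self, identity: str) -> str:
        """Called once the short-lived ID token has been authenticated."""
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = identity
        return session_id

    def issue_token(self, session_id: str) -> str:
        """Issue (or refresh) a longer-lived token tied to an existing session."""
        if session_id not in self._sessions:
            raise KeyError("unknown or ended session")
        token = secrets.token_urlsafe(32)
        self._tokens[token] = session_id
        return token

    def identity_for(self, token: str) -> Optional[str]:
        """Look up who a token belongs to; None if its session was ended."""
        return self._sessions.get(self._tokens.get(token))

    def end_session(self, session_id: str) -> None:
        """Ending the session invalidates every token issued under it at once."""
        self._sessions.pop(session_id, None)
```

A bare signed ID token has no record like this behind it, so there is nothing to delete when it leaks. You can only wait for it to expire.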
Authorization Scopes
While SpacetimeDB uses OIDC which is built on OAuth 2.0, we don't have additional tooling for helping you deal with authorization scopes. In the near future when we expose the auth scope claims, you'll be able to read them within your module, but this does not extend to the subscription API. We plan to build tools into both SpacetimeDB and SpacetimeAuth to make dealing with OAuth 2.0 style authorization much easier.
Who am I?
Ultimately I wrote this blog post for myself as much as for any of you. With all the jargon floating around in OAuth 2.0 and OpenID Connect, I find it very helpful to use real-world analogies for all of this stuff. After all, identifying and authenticating people is not new; doing it digitally is the new part.
A huge shout out to the genius and dedicated individuals who created these protocols. They've done amazing work to get us here, and when it comes to security and authentication, the details matter. They've done right by all of us.
If you're looking to deploy SpacetimeDB at your company or startup, give me a shout! Or if you just want to stay up to date on the improvements, follow me on Twitter or Bluesky.