Deploying for Compatibility

January 9, 2024
rpc

tl;dr: When schemas or protocols are changing, it's typical to work across multiple deploys. To reason about it, separate what schema version a component is "writing" and what schema versions a component can "read."

Client-server compatibility comes up frequently. Pretty much all modern web apps are distributed systems, with a client (the JavaScript that runs in your browser) and a server, and, being on different machines, they're not deployed atomically. Similarly, if software is a client of a database, any schema (broadly construed) change to the database will affect those clients. Famously, Knight Capital Group lost about $400 million in 2012 after re-using a flag: old code that was still running on one server interpreted the flag differently from the newly deployed code.

Deploy	Client Requests	Server Understands	Notes
0	v1	v1	Initial State
1	v1	v1, v2
2	v2	v1, v2
3	v2	v2	Final State

Let's work through an HTTP API example first. Say we want to migrate from sendMessage({to, from, body}) (v1) to sendMessage({to, from, subject, body}) (v2). In the original version, the client sends requests of the v1 shape, the server handles v1-shaped requests, and everything is fine. (So far, we are at "Deploy 0" in the diagram and table above.) Next, we must deploy a server that handles both v1 and v2 shapes. ("Deploy 1" in the diagram.) When we deploy this, no clients send v2 messages yet, so the server must keep accepting v1. If we instead deployed a server that only accepts v2, any v1 client still out there would break. Next, we deploy the client that sends v2 requests. ("Deploy 2" in the diagram.) After a while, old clients will die out, and we can finish by deploying the server that only accepts v2, and we've done the migration ("Deploy 3").

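If you have feature flags available to you, you can use them to collapse deploys 1 and 2 into one: ship the server that reads both versions together with a client that can send v2, but keep the client's switch to v2 behind a flag that you turn on only once the new server is fully rolled out.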
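As a minimal sketch of that flagged client, assuming a hypothetical `sendSubject` flag and a hypothetical `buildSendMessageRequest` helper (neither comes from a real flag library), the client could pick the request shape like this:

```typescript
type V1Request = { to: string; from: string; body: string };
type V2Request = V1Request & { subject: string };

// Hypothetical flag bag; in practice this would come from your flag service.
type Flags = { sendSubject: boolean };

type Draft = { to: string; from: string; subject: string; body: string };

// Builds the v2 shape only when the flag is on; otherwise stays on v1.
function buildSendMessageRequest(flags: Flags, draft: Draft): V1Request | V2Request {
  const { to, from, subject, body } = draft;
  if (flags.sendSubject) {
    return { to, from, subject, body }; // v2: the server must already read this shape
  }
  return { to, from, body }; // v1: safe against old and new servers alike
}
```

Turning the flag off again sends clients back to v1 without another deploy, which is the main appeal of this variant.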

There will often be a lot of time between deploy 2 and 3. Depending on the change, there may not be a rush to clean up.

Though the general pattern is agnostic to your RPC choices, how you implement a server that accepts both v1 and v2 will depend on your frameworks. For example, if you use TypeScript, you can mark the new argument (subject) optional, and a single handler will accept both shapes as long as it copes with the field being absent. If you want to, you can upgrade the field to required in deploy 3.
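Here is a rough sketch of what the optional-field approach might look like for the server at deploys 1 and 2. The `SendMessageRequest` and `StoredMessage` types and the "(no subject)" default are assumptions made for illustration, not part of any particular framework:

```typescript
// `subject` is optional, so both v1 and v2 payloads type-check.
interface SendMessageRequest {
  to: string;
  from: string;
  subject?: string; // absent in v1, present in v2
  body: string;
}

interface StoredMessage {
  to: string;
  from: string;
  subject: string;
  body: string;
}

// Hypothetical handler: accepts either shape and normalizes to one internal form.
function sendMessage(req: SendMessageRequest): StoredMessage {
  // Assumed default for v1 clients that know nothing about subjects.
  const subject = req.subject ?? "(no subject)";
  return { to: req.to, from: req.from, subject, body: req.body };
}
```

Keep in mind that TypeScript types are erased at runtime, so a real server would still need to validate the incoming JSON before treating it as a `SendMessageRequest`.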

In general, to facilitate thinking about versioning, you can "split" your thinking between writers and readers. In our example, at Deploy 1, we have clients writing messages with v1 and servers reading messages with either v1 or v2. I learned of "readers' schema" and "writers' schema" from Apache Avro. Avro has the unusual property in its libraries that a schema is always attached to the data. In Avro, you're not supposed to see "(Alice, Bob, Hello World)" without knowing that the schema is "(to, from, body)." Avro has resolution rules for letting a reader expecting one schema read data written by another schema. (See https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html for some discussion on the topic.) This brings us to our next example, databases!
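To make the reader/writer split concrete, here is a toy illustration of the idea behind record resolution: match fields by name, ignore fields the reader doesn't know, and fall back to the reader's default for fields the writer didn't send. This is not Avro's API; `ReaderField` and `resolveRecord` are made-up names, and real Avro resolution covers much more (types, unions, aliases).

```typescript
// A reader-side field: a name plus an optional default value.
type ReaderField = { name: string; default?: unknown };

// Resolve positional data written with `writerFields` into the reader's shape.
function resolveRecord(
  writerFields: string[],
  values: unknown[],
  readerFields: ReaderField[],
): Record<string, unknown> {
  const written = new Map<string, unknown>();
  writerFields.forEach((name, i) => written.set(name, values[i]));

  const result: Record<string, unknown> = {};
  for (const field of readerFields) {
    if (written.has(field.name)) {
      result[field.name] = written.get(field.name); // field present in both schemas
    } else if ("default" in field) {
      result[field.name] = field.default; // writer is older: use the reader's default
    } else {
      throw new Error(`no value or default for field "${field.name}"`);
    }
  }
  return result; // writer fields unknown to the reader are simply dropped
}
```

With a writer schema of (to, from, body) and a reader schema that adds `subject` with a default of `""`, the data (Alice, Bob, Hello World) resolves to a record whose `subject` is the empty string.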

Databases

Let's say you have a column in your Users table, called preferences and, furthermore, this column is of type text or string, but you place JSON data in it. (I think you shouldn't re-invent schema management, and you should use your database's facilities for schema management, but I also know that this happens.) At first, preferences were simply {timezone: string} (v1), but then they became {timezone: string, mode: 'light' | 'dark' } (v2). And then, they evolved to {timezone: string, mode: 'light' | 'dark', language: string} (v3). (I'm using TypeScript syntax to describe the structure of the JSON.)

When a server reads preferences from the database, it might see users created during the v1 regime, the v2 regime, or the v3 regime. It must support all preference versions from the beginning of time unless you've done a migration to bring them up to some minimum version. Think of the server as being a reader of versions v1, v2, and v3. If we're just introducing v3, we must first teach the server to read v3. Then, after all servers are deployed such that they can read v3, we can allow the server to write v3. Importantly, we can't start writing v3 right away. If you're running two application servers, one of them may write v3 while the other doesn't yet know how to read v3, and, boom, errors!
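One way this read-first, write-later ordering could look in code is sketched below. The function names, the `writeV3` switch, and the defaults ("light" and "en") are assumptions for illustration only:

```typescript
type PreferencesV3 = { timezone: string; mode: "light" | "dark"; language: string };

// Assumed defaults for fields that older rows don't have.
const DEFAULT_MODE = "light" as const;
const DEFAULT_LANGUAGE = "en";

// Reader: accepts v1, v2, or v3 JSON and normalizes it to the v3 shape.
function readPreferences(raw: string): PreferencesV3 {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("preferences must be a JSON object");
  }
  const p = parsed as Record<string, unknown>;
  const timezone = p["timezone"];
  if (typeof timezone !== "string") {
    throw new Error("timezone is required in every version");
  }
  const rawMode = p["mode"];
  const mode = rawMode === "light" || rawMode === "dark" ? rawMode : DEFAULT_MODE;
  const rawLanguage = p["language"];
  const language = typeof rawLanguage === "string" ? rawLanguage : DEFAULT_LANGUAGE;
  return { timezone, mode, language };
}

// Writer: keeps emitting v2 until `writeV3` is enabled, which should happen
// only after every server can read v3.
function writePreferences(prefs: PreferencesV3, writeV3: boolean): string {
  if (writeV3) {
    return JSON.stringify(prefs);
  }
  const { timezone, mode } = prefs;
  return JSON.stringify({ timezone, mode });
}
```

The first deploy would ship `readPreferences` with `writeV3` off; a later deploy (or a flag flip) turns `writeV3` on.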

In terms of strategies, some people do strict validation everywhere. If you do strict validation, and somebody adds a choice to an enumeration without following the multi-step deployment protocol outlined above, your validation may blow up. Maybe you handle that blow-up gracefully? Others ignore unknown fields and unfamiliar enum values, and generally use optional values. They may lose out on runtime type safety and not know it.
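The two strategies can be contrasted with a small sketch on the `mode` enumeration from the preferences example. `parseModeStrict` and `parseModeTolerant` are hypothetical helpers, and the "light" fallback is an assumption:

```typescript
type Mode = "light" | "dark";

function isMode(value: string): value is Mode {
  return value === "light" || value === "dark";
}

// Strict: an unfamiliar value (say, one written by a newer deploy) is an error.
function parseModeStrict(value: string): Mode {
  if (!isMode(value)) {
    throw new Error(`unknown mode: ${value}`);
  }
  return value;
}

// Tolerant: an unfamiliar value silently becomes the fallback.
function parseModeTolerant(value: string, fallback: Mode = "light"): Mode {
  return isMode(value) ? value : fallback;
}
```

The strict version fails loudly when a writer gets ahead of its readers; the tolerant version keeps working but hides the fact that a value was dropped.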

Consider also rollbacks. If you deploy a new version that writes v3, but you have to roll it back, you will now have data written with v3 in the database. Isn't it nice that the previous deploy (that you're rolling back to) already supports reading this newer data?

To sum up: teach your readers to read the new versions first, and only afterwards enable your writers to write the new version.
