As a member of the 11Sigma crew, I am currently working with Stoplight.io to build an industry-leading API design and management platform. Stoplight wanted its users to be able to collaborate on their designs in real time via Stoplight Studio, and my team set off to make this a reality.
The first instance of a collaborative real-time editor was demonstrated by Douglas Engelbart in 1968, in The Mother of All Demos. However, widely available implementations of the concept took decades to appear, and even today, real-time collaboration features are still not as common as one would expect.
My guess is that implementing such a feature from the ground up would take months. There are SaaS products on the market, such as Firebase Realtime Database or Pusher, but they come with the risk of sharing data with a third-party service.
In this post, I will demonstrate how SaaS solutions can be replaced with a free open-source library.
Why is this so complex?
The main reason is communication latency. Because every message takes time to travel over the network, there is a fundamental dilemma: users need their edits incorporated into the document instantly, but if edits are incorporated instantly, they are necessarily applied to different versions of the document, since no client has yet seen the other clients' latest changes.
So the question becomes: how do we keep those versions in sync?
There are several possible approaches, but the most famous ones are Operational Transformation (OT) and Conflict-Free Replicated Data Types (CRDT).
Operational Transformation (OT) Basics
Operational Transformation (OT) is the most famous technology used in these systems. It was made popular by Google Wave, and it’s still used in Google Docs.
OT keeps an operation log for every change. Each change is represented by an operation type (such as "insert" or "delete"), an index (the position where the change occurred), and a value.
For example, if a user writes the word helo, it would result in the following operations:
insert(0, "h");
insert(1, "e");
insert(2, "l");
insert(3, "o");
If these commands are received on the server, it will end up with the same result: helo.
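To make the operation log concrete, here is a minimal sketch of replaying it. The `applyInsert` helper and the object shape of each log entry are my own assumptions for illustration; they are not taken from any particular OT library.

```javascript
// Hypothetical helper: apply a single insert operation to a string.
function applyInsert(text, op) {
  return text.slice(0, op.index) + op.value + text.slice(op.index);
}

// The operation log from the example, modeled as plain objects.
const log = [
  { type: "insert", index: 0, value: "h" },
  { type: "insert", index: 1, value: "e" },
  { type: "insert", index: 2, value: "l" },
  { type: "insert", index: 3, value: "o" },
];

// Replaying the log in order on an empty document rebuilds the text.
const replayed = log.reduce(applyInsert, "");
console.log(replayed); // "helo"
```

Any replica that replays the same log in the same order ends up with the same text, which is why the single-user case is easy.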
Let’s consider the case where multiple users are writing concurrently. userA changes the text from helo to hello by executing the command insert(3, "l"), while, at the same time, userB adds an exclamation mark to the initial text by executing insert(4, "!"), resulting in helo!.
If we don’t do anything with the commands, the outcome will depend on which command is received first:
userB command executed before userA:
insert(4, "!"); // userB command
// result: "helo!"
insert(3, "l"); // userA command
// result: "hello!"
userA command executed before userB:
insert(3, "l"); // userA command
// result: "hello"
insert(4, "!"); // userB command
// result: "hell!o"
This is where OT comes in. When userB's command arrives after userA's, the algorithm takes the command insert(4, "!") and transforms it (hence the name) to insert(5, "!"), so the result is hello! regardless of the order of execution.
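The transformation step can be sketched for the simplest case, two concurrent inserts. This is only an illustration under my own assumptions: the `transformInsert` function, the `client` field, and the tie-breaking rule are hypothetical, and a production OT system also has to handle deletes and longer operation histories.

```javascript
// Hypothetical helper: apply a single insert operation to a string.
function applyInsert(text, op) {
  return text.slice(0, op.index) + op.value + text.slice(op.index);
}

// Transform `op` so that it can be applied after the concurrent insert
// `applied`. If `applied` landed before op's position, op shifts right.
// Ties on the same index are broken by client id (an assumption).
function transformInsert(op, applied) {
  const shifts =
    applied.index < op.index ||
    (applied.index === op.index && applied.client < op.client);
  return shifts ? { ...op, index: op.index + applied.value.length } : op;
}

const base = "helo";
const opA = { index: 3, value: "l", client: "a" }; // userA
const opB = { index: 4, value: "!", client: "b" }; // userB

// Order 1: userA's command first, then userB's command transformed against it.
const order1 = applyInsert(applyInsert(base, opA), transformInsert(opB, opA));
// Order 2: userB's command first, then userA's command transformed against it.
const order2 = applyInsert(applyInsert(base, opB), transformInsert(opA, opB));

console.log(order1, order2); // "hello!" "hello!"
```

In the first order, userB's insert is shifted from index 4 to index 5. In the second order, userA's insert is left untouched because userB's edit landed after it.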
If the edit is out of date, we use the log of operations as a reference to determine what the user really intended. You can think of it as a realtime git-rebase.
NOTE: this example is taken from Martin Kleppmann's talk.
Problem with scaling the OT server
The big problem with OT is its dependence on a centralized server (ref).
Scaling this server is not trivial because a single instance must be used for all edits. Even Google has issues with it, which is probably why Google Docs sometimes shows: "This document is overloaded so editing is disabled".
Alternative to OT – Conflict-Free Replicated Data Types (CRDT)
One alternative approach to Operational Transformation is Conflict-Free Replicated Data Types (CRDT).
A simple way to distinguish the two approaches is that OT attempts to transform index positions to ensure convergence (all clients end up with the same content), while CRDTs use mathematical models that usually do not involve index transformations.
Let’s again take the word helo. Rather than using the position of characters, this time each letter gets a unique identifier which consists of a number and a letter:
UserA | UserB |
---|---|
h e l o | h e l o |
0a 1a 2a 3a | 0a 1a 2a 3a |
As in the previous example, userA changes the text from helo to hello and userB adds an exclamation mark at the end:
UserA | UserB |
---|---|
h e l l o | h e l o ! |
0a 1a 2a 4a 3a | 0a 1a 2a 3a 4b |
You can see that the new letters were assigned the ids 4a and 4b. The number 4 is calculated as one greater than the maximum number present at the moment of making the change, and the letter represents a client identifier (a and b).
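The id scheme can be sketched in a few lines. The `{ counter, client }` object shape and the `nextId`, `compareIds` and `formatId` helpers are hypothetical names I introduce for illustration. Comparing the number first and the client letter second is also an assumption about how ties between ids like 4a and 4b are ordered.

```javascript
// An id such as "4a" is modeled as { counter: 4, client: "a" }.
const formatId = (id) => `${id.counter}${id.client}`;

// New id: one greater than the largest counter seen so far, plus the client id.
function nextId(existingIds, client) {
  const max = existingIds.reduce((m, id) => Math.max(m, id.counter), -1);
  return { counter: max + 1, client };
}

// Total order on ids: counter first, client letter as the tie-breaker.
function compareIds(x, y) {
  if (x.counter !== y.counter) return x.counter - y.counter;
  if (x.client === y.client) return 0;
  return x.client < y.client ? -1 : 1;
}

// Ids of "h e l o" as both users see them before editing: 0a 1a 2a 3a.
const initialIds = [0, 1, 2, 3].map((counter) => ({ counter, client: "a" }));

const idFromA = nextId(initialIds, "a");
const idFromB = nextId(initialIds, "b");
console.log(formatId(idFromA), formatId(idFromB)); // "4a" "4b"
```

Because both users start from the same four ids, they pick the same number, and the client letter is what keeps the two ids distinct.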
Based on these identifiers we can always sync letters correctly by following two simple rules:
- insert the new element into the list if its id is greater than the id of the currently iterated element AND
- skip over any existing list elements with a greater id. This rule is important in order to resolve inserts at the same position (for more info about it, check this example)
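The two rules can be sketched as a small merge function. Treat this as a simplified illustration rather than how any real CRDT library works. In particular, I assume that every insert operation also carries the id of the element it was typed after (the `after` field), which the tables above do not show. The names `integrate`, `compareIds` and `toText` are hypothetical.

```javascript
// Ids are modeled as { counter, client }; compare counter first, then client.
function compareIds(x, y) {
  if (x.counter !== y.counter) return x.counter - y.counter;
  if (x.client === y.client) return 0;
  return x.client < y.client ? -1 : 1;
}

const id = (counter, client) => ({ counter, client });
const toText = (list) => list.map((el) => el.value).join("");

// Integrate an insert into a replica's list without mutating the input.
// Assumes the element referenced by op.after already exists in the list.
function integrate(list, op) {
  // Start iterating right after the element the new one was typed after.
  let i = list.findIndex((el) => compareIds(el.id, op.after) === 0) + 1;
  // Rule 2: skip over existing elements with a greater id.
  while (i < list.length && compareIds(list[i].id, op.id) > 0) i++;
  // Rule 1: the current element (if any) has a smaller id, so insert here.
  const copy = list.slice();
  copy.splice(i, 0, { id: op.id, value: op.value });
  return copy;
}

const base = [..."helo"].map((value, n) => ({ id: id(n, "a"), value }));
const opA = { id: id(4, "a"), value: "l", after: id(2, "a") }; // userA
const opB = { id: id(4, "b"), value: "!", after: id(3, "a") }; // userB

// Each replica applies its own edit first and the remote edit second.
const replicaA = integrate(integrate(base, opA), opB);
const replicaB = integrate(integrate(base, opB), opA);
console.log(toText(replicaA), toText(replicaB)); // "hello!" "hello!"
```

Both replicas converge even though they applied the operations in different orders, and no index had to be rewritten along the way.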
No need for a server
An important difference from OT: since no operation is transformed, there is no need for a server between clients. Data can even be synchronized peer-to-peer, and it can be encrypted (a lack of encryption is a deal breaker in some use cases).
Document size
One downside of this algorithm is size. Because of the metadata required for tracking identifiers, some early implementations of CRDTs required up to 80MB to represent a 100KB document on disk.
As libraries evolved, the size requirements have become more manageable. It’s now possible to get by with just a 1.5x to 2x size overhead compared to the contents themselves (100KB document in 160KB on disk, or 3MB in memory). If you are interested, you can read more about Yjs CRDT here.
Getting started with Yjs
For our use case, we chose a CRDT approach by using an open-source library called Yjs.
Yjs works by defining a "shared type" (like Map or Array) whose changes are "automatically distributed to other peers and merged without merge conflicts".
We first create a Y.Doc document:
import * as Y from "yjs";
const ydoc = new Y.Doc();
Then, choose a "shared type". Most likely, you will want Y.Array, Y.Map or Y.Text, which can be created using the getArray, getMap or getText methods of a Y.Doc:
const users = ydoc.getMap("users");
Similar to a native JavaScript Map object, a Y.Doc shared Map has methods like get, set, delete, values, etc. In this example we are creating a list of users as a dictionary with userId as its key:
const userId = Math.random().toString();
users.set(userId, {
  username: "johndoe",
});
Unlike the native JavaScript Map object, Y.Map also includes observe, unobserve and observeDeep. These methods can be used for reacting to changes from other clients. For example, React.js applications can use the methods to update the state of a component. To keep it simple, we will log all users whenever the map changes:
users.observe((yEvent, transaction) => {
  users.forEach((user) => {
    console.log(user.username);
  });
});
For this to be collaborative, we also need a "yjs provider" to manage communication between clients. In this example, we use y-websocket which contains a simple WebSocket backend and a client:
1. Install y-websocket
npm i y-websocket
2. Run the server
PORT=1234 node ./node_modules/y-websocket/bin/server.js
3. Create websocket provider
import { WebsocketProvider } from 'y-websocket'
const websocketProvider = new WebsocketProvider('ws://localhost:1234', 'yjs-demo', ydoc)
Stoplight Use case
When implementing APIs on a large scale, it’s best to employ a design-first approach where dedicated API architects write a specification and all the stakeholders, from developers to product managers and external consumers, are involved in the process.
This need to allow multiple contributors to collaborate on the designs is the motivation behind Stoplight’s Real-Time Projects.
With some help from Yjs, it is now possible for multiple people to work on the same document by using the code editor or Stoplight Studio’s form view. It is also possible to see who is online and where in the project they are located. See this video recording for a demonstration.
Conclusion
Today, building real-time collaboration applications is much easier than before. Using an open-source library like Yjs not only frees you from SaaS lock-in, but also opens the door to better and more secure applications.
Cover photo by Josh Calabrese on Unsplash