I know it’s not May 5th yet. But, I don’t have enough Star Wars geek cred to pull off a “May the fourth be with you!” related pun. So, please bear with me.

Previously, I mentioned how the two magical words, “Operational Transforms,” a.k.a. “OT”, were the solution to all of my real-time syncing woes. So, what is OT? Before answering that, let’s first take a look at the problem we face with syncing.

Consider the scenario where two people, A and B, are simultaneously editing a simple document containing the text “hello!” initially.

  • The initial state of the document contains the text “hello!” at Version 0.
  • Person A makes an edit replacing the ‘h’ with a capital ‘H’ so that the document now says “Hello!” at their Version 1.
  • Person B makes an edit inserting the text “ world” before the ‘!’ to make it say “hello world!” at their  Version 1.

Now each person has arrived at a different state of the document, and we want to be able to merge their changes together to converge the divergent states back to a version of the document that contains both edits and says “Hello world!”

There is nothing new about this. We deal with this almost everyday in today’s workplace whether you are a developer merging changes in Git, or a product manager merging different versions of a Powerpoint slideshow, or a lawyer merging suggestions in a contract using Track Changes, or simply keeping your Dropbox / Google Drive in sync with your local folder. And regardless of what scenario you are dealing with, the solution is pretty much the same.

Diff, Sync, Merge.

You diff your version of the document, as well as the version that you want to sync with, against a common ancestor version from history to determine what exactly changed, and then determine how to merge both your changes. Finally, you sync the merged document back to the repository.

The Problem with Diff/Sync/Merge

Diff Sync Merge (aka Diff Merge) is a solid and proven way to perform merges, but it comes with one huge caveat. It needs multiple versions of the entire model to be retrieved/synced for performing sync and merge operations. Whether it was just a single character that was replaced, or the entire document, you still need access to the entire document to perform the diffs. This works well for use cases where you only have to sync relatively rarely.

But, if you want to build a real-time collaborative application a la Google Docs, the latency involved in pushing/retrieving entire document models back and forth on each individual edit would make it unusable. We need to rethink syncing and merging. We need a way to sync changes without syncing entire documents.

OT to the Rescue!

The basic premise behind operation transform is quite simple, and can be explained with the following diamond.

image

You start at an initial model state m0. Client A performs an Operation A and arrives at model state mA. Client B simultaneously performs Operation B to arrive at model state mB. Now, given the pair of operations A and B, we need to find complimentary operations A’ and B’ such that mB + A’, and mA + B’ both converge back to a single model that both clients can sync back to. For example, Client A’s operation of inserting ‘!’ at the 5th index-0 position get’s transformed to inserting ‘!’ at the 4th index-0 position, after B’s operation of deleting the first character is performed. And B’s operation undergoes no transformation because A’s operation doesn’t affect it’s index.

The important thing to note here is that regardless of what the model was, those two operations will lead to the same exact transformed operation. For instance, if the initial model state was “namaste”, and we performed the same operations as above. We would arrive at the same synced state of “amas!te”.

If we can define transformations A’ and B’ for every possible pair of operations A and B, then we can achieve our goal of syncing changes without syncing documents.

Of course, figuring out the useful transformations for all the different combinations of A and B could get quite complicated, but once they’ve been figured out, OT will guarantee that the clients will arrive at a synced state.

I emphasize “useful” because one could easily define A’ and B’ to simply delete the entire model for any and all combinations of A and B. In that case, your clients would technically arrive at a synced state and satisfy OT’s requirements. But it would be an empty document, and that does nobody any good :/

So, let’s take our initial example of taking the string “hello!”, and the two clients A and B, with A replacing ‘h’ with ‘H’, and B inserting “ world” before the exclamation mark. These state changes and all other text manipulation can be achieved via a minimal set of 2 operations:

1. deleteCharacter(atPosition p) aka DELETE (index)
2. insertCharacter(character c, atPosition p) aka INSERT (char, index)

This implies there are 4 possible pairs of inputs for our OT diamond.

  1. DELETE / DELETE
  2. DELETE / INSERT
  3. INSERT / DELETE
  4. INSERT / INSERT

And we could define our transformation logic as follows:

Operation A Operation B Transform A’ Transform B’
DELETE (indexA) DELETE (indexB) (indexB == indexA) ? NO_OP : (indexB < indexA) ? DELETE(indexA - 1) : DELETE(indexA) (indexB == indexA) ? NO_OP : (indexA < indexB) ? DELETE(indexB - 1) : DELETE(indexB)
INSERT (charA, indexA) DELETE (indexB) (index B < indexA) ? INSERT(charA, indexA - 1) : INSERT(charA, indexA) (index A <= indexB) ? DELETE(indexB + 1) : DELETE(indexB)</td>
DELETE (indexA) INSERT (charB, indexB) (index B <= indexA) ? DELETE(indexA + 1) : DELETE(indexA)</td> (index A < indexB) ? INSERT(charB, indexB - 1) : INSERT(charB, indexB)
INSERT (charA, indexA) INSERT (charB, indexB) (index A > indexB) ? INSERT(charA, indexA + 1) : INSERT(charA, indexA) (index B >= indexA) ? INSERT(charB, indexB + 1) : INSERT(charB, indexB)

There are a couple of interesting things to notice with the transforms we have defined.

  • First, we defined a new NO_OP operation to return for the case where both clients perform the same delete operation. This is to avoid double-deleting the wrong character.
  • And second, the >= in the last pair prioritize A’s inserts over B’s insert if they both insert characters at the same index. So if A inserted “ Bob” and B inserted “ World”, the output will always result in “ Bob World”.
    • Note: If both added “ World”, we would end up with “ World World”, but in a real-time collaborative environment, this is acceptable. If we really care about avoiding duplicates, we could define higher level operations that look at word inserts and compare whole words/phrases for duplicates.

This transformation model works well for single character operation. But how do we handle a string of operations by each client? For instance, replacing ‘h’ with 'H’ itself takes a minimum of two operations (a DELETE followed by an INSERT) and inserting “ world” requires 6 INSERT operations.

Handling Multiple Operations

Once you’ve figured out the transformation logic for a single pair of operations, handling multiple operations becomes a no-brainer. It’s simply the same OT diamond cascading transformed operations as input to the next OT diamond.

Take the following example where A diverges by 3 operations, while B diverges by 2. To compute the transformed operations A1′, A2′, and A3′ to apply on client B, as well as B1′ and B2′ to apply on client A, you simply have to cascade your transformations down. Note that there are intermediate states and intermediate operations shown in the diagram below, but there is no need to store or even care about the intermediate models as OT works independent of model state as shown above with that really lame “amas!te” example :)

image

In the above case, we had to perform transformations on 6 different OT diamonds. Basically, OT becomes a O(m*n) operation, where m & n are the number of operations the clients have diverged by. So, it is preferable to sync frequently keeping the count of operations to perform OT on pretty low.

Luckily, this suits really well for real-time collaborative use cases where we are syncing almost continuously after each and every edit.

Handling Multiple Clients

Syncing multiple clients is just as easy. We simply sync them sequentially and additively. Assume that you have 3 clients A, B, and C that have each performed 3, 2, and 2 operations respectively. To bring them all in sync, we first sync A (operations A1, A2, and A3) and B (B1 and B2) to determine A’ (A1′, A2′, A3′) and B’ (B1′, B2′). Now both A + B’ and B + A’ lead to the same synced state mAB for both clients.

We can now take one of these equivalent paths (either A1, A2, A3, B1′, B2′ or B1, B2, A1′, A2′, A3′), and sync that with C to determine C1′ and C2′, as well as the transformed operations to apply on client C. Let’s take the first path A1, A2, A3, B1′, B2′ for illustrating how this works. The sync would work as shown in the following diagram.

image

It’s not shown clearly in the diagram, but C1′ and C2′ are operations that get applied on both A and B, which are currently both at state mAB, to bring all three clients to the same synced state mABC. And we can follow this same technique and extend it to as many clients as we need to sync.

That was just a brief intro to Operational Transformations which forms the basis for real-time collaborative tools such as Google Docs, Dropbox Paper, and Apple iWork. Hopefully, it made sense, and you got to learn something new.