Part 1 | Part 2 | Part 3 | Part 4 |



The previous blog post in this multi-part series about Yjs, the real-time collaboration framework, covered awareness in Yjs and how encoding deletions in the Yjs way can yield substantial performance dividends. Recently, during its assessment of a variety of tools for collaborative editing, Tag1 Consulting opted for Yjs and ProseMirror for an ambitious shared editing project at a well-known Fortune 50 company.

Not long ago, yours truly (Preston So, Editor in Chief at Tag1 and author of Decoupled Drupal in Practice) sat down with Kevin Jahns (Real-Time Collaboration Systems Lead at Tag1 and creator of Yjs), Fabian Franz (Senior Technical Architect and Performance Lead at Tag1), and Michael Meyers (Managing Editor at Tag1) for another Tag1 Team Talks webinar and podcast spelunking into Yjs and the features that make it a compelling choice for a wide variety of shared editing use cases. In this new installment of the Yjs blog series, we conclude our discussion of awareness and devote our attention to offline editing and versioning, two critical elements of real-time collaboration.

Versioning in Yjs

If you are unfamiliar with the background around Yjs, I strongly advise reading the first, second, and third installments of our Yjs deep dive blog series before proceeding, as those posts introduce the foundational concepts that the content below builds on.

How deletes are exchanged

As we witnessed in the previous installment of this series, delete operations are one of the most important ways in which Yjs handles performant peer-to-peer collaboration. As we mentioned previously, a version is nothing more than a state vector that represents operations in an efficient manner.

State vectors are merely JavaScript maps. Consider a user with an identifier 0 who has performed a single operation. At the same time, another user, user 1, performs six operations. This can be defined in a state vector, or JavaScript map, as follows:

{
  0: 1,
  1: 6
}

In this scenario, the clock, which is the value associated with each key (a user identifier) in this map, represents the number of operations the client has received from that user. For instance, after user 0’s operation, the next expected clock from user 0 would increment from 0 to 1. This is how Yjs maintains awareness of how many structs are present and which structs we expect from other users. In turn, this allows us to sync messages across users and discover what data is absent from certain users.
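To make the comparison concrete, here is a minimal sketch of how two state vectors could be compared to find out which operations a client is missing. The function name `missingRanges` and the shape of its result are assumptions made for illustration; this is not the Yjs API, only plain JavaScript that mirrors the map shown above.

```javascript
// Hypothetical helper, not part of Yjs.
// A state vector maps a user identifier to the next expected clock for that user.
function missingRanges(localVector, remoteVector) {
  const missing = [];
  for (const [user, remoteClock] of Object.entries(remoteVector)) {
    // A user we have never heard from is treated as clock 0.
    const localClock = localVector[user] ?? 0;
    if (remoteClock > localClock) {
      // The local client still needs clocks localClock through remoteClock - 1.
      missing.push({ user: Number(user), from: localClock, to: remoteClock - 1 });
    }
  }
  return missing;
}

// The local client has seen one operation from user 0 and two from user 1.
const localVector = { 0: 1, 1: 2 };
// The remote client holds the state vector from the example above.
const remoteVector = { 0: 1, 1: 6 };

console.log(missingRanges(localVector, remoteVector));
```

In this sketch the local client is up to date with user 0 but still lacks four operations from user 1, so only user 1 appears in the result.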

The state vector above encodes the number of structs present at particular points in time. But the next thing we need to uncover is which characters, and therefore which structs, have been deleted. This is where the delete set covered in the previous installment of this series comes in handy. The resulting version consists of the state vector together with the delete set, which records the deletes that need to be performed to sync the local document.

Distributed revision histories

For those who are accustomed to traditional revisioning paradigms, these concepts may be unfamiliar and confusing. As an example, a typical revisions model in the content management world consists of numbered versions that increment by 1. However, in a peer-to-peer context, where revisions can collide and be arbitrarily assigned incremental identifiers as users come online and go offline, this revisioning model falls short.

To illustrate this, consider a scenario where User A’s latest version is numbered 1,000. If User B remains offline during User A’s newest revisions and continues editing locally, User B’s newest version may also be numbered 1,000 even though its content differs, which results in a conflict. In a distributed revision history, by contrast, each user keeps their own counter. A document at an overall version of 1,000 could mean, for example, that User A is on version 600 while User B is on version 400.

Together, User A and User B both have access to the cumulative sum of all versions and changes to the document, once their clients have synchronized with the new changes. All states of the document remain unique, which means that if User A’s changes occurred simultaneously with User B’s, in the canonical revision history they will be interspersed with each other.
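As a rough sketch of this idea, the two per-user counters can be merged by taking, for each user, the highest clock either side knows about. The helper names `mergeStateVectors` and `totalOperations` are hypothetical and chosen only for this illustration; they are not Yjs functions.

```javascript
// Hypothetical helpers, not part of Yjs.
// Merging keeps the highest known clock for every user.
function mergeStateVectors(a, b) {
  const merged = { ...a };
  for (const [user, clock] of Object.entries(b)) {
    merged[user] = Math.max(merged[user] ?? 0, clock);
  }
  return merged;
}

// The overall amount of history is the sum of all per-user clocks.
function totalOperations(vector) {
  return Object.values(vector).reduce((sum, clock) => sum + clock, 0);
}

// Before syncing, each user only knows about their own work.
const knownToA = { A: 600 };
const knownToB = { B: 400 };

const synced = mergeStateVectors(knownToA, knownToB);
console.log(synced, totalOperations(synced));
```

Because each user increments only their own counter, merging never produces two different states with the same identifier, which is the collision that a single shared version number cannot avoid.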

Tracking changes in Yjs

As we can see from these examples illustrating versioning in Yjs, expressing state in real-time collaboration solutions can be difficult. The most important prerogative of shared editing solutions when it comes to versioning is the ability to show a state snapshot at a particular moment in time. In Yjs, we use state vectors that refer to structs as well as deletion sets that represent which structs are deleted at a certain point in time.
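The following sketch shows one way such a snapshot could be used to reconstruct the text at a given moment. The data model here is an assumption made for illustration: each struct is a single character tagged with a user and a clock, the array is already in document order, and the delete set is a plain `Set` of `"user:clock"` strings. Yjs stores this information differently internally.

```javascript
// Hypothetical data model, not the Yjs internals.
// A struct is visible in a snapshot if it existed at that time
// (its clock is below the state vector entry) and was not yet deleted.
function textAtSnapshot(structs, snapshot) {
  return structs
    .filter((s) => s.clock < (snapshot.stateVector[s.user] ?? 0))
    .filter((s) => !snapshot.deleted.has(`${s.user}:${s.clock}`))
    .map((s) => s.char)
    .join("");
}

// Structs in document order: user 0 typed "Hi", user 1 later appended "!".
const structs = [
  { user: 0, clock: 0, char: "H" },
  { user: 0, clock: 1, char: "i" },
  { user: 1, clock: 0, char: "!" },
];

// Earlier snapshot: only user 0's two structs exist, nothing deleted.
const earlier = { stateVector: { 0: 2 }, deleted: new Set() };
// Later snapshot: user 1's struct exists and the "i" has been deleted.
const later = { stateVector: { 0: 2, 1: 1 }, deleted: new Set(["0:1"]) };

console.log(textAtSnapshot(structs, earlier)); // "Hi"
console.log(textAtSnapshot(structs, later));   // "H!"
```

Note that the deleted struct is never removed from the list. It is only filtered out when rendering, which is what allows older snapshots to be shown later.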

Representing states in distributed systems can be a particularly difficult problem, which brings us to tracking changes and attributing them to users. We can certainly create a snapshot of the content at a certain time, but the more important concern is to identify exactly who made the changes. We know who inserted content thanks to the user identifiers included with each operation, but we also need the same information about deletions.

Attributing and identifying changes

With a linear history of revisions, we can easily track and attribute all changes. But in a peer-to-peer context, we lack the luxury of a centralized server that tracks all changes. Consider, for instance, the delete operations we defined in the previous installment of this series:

[1, 0, 2]
[1, 2, 1]

We can now assign each of these deletions to a specific user given the first member of the array, which identifies the user responsible for the operation. However, consider a situation in which a third user, user 3, deletes content inserted by user 1. Because all delete operations are added to the list of deletions and all deletions are maintained in the version history, we can know not only who inserted characters but also who deleted them.

This means that at any time with any version that we access, we can track who inserted characters where and who deleted them where. And armed with that information, we can compute tracked changes, attribute those changes, and access the same features that are familiar to users of Google Docs or developers who frequently execute git blame in their codebases. This can result in a Google Docs-like interface to diff two versions in the history.
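A blame-style view of this kind could be computed along the following lines. This is only a sketch: the record shape `{ deletedBy, user, clock, length }` is an assumption introduced here so that each deletion names both the deleting user and the range of structs it covers, and `attribute` is a hypothetical helper rather than a Yjs function.

```javascript
// Hypothetical helper and record shapes, not part of Yjs.
// Each struct records who inserted it; each deletion records who deleted
// which range of another user's structs.
function attribute(structs, deletions) {
  return structs.map((struct) => {
    const deletion = deletions.find(
      (d) =>
        d.user === struct.user &&
        struct.clock >= d.clock &&
        struct.clock < d.clock + d.length
    );
    return {
      char: struct.char,
      insertedBy: struct.user,
      deletedBy: deletion ? deletion.deletedBy : null,
    };
  });
}

// User 1 inserted "abc" with clocks 0, 1, and 2.
const inserted = [
  { user: 1, clock: 0, char: "a" },
  { user: 1, clock: 1, char: "b" },
  { user: 1, clock: 2, char: "c" },
];

// User 3 deleted two of user 1's structs, starting at clock 0.
const deletions = [{ deletedBy: 3, user: 1, clock: 0, length: 2 }];

console.log(attribute(inserted, deletions));
```

Here "a" and "b" are attributed to user 1 as the author and user 3 as the deleter, while "c" remains undeleted. A tracked-changes interface could render the first two as struck through with user 3's color.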

Offline editing

Collaborative editing seldom happens in ideal conditions. Offline editing, for instance, often occurs out of necessity in unfavorable situations, including on flights without Wi-Fi enabled. Consider, for example, a scenario in which Amina is on a transatlantic flight where she lacks access to a network. After landing at her destination, she begins to sync with Birgit, who has the previous state from before Amina’s edits.

Fortunately for Amina and Birgit, Yjs can compare the state from prior to the server sync and the state afterwards, properly ascertaining in the process the changes that transpired. And with Yjs, it is possible to check changes for possible conflicts before submitting them to a server. Birgit, for example, will surely want to be aware of what changes Amina made while airborne and whether they conflict with any changes Birgit made in the intervening hours.
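A simplified sketch of that comparison follows. It assumes each state is captured as a state vector plus a delete set, with the delete set modeled as a `Set` of `"user:clock"` strings; the function `changesBetween` and the user names used as keys are hypothetical and serve only to illustrate the idea.

```javascript
// Hypothetical helper, not part of Yjs.
// Compares the state before a sync with the state after it.
function changesBetween(before, after) {
  const inserted = {};
  for (const [user, clock] of Object.entries(after.stateVector)) {
    const added = clock - (before.stateVector[user] ?? 0);
    if (added > 0) inserted[user] = added; // new structs from this user
  }
  // Deletions present after the sync that were not known before it.
  const deleted = [...after.deleted].filter((id) => !before.deleted.has(id));
  return { inserted, deleted };
}

// Birgit's state before Amina lands.
const beforeSync = {
  stateVector: { amina: 10, birgit: 4 },
  deleted: new Set(["amina:3"]),
};
// The state after Amina's in-flight edits have synced.
const afterSync = {
  stateVector: { amina: 25, birgit: 4 },
  deleted: new Set(["amina:3", "birgit:1"]),
};

console.log(changesBetween(beforeSync, afterSync));
```

In this example Birgit would learn that Amina contributed 15 new structs while offline and that one of Birgit's own structs was deleted, which is the kind of summary she needs before deciding whether the changes conflict with her own.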

Peer-to-peer versioning

The story of Amina and Birgit’s collaboration also illuminates the notion of automatic merging. Google Docs, as an example, performs automatic merging when documents have been edited simultaneously in different locations without awareness of one another. But Birgit may not have notified Amina that she was also making changes and may wish to ensure that her changes are compatible with Amina’s in-flight modifications.

This highlights an important consideration for editorial governance and the human dimension of collaboration. Birgit needs a much more granular awareness of what has happened in the document than an automatic merge would permit. Many other approaches simply allow restoration of a previous state, but this can be dangerous in a peer-to-peer context where revisions can occur nonlinearly and without a server. Instead, because Amina and Birgit work on closely related tasks, Birgit wants to validate each revision and view what incremental changes Amina made in the air.

This hypothetical scenario illustrates plainly the importance of the human element in conflict resolution in collaborative applications; our tools are ill-equipped to understand how to merge potentially competing and controversial changes. There are some things that humans simply still do better.

Conclusion

Change tracking and offline editing are both relatively new concepts in the peer-to-peer space, and Yjs creator Kevin Jahns acknowledges that few prior approaches to this problem have succeeded there. Luckily, the landscape is evolving quickly, with inspiration around every corner. Google Docs and Dropbox Paper are both beginning to provide nameable versions and other means of customizing versioning processes. But this doesn’t solve the problem for other shared editing use cases such as collaborative drawing or collaborative 3-D modeling.

For his part, Kevin believes that offline-capable applications are only just starting to emerge, and many more will surface in the short and medium term. All of them will share the important prerogative of revealing to the user what happened while they were offline or away from the document. For this, custom versions and custom tracked changes on a per-user basis may be the only option. Choosing whether to see a summary of all changes or a series of incremental modifications should be an essential facet of any shared editing solution. With Yjs, this bright-eyed future, which we discussed in our recent Tag1 Team Talks episode, may soon be more readily available than we expect.

Special thanks to Fabian Franz, Kevin Jahns, and Michael Meyers for their feedback during the writing process.

For more Yjs content, see Yjs - Add real-time collaboration to any application.



Photo by Sapan Patel on Unsplash