Findka
How Git could help save the open web
22 October 2020


I think we need a protocol for storing and accessing an individual's data on the web. If it became widely adopted, open-source web applications would become more prevalent, and we'd be able to achieve the goals described in Mike Masnick's essay Protocols, Not Platforms.

Here's a simple proposal which I'll call GitCMS. It relies on Git, EDN, and Clojure Spec. It has similarities with the Solid project and IPFS, but GitCMS's scope is much smaller.

(For those unfamiliar with Clojure: EDN is basically an improved version of JSON, and Spec is a language for defining schemas.)

Core proposal

Storage and data model

Each user has at least one Git repo, in which each file contains a single document. The document is stored as an EDN map, optionally followed by a blank line and some text. For example, my repo might contain a blog post in the file /blog/some-blog-post:

{:type      :article
 :title     "Some Blog Post"
 :published #inst "2020-10-10"}

Lorem ipsum dolor sit amet...

When reading, this file would be deserialized to:

{:type      :article
 :title     "Some Blog Post"
 :published #inst "2020-10-10"
 :path      "/blog/some-blog-post"
 :text      "Lorem ipsum dolor sit amet..."}

:path is the primary key.
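A minimal sketch of that deserialization step, assuming every file begins with an EDN map and that whitespace around the body text can be trimmed. The namespace and the functions `parse-document` and `serialize-document` are hypothetical names for illustration. A single `clojure.edn/read` call stops right after the map's closing brace, so whatever is left in the reader is the body text.

```clojure
(ns gitcms.document
  (:require [clojure.edn :as edn]
            [clojure.string :as str])
  (:import (java.io PushbackReader StringReader)))

(defn parse-document
  "Hypothetical helper: turns the raw contents of the file at `path` into a
  GitCMS document map, adding :path and, if there is body text, :text."
  [path contents]
  (let [reader   (PushbackReader. (StringReader. contents))
        ;; Read exactly one EDN form: the metadata map at the top of the file.
        ;; clojure.edn understands #inst and #uuid tags by default.
        metadata (edn/read reader)
        ;; Whatever the reader did not consume is the optional body text.
        ;; Assumption: the blank line and surrounding whitespace are dropped.
        text     (str/trim (slurp reader))]
    (cond-> (assoc metadata :path path)
      (not (str/blank? text)) (assoc :text text))))

(defn serialize-document
  "Hypothetical inverse of parse-document: returns the file contents for `doc`.
  :path is not written because it is implied by the file's location."
  [{:keys [text] :as doc}]
  (str (pr-str (dissoc doc :path :text))
       (when text (str "\n\n" text))
       "\n"))
```

Writing is the reverse of reading: the map is printed without `:path` and `:text`, and the text follows after a blank line, so a document survives a round trip through the file format.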

These documents could be used for any kind of data. Besides modeling content you've created, you could model other content you've interacted with. If you like a Tweet, you might model that like so:

{:path   "/events/<some uuid>"
 :type   :rating
 :url    "..."
 :rating :like
 :date   #inst "..."}

For large files like images or videos, or frequently changing files like collaborative documents, you could store just the metadata and a foreign key/URL pointing to the content.

The community could maintain a special repository that contains schemas, defined with Spec. For example:

(ns example.schemas
  (:require
    [clojure.spec.alpha :as s]))

(s/def :article/type #{:article})
(s/def ::title string?)
(s/def ::article
  (s/keys :req-un [:article/type
                   ::title
                   ...]))

In English: to be an article, a document must have a :type key set to :article, it must have a :title key set to a string, etc.

So if you want to make an application that publishes a user's blog posts, you would filter their repo for documents that match the article spec.

(If your app is written in Clojure, you can use the official Spec definitions directly, but that's not required. They're just a reference implementation. The schemas don't even have to be written with Spec, but that seems like a flexible, concise, unambiguous way to do it.)
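A minimal sketch of that filtering step, mirroring the schema example but trimmed to the two keys it spells out, and assuming the repo's files have already been deserialized into maps. `s/valid?` is part of clojure.spec; the namespace and the `articles` function are hypothetical.

```clojure
(ns gitcms.articles
  (:require [clojure.spec.alpha :as s]))

;; Same shape as the schema example, limited to :type and :title.
(s/def :article/type #{:article})
(s/def ::title string?)
(s/def ::article
  (s/keys :req-un [:article/type ::title]))

(defn articles
  "Hypothetical helper: keeps only the documents that conform to ::article."
  [docs]
  (filter #(s/valid? ::article %) docs))
```

Because the spec uses `:req-un`, it checks the unqualified `:type` and `:title` keys found in plain document maps, so documents of other types (like the rating event) are simply skipped.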

Access

To make an application that reads GitCMS data, just ask the user for their repo's URL. If you need write access, ask the user for an SSH key with write access to the repo (on GitHub, a "deploy key" with write access enabled). As a user, you can make additional repos for more granular access control (e.g. besides a public repo, you could have a private repo which requires an SSH key for read access).
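One way an application might read a repo once it has the URL is to clone it (for example with `git clone <url>`) and walk the checkout. A minimal sketch, assuming a local clone already exists; the namespace and `document-files` are hypothetical names. It builds each document's :path from the file's location relative to the repo root and skips Git's own `.git` directory.

```clojure
(ns gitcms.checkout
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn document-files
  "Hypothetical helper: returns a map of GitCMS :path (e.g. \"/blog/some-blog-post\")
  to raw file contents for every file in the local checkout at `root`."
  [root]
  (let [root-path (.toPath (io/file root))]
    (into {}
          (for [f     (file-seq (io/file root))
                :when (.isFile f)
                ;; Path segments relative to the repo root, e.g. ("blog" "some-blog-post").
                :let  [parts (map str (.relativize root-path (.toPath f)))]
                ;; Git's internal files are not documents.
                :when (not= ".git" (first parts))]
            [(str "/" (str/join "/" parts)) (slurp f)]))))
```

Each entry can then be deserialized into a document by reading the leading EDN map, as described earlier.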

For more convenient and finer-grained control, there could be applications that have full access to your repo and provide an OAuth API for other apps. These API layers could also do more, such as maintaining an index of your data to support efficient queries. The specifics don't need to be part of the GitCMS protocol.

Git also lends itself well to change data capture. Applications can poll for new commits or be notified of them via hooks (e.g. GitHub webhooks). Given a batch of new commits, you can easily see which documents have been created, updated or deleted.
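A minimal sketch of that change detection using `git diff --name-status`, which prints one status letter per changed file (A for added, M for modified, D for deleted). It assumes the app keeps a local clone and remembers the last commit it processed; `--no-renames` makes a rename show up as a delete plus a create. The namespace and function names are hypothetical, and file names that Git quotes (unusual characters) are not handled.

```clojure
(ns gitcms.changes
  (:require [clojure.java.shell :as sh]
            [clojure.string :as str]))

(def ^:private status->change
  {"A" :created "M" :updated "D" :deleted})

(defn parse-name-status
  "Groups `git diff --name-status --no-renames` output into the GitCMS paths
  that were created, updated or deleted. Other statuses are ignored."
  [output]
  (reduce (fn [acc line]
            (let [[status file] (str/split line #"\t" 2)]
              (if-let [change (status->change status)]
                (update acc change (fnil conj []) (str "/" file))
                acc)))
          {}
          (remove str/blank? (str/split-lines output))))

(defn changed-documents
  "Runs git in the clone at `repo-dir` and classifies the documents that
  changed between commits `old-rev` and `new-rev`."
  [repo-dir old-rev new-rev]
  (let [{:keys [exit out err]} (sh/sh "git" "diff" "--name-status" "--no-renames"
                                      old-rev new-rev :dir repo-dir)]
    (when-not (zero? exit)
      (throw (ex-info "git diff failed" {:err err})))
    (parse-name-status out)))
```

An app could call `changed-documents` with the last processed commit and `HEAD`, then re-read only the created and updated paths.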

Benefits

Do one thing and do it well

I'll use Jira as an example. Jira was not loved at my previous job. My project manager said it was popular mainly because it was the only issue tracker that did everything. (If you disagree, think of some other web app that does many things poorly.)

With GitCMS, you can easily have multiple apps that operate on the same data, without API integrations. You could have one app for creating new issues and a separate app for displaying those issues. Poor features, instead of being tolerated, can be replaced with small, specialized apps.

This would also lower the barrier to entry for individual developers and startups: instead of having to make an app that's good enough to displace an incumbent, you can make a small app that only replaces one feature. And you can do this without plugging into (and thus, becoming dependent on) a commercial platform.

Open data

Information discovery is one of the most important problems of our time. If everyone used GitCMS to store data about content they like and don't like, then anyone could use it to build search engines, recommender systems and social networks.

Adoption

Getting people to use any new protocol is hard, but I think GitCMS has a real chance to succeed. Git is widely used and understood, and there's plenty of free hosting. Plus it's convenient for developers: you can perform CRUD operations with just a text editor. Even without widespread adoption, GitCMS would be a nice data storage format for many side projects, like static site generators. GitCMS will benefit from network effects, but it can get started without them.

As early adopters build side projects on GitCMS, the main driver for adoption can become the apps, not the protocol per se. "Come for the apps, stay for the protocol."

If GitCMS becomes popular among developers, it can spread to non-technical people with some extra work. At a minimum, we would need a hosting service designed for this purpose instead of GitHub et al.

The choice of EDN instead of JSON will probably hinder adoption, except among Clojure programmers. However, I think the advantages of EDN (such as keywords, sets, and extensible tagged literals like #inst) are significant, and switching to it late-stage would be difficult. So I'm confident that this indulgence is worth it. It'd be good to survey the state of EDN implementations and see if any could use some help.

I personally got chills when the idea for GitCMS coalesced in my mind, so I'll be using it for my future side projects (which I will no doubt promote as "X, built on GitCMS"). I'll also let Findka users expose their favorited articles in an RSS feed, which can then be auto-imported into GitCMS. If you want to use GitCMS, let me know (you can find contact info and social media profiles on my website). If others are interested, I could set up a mailing list.
