Wherefore FluidDB?

I’ll be giving a 5-minute Ignite-style talk about FluidDB at BarCamp in Glasgow this Friday (18th June 2010). If you don’t know it, the Ignite format is a challenging presentation format in which you present exactly 20 slides in 5 minutes, and they auto-advance, so there’s no wiggle room.

While I suspect almost every Ignite presentation feels like a challenge, FluidDB seems to me like a particularly hard thing to compress into five minutes. Perhaps perversely, I ended up deciding to present it many different ways, each in about 1 minute. I recorded this and sent it round the usual suspects within FluidDB, and one of the comments that came back, from Nicholas Tollervey, was:

You explain really well what FluidDB is (e.g. slide 3) but not why (as in the problem it’s solving or the compelling reason for using FluidDB).

And it’s true. In the draft presentation, I talk a lot about what FluidDB is and talk about lots of its interesting characteristics, but never really come out and make a clear statement about its raison d’être.

That’s because I’ve never really worked out exactly why I think FluidDB is important, though it seems obvious to me that it is.

This article is an attempt to feel a way towards an answer to the question.

Before the Wherefore, the What?

I’m assuming that anyone reading this blog probably already knows what FluidDB is, but in writing the presentation I came up with a new and succinct (if rather dense) way of describing it (the “slide 3” that Nicholas refers to in his comment). This is:

FluidDB is a single-instance, online, social data store based on tags with values.

It has a query language, a fine-grained permissions system and a RESTful, HTTP API.

Or, as the slide has it:

So the question is:

Why would I want one of those?

What’s it Like?

If you look across all the material that’s been published about FluidDB you find that we describe it in myriad different ways, more often than not by analogy, and claim lots of quite diverse benefits and goals for it. The following is a relatively small subset of those:

  • It’s like del.icio.us on steroids. (Del.icio.us lets you tag and share bookmarks and keep some private: FluidDB is like that but not just for bookmarks, and you can put values on your tags, and you get a much more flexible permissions system and a query language, and . . .)
  • It’s like a wiki for (typed) data. (Anyone can write any “information” about anything to it. You don’t need permission. But the information can be typed (any types you like) and can be queried. And there are no edit wars because everyone writes to his own space. And . . .)
  • It’s (like) a database without the rigidity. (Clearly, like a database, it stores data and has a query language. But unlike a traditional database, it doesn’t need a schema, or columns, or typed columns, and anyone can write to it without destroying anything, and . . .)
  • It’s (like) a universal metadata engine. (Applications tend to store primary data and sometimes a limited set of metadata too. But FluidDB is particularly suited to being the one-stop shop for storing metadata: data about data, or indeed, data about anything (any thing). And often by storing the metadata in FluidDB, you gain the ability to query it together with other (meta)data about the same things, which enables queries that are otherwise impossible.)
  • It’s (like) a free data API for any app. (If an application stores its data in FluidDB, FluidDB gives users direct access to that data through the HTTP API and through libraries, from lots of different programming languages.)
  • It’s like Twitter for data. (It makes it easy for people to inject tiny bits of information into the world and tie them to the right place so that the information can be seen, found, queried and used in context.)
  • It’s (like) a shared space for cross-application data. (If application data is stored, or mirrored, in FluidDB, combining data from those different applications about the same things becomes practical, where it wasn’t before. So as well as bringing together data about the same things from lots of different users, we can bring together data from lots of different applications about those same things.)
  • It’s (like) a revolutionary way to shift data from online companies to users. While all objects in FluidDB are shared, all data is owned, and FluidDB has the potential to allow users to retain control of their own data rather than ceding it to the application providers while using whatever (suitably enabled) tools they like to edit, change, manage, remix different parts of that data.

I could go on . . .

But the question remains: why does the world need something like that? Why does it need FluidDB? Wherefore FluidDB?

Wherefore FluidDB? (Answer the Question!)

I think that the very core of why we need something like FluidDB is that we need a place that different people (and applications) can store their data about whatever they want and link it reliably to other people’s data about the same thing. I struggle to think of anything else that really fulfils this role at the moment.

As a concrete example, it is useful to have a place where we can store information about the book The Hitchhiker’s Guide to the Galaxy, by Douglas Adams. It is useful that different people can record their personal ratings of it, or record the fact they’ve read it. It’s useful if booksellers can list their price and availability for different editions of it (and there are separate objects for different editions, too, so this is no problem). It’s useful that we can record hard data about it, and its author. And it’s useful that such data can be shared and queried.
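To make the book example a little more tangible, here is a minimal sketch in Python of the idea, not of FluidDB itself. Everything in it is an assumption made for illustration: the `ToyStore` class, the `owner/tag` naming, the user names and the string used to identify the book are all hypothetical, and the store lives in memory rather than online.

```python
from collections import defaultdict


class ToyStore:
    """In-memory stand-in for a shared tag store (hypothetical, not the FluidDB API)."""

    def __init__(self):
        # One shared object per "about" string; each object maps "owner/tag" -> value.
        self._objects = defaultdict(dict)

    def tag(self, owner, about, name, value=None):
        """Attach owner's tag (with an optional value) to the object for `about`."""
        self._objects[about][f"{owner}/{name}"] = value

    def tags_on(self, about):
        """Return a copy of every tag on the object, whoever wrote it."""
        return dict(self._objects[about])


store = ToyStore()
book = "book:the hitchhiker's guide to the galaxy (douglas adams)"  # made-up identifier

store.tag("alice", book, "rating", 9)           # a personal rating
store.tag("bob", book, "rating", 6)             # a different rating, no conflict
store.tag("bob", book, "has-read")              # a tag with no value
store.tag("bookshop", book, "price-gbp", 7.99)  # a bookseller's data

all_tags = store.tags_on(book)
```

Because every tag is prefixed with its owner, Alice’s and Bob’s ratings sit side by side on the same object rather than overwriting one another, which is the property the book example relies on.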

Not all of the attributes I listed in the description of FluidDB are necessary for this (some are merely useful design choices), but most of them are. Let’s go through the description:

“FluidDB is a

  • single instance: Does FluidDB really need to be single-instance? Well, no, not strictly. In fact, there would be lots of advantages to there being multiple versions of the underlying database, and indeed of the data. If it really takes off, it will become too big and important for one company to control. And there are some major challenges with that. But it’s not obvious that we couldn’t in fact eventually have multiple platforms using and sharing the data (certainly the public data), have the data be more distributed, and so forth. But as a starting point, we only have any hope of realising this by starting with a single instance.
  • online: The vision really makes no sense at all unless this platform is online. The whole idea is to make the data readable and writable from anywhere.
  • social: Social is also central. The whole idea is that different people can store their data and allow others to see it.

data store.

“It has a

  • query language: Well, the query language is clearly central as well. I suppose it doesn’t absolutely have to be a language, but we definitely need a query mechanism.
  • fine-grained permissions system: As I noted in a previous post, the permissions system is far more central to FluidDB than might be expected. And it’s crucial to the goal here as well. The permissions system allows us to choose which data we share in FluidDB, but also which applications (or other people) we want to allow to write data on our behalf.

and a

  • RESTful, HTTP API: Clearly, it is not essential that the API be HTTP-based, let alone that it be RESTful, though these are good choices. But the existence of an API is important, and it is highly desirable that it’s one that is easy to use from any programming language. From that perspective, HTTP is at least an excellent choice.
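How a query mechanism and a permissions system work together can be sketched with a toy model. This is a simplified illustration in Python under my own assumptions, not FluidDB’s actual permissions model or query language: `ToyStore`, `grant_write`, `make_private` and `query` are hypothetical names, and a query here is just a tag path plus a Python predicate.

```python
class ToyStore:
    """Toy tag store with per-tag read and write controls (hypothetical)."""

    def __init__(self):
        self._objects = {}     # object id -> {"owner/tag": value}
        self._private = set()  # tag paths readable only by their owner
        self._writers = {}     # tag path -> extra users allowed to write it

    def grant_write(self, owner, name, writer):
        """Let `writer` (a person or an application) write owner's tag."""
        self._writers.setdefault(f"{owner}/{name}", set()).add(writer)

    def make_private(self, owner, name):
        """Hide owner's tag from every other reader."""
        self._private.add(f"{owner}/{name}")

    def tag(self, writer, obj, path, value):
        """Write a tag value, but only if `writer` owns the tag or was granted access."""
        owner = path.split("/", 1)[0]
        if writer != owner and writer not in self._writers.get(path, set()):
            raise PermissionError(f"{writer} may not write {path}")
        self._objects.setdefault(obj, {})[path] = value

    def query(self, reader, path, predicate=lambda value: True):
        """Return sorted ids of objects carrying a readable tag whose value passes."""
        owner = path.split("/", 1)[0]
        if path in self._private and reader != owner:
            return []
        return sorted(obj for obj, tags in self._objects.items()
                      if path in tags and predicate(tags[path]))


store = ToyStore()
store.tag("alice", "book-1", "alice/rating", 9)
store.tag("alice", "book-2", "alice/rating", 4)
store.make_private("alice", "notes")
store.tag("alice", "book-2", "alice/notes", "lent to Bob")

# Alice lets an application write ratings on her behalf.
store.grant_write("alice", "rating", "ratings-app")
store.tag("ratings-app", "book-3", "alice/rating", 8)

liked = store.query("bob", "alice/rating", lambda value: value > 5)
hidden = store.query("bob", "alice/notes")
own = store.query("alice", "alice/notes")
```

In this sketch Bob can query Alice’s public ratings but gets nothing back for her private notes, and the application can write her ratings only because she granted it that right.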

Summing Up

FluidDB is important because it provides a shared, online storage space for all people and applications together with mechanisms that allow them to annotate shared objects corresponding to anything in the universe (or nothing) in a way that avoids conflict and encourages cooperation.

As the data in FluidDB grows, whole new classes of applications become possible, combining data from disparate people and applications in ways that are only limited by the controls we give people over their own parts of the data store.

That’s why we need FluidDB.

Afterword

It might be tempting to argue that all the power described above really comes from centralizing data, which FluidDB currently certainly does. And it’s clearly true that any datastore that brings together data that previously lived in different places is going to facilitate things that are much harder when the data are apart. It would also be easy to object that the idea of a single, centralized, all-powerful database is the stuff more of nightmares than a glorious future.

While these criticisms have some validity, I think they ultimately miss the point. What FluidDB really does is promote a new paradigm for disparate applications and users to tie data together through shared objects. It is principally for performance reasons on today’s networks that in practice this has to be in a single (albeit distributed) data store. Really the power lies in the logical information architecture FluidDB defines, which could be made physical in very different ways in the future.

The other apparent hole in what I’ve described is that I don’t actually say how different users and applications are supposed to ensure that they use the same FluidDB object to store their data about the same thing so that the connections between them can be exploited. As is the repeated mantra (and even name) of this blog, the key to that is the special about tag, which has properties that facilitate this. It’s certainly true that today there are few if any agreed conventions for about tags, but the information architecture provides the underlying support necessary to make this work, and over time the conventions can and will grow, compete, and eventually become agreed and stable (or at least metastable).
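To show what such a convention might look like, here is a small Python sketch. It assumes that an about value identifies exactly one object, and it invents a convention for books purely for illustration: the `about_for_book` function, the `book:title (author)` form and the `ObjectIndex` class are all hypothetical, not agreed conventions or part of FluidDB.

```python
import re


def about_for_book(title, author):
    """Build a canonical about string for a book (a made-up convention)."""
    def norm(text):
        text = text.lower().replace("’", "'")      # unify case and apostrophes
        text = re.sub(r"[^a-z0-9' ]+", " ", text)  # drop other punctuation
        return " ".join(text.split())              # collapse runs of whitespace
    return f"book:{norm(title)} ({norm(author)})"


class ObjectIndex:
    """Maps each distinct about string to exactly one object id."""

    def __init__(self):
        self._by_about = {}

    def object_for(self, about):
        """Return the id for `about`, creating a new one on first use."""
        return self._by_about.setdefault(about, len(self._by_about) + 1)


index = ObjectIndex()

# Two applications spell the same book slightly differently...
a = index.object_for(about_for_book("The Hitchhiker’s Guide to the Galaxy", "Douglas Adams"))
b = index.object_for(about_for_book("the hitchhiker's guide  to the galaxy", "DOUGLAS ADAMS"))
# ...and a third refers to a different book.
c = index.object_for(about_for_book("Dirk Gently's Holistic Detective Agency", "Douglas Adams"))

same_object = (a == b)
different_object = (a != c)
```

The normalization is what matters here: if everyone who follows the convention arrives at the same about string, they arrive at the same object, and their data can be combined.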