One of Terry’s [1] mantras is that it’s easy to build structure on unstructured foundations, but very hard to build something unstructured from a structured base. Those familiar with the second law of thermodynamics or young children may question the second half of this, and those who have ever tried dry-stone walling or making something comprehensible from certain developers’ code may disagree with first half; but let’s try.
Our task today is to publish the following table, taken from the Wikipedia entry on Planets in our Solar System [2], in FluidDB.
Obviously, this is a reasonably structured textual table of data that contains a lot of useful information that we, as humans, can interpret.
Here it is in Miró, a data analysis package that my (other) company, Stochastic Solutions produces:
All I did was extract the data from Wikipedia and “standardized” it a bit, getting rid of hyperlinks and things, replacing ranges with central values, separating out metadata and so forth. (The apparent loss of precision is illusory; Miró is only showing two decimal places.)
There are many possible ways to structure this data in FluidDB, but let’s start by showing you what I did, and then we’ll discuss the rationale for the choices I made.
I spent some of last weekend baking in FluidDB integration to Miró, with quite pleasing results. (I used both the my fdb library as well as Ali Afshar’s fom to do this: thanks for fom Ali: it certainly helped.) This is the log of what I did in Miró: as you can see, the upload took about three and a half minutes, which is obviously longer than we’d like; but that will improve.
$ time miro -f scripts/planets-publish
This is Miro, version 0.3.59.
Copyright (c) Stochastic Solutions 2008-2010.
Seed: 1268155418
Logs started at 2010/03/09 17:23:38 host zero.
Logging to /miro/log/2010/03/09/session100.
[1]> load planets
planets.miro: 13 records; 13 (100%) selected.
[2]> def about (+ 'planet:' Name)
Defining field about (+ planet: Name) . . . done.
Field "about" (string) planet:Ceres planet:Venus nulls = 0
[3]> fdb publish 13
Creating FluidDB table with name "planets"
Creating field objects for table "planets"
11 10 9 8 7 6 5 4 3 2 1 0
0
Creating record objects for table "planets"
12 11 10 9 8 7 6 5 4 3 2 1 0
0
Completing table entry for Miro table planets.
Table planets complete; id=5ff094b0-72be-4786-82db-078e9fa38623.
5ff094b0-72be-4786-82db-078e9fa38623
Job completed after a total of 209.3469 seconds
Logs closed at 2010/03/09 17:27:07 host zero.
Logs written to /miro/log/2010/03/09/session100.
real 3m29.640s
user 0m0.596s
sys 0m0.364s
So what did it do? We can use the fdb command-line utility, available from github, to find out. (Remember, in what follows, that—unlike fluiddb itself—fdb uses a leading slash to indicate full paths; I could have omitted all the /miro/ prefixes here and got the same result, but included them for clarity.)
The command below asks fdb to show the various nominated attributes about the FluidDB object whose about tag is table:planets.
$ fdb show -q 'fluiddb/about = "table:planets"' \
/about \
/miro/class \
/miro/n-fields \
/miro/n-records \
/miro/first-record-about \
/miro/first-field-about \
/miro/has-about-links \
/miro/has-id-links \
/miro/has-record-numbers \
/miro/has-field-numbers \
/miro/consistent
Object 5ff094b0-72be-4786-82db-078e9fa38623:
/fluiddb/about = "table:planets"
/miro/class = "table"
/miro/n-fields = 12
/miro/n-records = 13
/miro/first-record-about = "planet:Mercury"
/miro/first-field-about = "field:Name in table:planets"
/miro/has-about-links = True
/miro/has-id-links = False
/miro/has-record-numbers = True
/miro/has-field-numbers = True
/miro/consistent = True
Note that the object ID returned (5ff094b0-72be-4786-82db-078e9fa38623) matches the one reported as the result of Miró’s publish operation. (We could have got the same result by saying
$ fdb show -i 5ff094b0-72be-4786-82db-078e9fa38623 ...
Obviously all that’s here is some metadata. Key points are:
The top-level table metadata is stored in the object whose about tag is table:planets. Here planets is the name of the dataset.
n-fields and n-records obviously store the number of fields and records in the dataset, respectively.
The next two entries are used for navigation and are pointers to linked lists—one for the fields and one for the records. In this case the linking is done on about tags, so the first record is stored in the object about planet:Mercury, and metadata about the first field is stored in the object whose about tag is field:Name in table:planets. (It’s worth noting that another application or user wanting to store a field called Name in a table called planets, and using the same conventions, could and absolutely should use the same object: since that user or application will inevitably have a different namespace, there will be no conflict but the collecting together if information on the same object will be useful.)
Note that the prefix planet is not the same as the table name (planets) and no convention is being proposed about any relation between table names and about prefixes. Indeed, there is no requirement that there should even be an about tag on the records, though (as discussed below) my very strong preference is that in most cases there should be. Obviously, about-tag linking can only be used if there is an about field on each record.
The next four items all describe what extra metadata is stored in the table (if any). I’ll discuss these later.
Finally, the consistent tag is a boolean that indicates whether the table is in a consistent state or not. When Miró created the table object (about="table:planets") the very next thing it did was to set consistent=False to indicate that the data is not in a consistent state. The last operation Miró performs in publishing the table is to set it to True.
So much for the table metadata. What about the data? Again, we can use fdb (or anything else that can talk to FluidDB) to look at it. Here it all is:
$ time fdb show -q "has miro/planets/Name" \
/about \
/miro/planets/Name \
/miro/planets/Category \
/miro/planets/EquatorialDiameter \
/miro/planets/Mass \
/miro/planets/OrbitalRadius \
/miro/planets/OrbitalPeriod \
/miro/planets/Inclination \
/miro/planets/OrbitalEccentricity \
/miro/planets/RotationPeriod \
/miro/planets/Moons \
/miro/planets/HasRings \
/miro/planets/Atmosphere \
/miro/planets/db-record-number \
/miro/planets/db-next-record-about
13 objects matched
Object 2994f561-8efe-4e13-9374-bf3f9436eac6:
/fluiddb/about = "planet:Jupiter"
/miro/planets/Name = "Jupiter"
/miro/planets/Category = "gas giant"
/miro/planets/EquatorialDiameter = 11.209
/miro/planets/Mass = 317.8
/miro/planets/OrbitalRadius = 5.2
/miro/planets/OrbitalPeriod = 11.86
/miro/planets/Inclination = 6.09
/miro/planets/OrbitalEccentricity = 0.048
/miro/planets/RotationPeriod = 0.41
/miro/planets/Moons = 49
/miro/planets/HasRings = True
/miro/planets/Atmosphere = "hydrogen and helium"
/miro/planets/db-record-number = 5
/miro/planets/db-next-record-about = "planet:Saturn"
Object c74807ca-8090-4f0b-a45c-404b3e31d6b0:
/fluiddb/about = "planet:Mercury"
/miro/planets/Name = "Mercury"
/miro/planets/Category = "terrestrial"
/miro/planets/EquatorialDiameter = 0.382
/miro/planets/Mass = 0.06
/miro/planets/OrbitalRadius = 0.39
/miro/planets/OrbitalPeriod = 0.24
/miro/planets/Inclination = 3.38
/miro/planets/OrbitalEccentricity = 0.206
/miro/planets/RotationPeriod = 58.64
/miro/planets/Moons = 0
/miro/planets/HasRings = False
/miro/planets/Atmosphere = "minimal"
/miro/planets/db-record-number = 1
/miro/planets/db-next-record-about = "planet:Venus"
Object 6c590ab3-fa20-486e-a6ce-026b0e1d3a8c:
/fluiddb/about = "planet:Eris"
/miro/planets/Name = "Eris"
/miro/planets/Category = "dwarf planet"
/miro/planets/EquatorialDiameter = 0.19
/miro/planets/Mass = 0.0025
/miro/planets/OrbitalRadius = 67.7
/miro/planets/OrbitalPeriod = 557.0
/miro/planets/Inclination = 44.19
/miro/planets/OrbitalEccentricity = 0.442
/miro/planets/RotationPeriod = 0.3
/miro/planets/Moons = 1
/miro/planets/HasRings
/miro/planets/Atmosphere
/miro/planets/db-record-number = 13
/miro/planets/db-next-record-about
Object bc910875-ef5c-4b6a-b75f-bb80eff7a4ca:
/fluiddb/about = "planet:Mars"
/miro/planets/Name = "Mars"
/miro/planets/Category = "terrestrial"
/miro/planets/EquatorialDiameter = 0.532
/miro/planets/Mass = 0.11
/miro/planets/OrbitalRadius = 1.52
/miro/planets/OrbitalPeriod = 1.88
/miro/planets/Inclination = 5.65
/miro/planets/OrbitalEccentricity = 0.093
/miro/planets/RotationPeriod = 1.03
/miro/planets/Moons = 2
/miro/planets/HasRings = False
/miro/planets/Atmosphere = "carbon dioxide and nitrogen"
/miro/planets/db-record-number = 4
/miro/planets/db-next-record-about = "planet:Jupiter"
Object 88969e67-4834-477d-8cc8-f819ffbad6dc:
/fluiddb/about = "planet:Pluto"
/miro/planets/Name = "Pluto"
/miro/planets/Category = "dwarf planet"
/miro/planets/EquatorialDiameter = 0.19
/miro/planets/Mass = 0.0022
/miro/planets/OrbitalRadius = 40.0
/miro/planets/OrbitalPeriod = 248.09
/miro/planets/Inclination = 17.14
/miro/planets/OrbitalEccentricity = 0.249
/miro/planets/RotationPeriod = -6.39
/miro/planets/Moons = 3
/miro/planets/HasRings = False
/miro/planets/Atmosphere = "temporary"
/miro/planets/db-record-number = 10
/miro/planets/db-next-record-about = "planet:Haumea"
Object 5af2d080-9098-4e9c-9f60-3b85eb06f6d5:
/fluiddb/about = "planet:Haumea"
/miro/planets/Name = "Haumea"
/miro/planets/Category = "dwarf planet"
/miro/planets/EquatorialDiameter = 3.7e-16
/miro/planets/Mass = 0.0007
/miro/planets/OrbitalRadius = 43.35
/miro/planets/OrbitalPeriod = 282.76
/miro/planets/Inclination = 28.19
/miro/planets/OrbitalEccentricity = 0.189
/miro/planets/RotationPeriod = 0.16
/miro/planets/Moons = 2
/miro/planets/HasRings
/miro/planets/Atmosphere
/miro/planets/db-record-number = 11
/miro/planets/db-next-record-about = "planet:Makemake"
Object 72144788-a59e-4819-a9c9-6b8577e2695b:
/fluiddb/about = "planet:Neptune"
/miro/planets/Name = "Neptune"
/miro/planets/Category = "gas giant"
/miro/planets/EquatorialDiameter = 3.883
/miro/planets/Mass = 17.2
/miro/planets/OrbitalRadius = 30.06
/miro/planets/OrbitalPeriod = 164.8
/miro/planets/Inclination = 6.43
/miro/planets/OrbitalEccentricity = 0.009
/miro/planets/RotationPeriod = 0.67
/miro/planets/Moons = 13
/miro/planets/HasRings = True
/miro/planets/Atmosphere = "hydrogen and helium"
/miro/planets/db-record-number = 8
/miro/planets/db-next-record-about = "planet:Ceres"
Object e9b022e6-c770-44ad-abaa-1a2cde9a3224:
/fluiddb/about = "planet:Uranus"
/miro/planets/Name = "Uranus"
/miro/planets/Category = "gas giant"
/miro/planets/EquatorialDiameter = 4.007
/miro/planets/Mass = 14.6
/miro/planets/OrbitalRadius = 19.22
/miro/planets/OrbitalPeriod = 84.01
/miro/planets/Inclination = 6.48
/miro/planets/OrbitalEccentricity = 0.047
/miro/planets/RotationPeriod = -0.72
/miro/planets/Moons = 27
/miro/planets/HasRings = True
/miro/planets/Atmosphere = "hydrogen and helium"
/miro/planets/db-record-number = 7
/miro/planets/db-next-record-about = "planet:Neptune"
Object 818d8f3f-f3a1-4b10-9dc8-11514e468283:
/fluiddb/about = "planet:Makemake"
/miro/planets/Name = "Makemake"
/miro/planets/Category = "dwarf planet"
/miro/planets/EquatorialDiameter = 0.12
/miro/planets/Mass = 0.0007
/miro/planets/OrbitalRadius = 45.8
/miro/planets/OrbitalPeriod = 309.88
/miro/planets/Inclination = 28.96
/miro/planets/OrbitalEccentricity = 0.159
/miro/planets/RotationPeriod
/miro/planets/Moons = 0
/miro/planets/HasRings
/miro/planets/Atmosphere
/miro/planets/db-record-number = 12
/miro/planets/db-next-record-about = "planet:Eris"
Object e06bea33-a000-4294-a7b2-d3245f1481ca:
/fluiddb/about = "planet:Saturn"
/miro/planets/Name = "Saturn"
/miro/planets/Category = "gas giant"
/miro/planets/EquatorialDiameter = 9.449
/miro/planets/Mass = 95.2
/miro/planets/OrbitalRadius = 9.54
/miro/planets/OrbitalPeriod = 29.46
/miro/planets/Inclination = 5.51
/miro/planets/OrbitalEccentricity = 0.054
/miro/planets/RotationPeriod = 0.43
/miro/planets/Moons = 52
/miro/planets/HasRings = True
/miro/planets/Atmosphere = "hydrogen and helium"
/miro/planets/db-record-number = 6
/miro/planets/db-next-record-about = "planet:Uranus"
Object c81ab53a-9f69-4d24-bc2e-98f08298298f:
/fluiddb/about = "planet:Ceres"
/miro/planets/Name = "Ceres"
/miro/planets/Category = "dwarf planet"
/miro/planets/EquatorialDiameter = 0.08
/miro/planets/Mass = 0.0002
/miro/planets/OrbitalRadius = 2.75
/miro/planets/OrbitalPeriod = 4.6
/miro/planets/Inclination = 10.59
/miro/planets/OrbitalEccentricity = 0.08
/miro/planets/RotationPeriod = 0.38
/miro/planets/Moons = 0
/miro/planets/HasRings = False
/miro/planets/Atmosphere = "none"
/miro/planets/db-record-number = 9
/miro/planets/db-next-record-about = "planet:Pluto"
Object 6d75ee02-c64e-45c6-b167-49083ab3739a:
/fluiddb/about = "planet:Venus"
/miro/planets/Name = "Venus"
/miro/planets/Category = "terrestrial"
/miro/planets/EquatorialDiameter = 0.949
/miro/planets/Mass = 0.82
/miro/planets/OrbitalRadius = 0.72
/miro/planets/OrbitalPeriod = 0.62
/miro/planets/Inclination = 3.86
/miro/planets/OrbitalEccentricity = 0.007
/miro/planets/RotationPeriod = -243.02
/miro/planets/Moons = 0
/miro/planets/HasRings = False
/miro/planets/Atmosphere = "carbon dioxide and nitrogen"
/miro/planets/db-record-number = 2
/miro/planets/db-next-record-about = "planet:Earth"
Object 87551a22-5ef9-49e0-9d74-99f2d4af6776:
/fluiddb/about = "planet:Earth"
/miro/planets/Name = "Earth"
/miro/planets/Category = "terrestrial"
/miro/planets/EquatorialDiameter = 1.0
/miro/planets/Mass = 1.0
/miro/planets/OrbitalRadius = 1.0
/miro/planets/OrbitalPeriod = 1.0
/miro/planets/Inclination = 7.25
/miro/planets/OrbitalEccentricity = 0.017
/miro/planets/RotationPeriod = 1.0
/miro/planets/Moons = 1
/miro/planets/HasRings = False
/miro/planets/Atmosphere = "nitrogen and oxygen"
/miro/planets/db-record-number = 3
/miro/planets/db-next-record-about = "planet:Mars"
real 1m8.985s
user 0m0.313s
sys 0m0.346s
OK, so that’s too much data for a blog post, but since it’s a first I wanted to include it all.
There are two points to note here.
The convention I’m using (and proposing) is that miro (the FluidDB user) uses a separate namespace for each of its tables, and that the name of the namespace is the same as the name of the table, in this case planets. That namespace includes a tag for each field, and the name of the tag is the name of the field, i.e. the field Moons in the table planets is stored by Miró as values of the tag miro/planets/Moons.
In this case, there are two extra tags in the same namespace, both containing structure/navigation information. These tags have deliberately been chose to include hyphens and a db- prefix to try to minimize the likelihood of clashes with actual field names. As you have guessed, db-record-number is the record number (starting from 1), and db-next-record-about is a link to the about field of the next record in sequence. (The last record has this set to null.)
Clearly this is a deliberately redundant representation.
So we’ve seen the overall table metadata, and we’ve seen the data. There’s just one more thing, and that’s the field metadata. Again, we can use fdb to look at this:
$ fdb show -q 'miro/class = "field" and miro/table-name = "planets"' /about /miro/field-name /miro/type /miro/class /miro/table-name /miro/description /miro/next-field-about /miro/field-number12 objects matchedObject 27341750-121c-417d-a229-ed9112e4e182: /fluiddb/about = "field:Name in table:planets"
/miro/field-name = "Name"
/miro/type = "string"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Name of planet"
/miro/next-field-about = "field:Category in table:planets"
/miro/field-number = 1
Object 84e64f2a-ce44-4e72-8fab-fade19250034:
/fluiddb/about = "field:EquatorialDiameter in table:planets"
/miro/field-name = "EquatorialDiameter"
/miro/type = "float"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Diameter at the equator in units of the Earth's equatorial diameter, which is about 12,756.2km, or about 7,926.34 miles."
/miro/next-field-about = "field:Mass in table:planets"
/miro/field-number = 3
Object c153ce50-2216-49aa-8238-41e910e3ac4a:
/fluiddb/about = "field:Category in table:planets"
/miro/field-name = "Category"
/miro/type = "string"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Kind of planet. One of terrestrial, gas giant or dwarf planet"
/miro/next-field-about = "field:EquatorialDiameter in table:planets"
/miro/field-number = 2
Object 060881dc-5f4f-481a-9f12-699ac7d50dbd:
/fluiddb/about = "field:OrbitalRadius in table:planets"
/miro/field-name = "OrbitalRadius"
/miro/type = "float"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Orbital radius in astronomical units. The Earth's mean distance from the sun is defined as 1AU, and is about 150 million km or 93 million miles. In the case of the dwarf planets, ranges have been converted to approximate central values."
/miro/next-field-about = "field:OrbitalPeriod in table:planets"
/miro/field-number = 5
Object fc8fc84f-5ab0-4e2b-942c-f653de08ac74:
/fluiddb/about = "field:OrbitalEccentricity in table:planets"
/miro/field-name = "OrbitalEccentricity"
/miro/type = "float"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "The eccentricity of the planet's orbit."
/miro/next-field-about = "field:RotationPeriod in table:planets"
/miro/field-number = 8
Object 7ae04a54-e104-4a20-88d1-22bcfeacfdb9:
/fluiddb/about = "field:Inclination in table:planets"
/miro/field-name = "Inclination"
/miro/type = "float"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Incliniation to the Sun's equator in degrees."
/miro/next-field-about = "field:OrbitalEccentricity in table:planets"
/miro/field-number = 7
Object 5081c010-d99f-4e95-97ae-1bedc5b22a97:
/fluiddb/about = "field:HasRings in table:planets"
/miro/field-name = "HasRings"
/miro/type = "bool"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "True if the planet has rings."
/miro/next-field-about = "field:Atmosphere in table:planets"
/miro/field-number = 11
Object 1a653242-7049-468d-816d-e41d88deb491:
/fluiddb/about = "field:OrbitalPeriod in table:planets"
/miro/field-name = "OrbitalPeriod"
/miro/type = "float"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Orbital period in (Earth) years."
/miro/next-field-about = "field:Inclination in table:planets"
/miro/field-number = 6
Object c4fae320-e9eb-4905-8c8e-9857dafecc34:
/fluiddb/about = "field:RotationPeriod in table:planets"
/miro/field-name = "RotationPeriod"
/miro/type = "float"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "The period of rotation in (Earth) days."
/miro/next-field-about = "field:Moons in table:planets"
/miro/field-number = 9
Object 5a665b6e-ea63-41a6-9f27-ff1ed4960288:
/fluiddb/about = "field:Moons in table:planets"
/miro/field-name = "Moons"
/miro/type = "int"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Number of (named) moons."
/miro/next-field-about = "field:HasRings in table:planets"
/miro/field-number = 10
Object f35e65bc-a595-48d4-9da7-3ccac02b0893:
/fluiddb/about = "field:Mass in table:planets"
/miro/field-name = "Mass"
/miro/type = "float"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Mass of planet in units of Earth Mass. The Earth's mass is roughly 5.9736 x 10^24 kg."
/miro/next-field-about = "field:OrbitalRadius in table:planets"
/miro/field-number = 4
Object baf7d11c-4821-4862-8af1-21c69b141dca:
/fluiddb/about = "field:Atmosphere in table:planets"
/miro/field-name = "Atmosphere"
/miro/type = "string"
/miro/class = "field"
/miro/table-name = "planets"
/miro/description = "Atmospheric composition (main constituents)."
/miro/next-field-about
/miro/field-number = 12
So:
The fields have names (tagged field-name) and those names (as we saw above) are also used as tags in a namespace whose name is the table name (e.g. miro/planets/Moons).
The fields also have types (tagged type). Miró is currently supporting five types for fields, four of which are base FluidDB types
- bool
- int
- float
- string
The fifth supported type (not included in this table) is
- revdate
which is a string exemplified by 2010/03/09 22:22:22. There is no timezone information and the format must be exactly as shown (i.e. %Y/%m/%d %H:%M:%S). No doubt others will be needed.
NOTE:
- In FluidDB values have types but tags do not: different objects may be tagged with values of different types for the same tag. The schema being used here prohibits that: in effect Miró is making a guarantee that FluidDB does not.
- Nulls are allowed in all fields. While in FluidDB, null is considered to be a type, in the schema used here, null is simply the absence of a value; and all values can be missing.
There is a separate object for each field, and the object has an about tag of the form "field:Name of table:tablename".
Fields can optionally have a description. In this case, I’ve includeded various information in that including units; I will probably add a separate units tag later.
There’s much that could be discussed about this schema, but this has been a long post so I’ll restrict myself to a few words now.
For anyone interested, I’ve also published the elements from the periodic table using identical schema conventions. I’ll blog about that in a separate post, but if you want to look, the key information is:
$ fdb show -a "table:elements" /id
Object with about="table:elements":
/id = "b898b875-89ad-4b69-bf89-b6a605b79697"
$ fdb count -q 'miro/class = "record" and has miro/elements/Z' /about
118 objects matched
Total: 118 objects
[1] | That’s Terry Jones, @terrycojones, the founder of Fluidinfo, and the most fluid of all of us. |
[2] | http://en.wikipedia.org/wiki/Planets#Solar_System as at 7th March 2010. |