Notetaking In Semantic Triples

Posted 2021-05-14

This is a continuation of my previous post, describing my notetaking system.

I keep a Zettelkasten using org-roam. Notes are connected using links, which are actually just regular org-mode links. Those links look like this behind the scenes: [[URL][link title]]. Thus, there is an implicit relation between two notes where, if a link to note B appears in note A, you could express that relationship as a subject-verb-object triple, like this:

<Note A> <links to> <Note B>

Which is fine, for most purposes. But what if the relation between the two notes is more specific? For instance, if I have a note for T. S. Eliot, the poet, and a note for his poem, “The Waste Land,” the relation is really more like this:

<T.S. Eliot> <wrote> <The Waste Land>

So how can one achieve this? The linked notetaking strategy of org-roam and family gets us most of the way there. My note ts-eliot.org looks like this:

#+title: T. S. Eliot
Wrote [[The Waste Land]]

Now all we have to do is to make the verb into a note.

#+title: wrote
#+roam_tags: verb

For when a writer writes a creative work.
Example: T.S. Eliot wrote The Waste Land

So now we can write:

#+title: T. S. Eliot

[[Wrote]] [[The Waste Land]]

And we can extend that format for multiple objects:

#+title: T. S. Eliot

[[Wrote]]:
 - [[The Waste Land]]
 - [[Four Quartets]]

This in itself isn’t very useful yet, but now we have a structure that we can parse. In a separate parsing script, I can now write: if a line begins with a link to a verb, and is followed by a link to another (non-verb) note, that constitutes a triple, which is then parsed as <T.S. Eliot> <wrote> <The Waste Land>. If a line begins with a verb link and a colon, and is then followed by a list of links, that is parsed into two triples: <T.S. Eliot> <wrote> <The Waste Land>. <T.S. Eliot> <wrote> <Four Quartets>.
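A minimal sketch of those two parsing rules in Python might look like the following. The choice of language, the function name parse_triples, and the verbs set (lowercased titles of notes tagged as verbs) are all assumptions for illustration; this is not the actual parsing script.

```python
import re

# Matches an org-mode link: [[target]] or [[target][description]].
LINK = re.compile(r"\[\[([^\]\[]+)\](?:\[([^\]\[]*)\])?\]")


def parse_triples(note_text, verbs):
    """Return (subject, verb, object) triples found in one note.

    `verbs` is a set of lowercased verb-note titles (hypothetical input;
    in practice it could be built from notes tagged `verb`).
    """
    subject = None
    pending_verb = None  # a verb followed by a colon, awaiting a list
    triples = []
    for line in note_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("#+title:"):
            subject = stripped.split(":", 1)[1].strip()
            continue
        # Prefer the link description as the note name, else the target.
        links = [m.group(2) or m.group(1) for m in LINK.finditer(stripped)]
        if stripped.startswith("[[") and links and links[0].lower() in verbs:
            verb = links[0].lower()
            if len(links) > 1 and links[1].lower() not in verbs:
                # Rule 1: verb link followed by a non-verb link.
                triples.append((subject, verb, links[1]))
                pending_verb = None
            elif stripped.endswith(":"):
                # Rule 2: verb link plus colon opens a list of objects.
                pending_verb = verb
            else:
                pending_verb = None
            continue
        if pending_verb and stripped.startswith("- ") and links:
            triples.append((subject, pending_verb, links[0]))
            continue
        if stripped:
            # Any other non-blank line ends the current list.
            pending_verb = None
    return triples
```

Called on the multi-object T. S. Eliot note above with verbs={"wrote"}, this should yield two triples, one for each poem.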

At this point, I can now add more metadata to the note, which will link out to the greater semantic web—to Wikidata, for instance. “The Waste Land” note can now contain its Wikidata entity identifier, like this:

#+title: The Waste Land
#+wikidata: https://www.wikidata.org/wiki/Q581458

And Wikidata maintains a huge amount of data about the poem: its first date of publication, its first line, and even its Project Gutenberg ID, from which you can derive the full text of the poem. For many texts, Wikidata also maintains Goodreads identifiers, which allow one to derive a number of opinions about the text.
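To make use of that link, a script first needs the entity identifier from the note. Here is a small sketch, assuming the custom #+wikidata: keyword shown above holds a Wikidata URL ending in the entity ID; the function name wikidata_id is hypothetical.

```python
import re

# The custom "#+wikidata:" keyword line, anywhere in the note.
WIKIDATA_KEYWORD = re.compile(
    r"^#\+wikidata:\s*(\S+)\s*$", re.IGNORECASE | re.MULTILINE
)
# A Wikidata entity ID (Q followed by digits) at the end of the value.
ENTITY_ID = re.compile(r"Q\d+$")


def wikidata_id(note_text):
    """Return the entity ID from a note's #+wikidata: line, or None."""
    keyword = WIKIDATA_KEYWORD.search(note_text)
    if keyword is None:
        return None
    entity = ENTITY_ID.search(keyword.group(1))
    return entity.group(0) if entity else None
```

For the note above, this returns the string "Q581458", which can then be used in a query against Wikidata.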

You can imagine that this streamlines many aspects of research. I can now use SPARQL queries to ask complex questions like:

How long, in words, was the average line of poetry written in 1922?
How many total books were written by H.G. Wells’s lovers?
What is the distribution of literary genres for the books written by T.S. Eliot’s friends?

I can also feed all my note triples into one of the many Linked Open Data visualization and manipulation platforms.
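Most of those platforms expect a standard RDF serialization. One possible way to get there is to write the triples out as N-Triples, sketched below. The base namespace https://example.org/notes/ and the function names are placeholders for illustration, and turning note titles into URIs by percent-encoding them is just one assumption among several workable schemes.

```python
from urllib.parse import quote

# Placeholder namespace for note URIs (an assumption, not a real site).
BASE = "https://example.org/notes/"


def note_uri(title):
    """Turn a note title into a URI by percent-encoding it."""
    return "<" + BASE + quote(title, safe="") + ">"


def to_ntriples(triples):
    """Serialize (subject, verb, object) title triples as N-Triples."""
    lines = []
    for subject, verb, obj in triples:
        lines.append(f"{note_uri(subject)} {note_uri(verb)} {note_uri(obj)} .")
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    print(to_ntriples([("T. S. Eliot", "wrote", "The Waste Land")]), end="")
```

A fuller version would probably map notes that carry a Wikidata identifier to their Wikidata URIs instead, so that the local triples join up with the data already published there.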

However, it must be said that this is a little kludgey: it’s a hack on top of a hack. For this to become viable at all, something like it should really be integrated into org-roam, or become its own package. (BRB, learning Elisp.)
