It’s a model and it’s looking good

Posted on May 23rd, 2012 by Paul Stainthorp

Ever since the CLOCK meeting we had in Peterborough, I’ve been trying to describe how open linked bib data might open up new models of ‘cataloguing’, resource description, and (by extension) presentation of bibliographic information to a user of a discovery system.

I’ve found it quite difficult to articulate these ideas without resorting to vague hand gestures and gibberish. At the recent CLOCK hack days at the CARET offices in Cambridge, we finally managed to capture these models on paper [actually, we used Lucidchart]. Thanks to Ed Chamberlain and Trevor Jones for taking notes as we talked through the various models, and to Ed’s colleague @ppetej for acting as a sounding board and critical friend.

The diagrams describe cataloguing processes real and hypothetical. They use a kind of pseudo-scientific notation which I find helpful; feel free to ignore it if you don’t.

Also, a cop-out disclaimer: these are rough sketches, not polished theses. Please feel free to jump in and criticise, tweak, or suggest improvements. If you understand Linked Data, we’re really interested in your comments about how these models could be physically represented. We’re not trying to suggest that any one of these models has all the answers, or that any could be a ‘just-plug-it-in’ replacement for current practice, and we don’t intend to write software as part of the CLOCK project to make these a reality. But somewhere in the middle, we think there might be ideas or threads worth tinkering with and following up.


The first diagram attempts to describe copy cataloguing as libraries currently understand it, and involves the transfer of MARC records between institutions. When someone catalogues a book or resource, they tend to copy an existing record from another database, alter it to their needs and use it as they see fit. The record of any changes made is lost. Over time, this convention results in many unconnected versions of a record. N.B.:

  • The ‘donor’ institution (X) has a certain reputation, which is why the ‘recipient’ institution (X′) chooses to copy its records.
  • Cataloguers at recipient institutions add, delete or change individual data elements according to local practice, preference or prejudice, or to correct errors. R and R′ are now effectively different entities with no described relationship between them. There is no record of the properties of changes made; no concept of an ‘edit history’.
  • This diagram does not go so far as to include the role of the union catalogue (e.g. Copac, Newton) – where R, R′, R″, R‴, etc., are re-combined (munged was the word we used!) to produce a single, new, averaged record (which is itself just another version of R).
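As a very rough sketch of what scenario 1 boils down to in data terms (the field names and values here are invented for illustration): a record is copied, edited locally, and the link back to its source is simply never recorded.

```python
# Scenario 1 as data: copy cataloguing with no provenance.
# Field names and values are invented for illustration.
import copy

donor_record = {          # R, held by institution X
    "title": "On the Origin of Species",
    "subjects": ["Evolution (Biology)"],
    "holdings": "X: Main Library",
}

# Institution X' copies R and edits it to fit local practice...
local_record = copy.deepcopy(donor_record)   # R'
local_record["holdings"] = "X': Science Reading Room"
local_record["subjects"].append("Natural selection")

# ...but nothing records that R' was derived from R, or what changed.
# R and R' are now two disconnected entities; the edit history is lost.
```

The point of the sketch is what *isn’t* there: no pointer from R′ back to R, and no record of the individual changes.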

Cataloguing workflow diagram 1 of 3



In the second model, which we described variously (and possibly not entirely accurately) as wiki-ish, GitHub-ish, OpenLibrary-ish, and LibraryThing-ish, there is only one shared, community-maintained version of a bibliographic record for a given work, somewhere out on the web. Various institutions (and their discovery systems) all agree to use this one record.

  • The record is changed incrementally, one constituent data element at a time. Probably only the most recent version of the record is viewable/queryable by users and applications, although an edit history may exist and so older versions of records may be recoverable.
  • Changes are made by editors who might be cataloguers-at-institutions-with-reputations… or might not be. We’ve assumed that in this model institutional reputation is far less important. (On the Internet no-one knows you’re a cataloguer.)
  • This model doesn’t necessarily have to exist along a single timeline (although that’s how it’s shown here) – code-repository-style branching and merging is conceivable.
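A minimal sketch of this model, assuming a wiki-style store (the class and field names are invented; a real system might be backed by something git-like): one shared record, edited one element at a time, with every previous version kept.

```python
# Scenario 2 as data: one shared record with an edit history.
# Structures are invented for illustration.
class SharedRecord:
    def __init__(self, data):
        self.history = [dict(data)]          # version 0

    @property
    def current(self):
        """The most recent version – what users and applications see."""
        return self.history[-1]

    def edit(self, field, value, editor):
        """Change one data element; append a new version, keep the old one."""
        new = dict(self.current)
        new[field] = value
        new["_edited_by"] = editor           # who made the change
        self.history.append(new)

r = SharedRecord({"title": "On the Origin of Species", "edition": "1st"})
r.edit("edition", "6th", editor="anon-editor")   # reputation optional here

# r.current["edition"] is "6th"; r.history[0]["edition"] is still "1st",
# so older versions remain recoverable.
```

Branching and merging, as in a code repository, would mean replacing the single `history` list with a graph of versions – conceivable, but not sketched here.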

Cataloguing workflow diagram 2 of 3


The third, final, and most speculative model is also the most complex and probably the most poorly defined, but I think the most interesting. It’s also very Linked Data.

In any resource description ‘ecosystem’, there will always be multiple versions of a description of an entity out there somewhere (see scenario #1), each providing some unique or particular value to a specific audience. Cataloguers may benefit from a workflow that allows them to view these multiple descriptions and choose the specific assertions from each that are most relevant to their target audience. In this model:

  • The notion of a series of discrete, changeable ‘records’ largely disappears (Where to? But should it?), to be replaced by a whole mass of overlapping individual data assertions about different aspects of the entity, derived from all manner of different sources. Multiple assertions which are trying to say the same thing about an entity can co-exist.
  • Assertions have additional properties which define and qualify them.
  • No assertion is ever destroyed – though it may be awarded properties which render it superseded or deprecated. Relationships between assertions are maintained.
  • Assertions are assembled on-the-fly into any number of transient Record Representations (RR) which are not permanently stored (though could be cached) according to a set of criteria which we’ve called here a filter. A filter defines a ‘recipe’ for specific data assertions to be included or excluded in the Record Representation, and/or specifies preferences for assertions with particular properties. A discovery tool becomes a device to store filters, and to build Record Representations. Data assertions may be stored elsewhere – and distributed across multiple datastores.
  • Filters could be defined manually by a user, as a set of preferences within a discovery tool. For instance: a second-year Chinese medical student at a particular university could choose to see assertions in Mandarin, to prefer MeSH subject headings over Library of Congress, and to include notes, URLs and local physical holdings information relevant to the university they study at [added by cataloguers who work_at the same institution]…
  • …alternatively, filters could be defined more passively: using ‘clues’ from the user’s institutional context, geolocation, or profile on external social networks (“show me records like my friends see” or even “show me records like people with similar research interests as me see”) to build a personalised filter (leading to personalised Record Representations that no-one else sees).
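To make the assertions-plus-filters idea concrete, here is a rough sketch in Python. The qualifying properties (`source`, `language`, `scheme`, `deprecated`) are invented examples of the kinds of properties the model describes, and the filter is the Chinese medical student’s from the bullet above:

```python
# Scenario 3 as data: no stored records, just a pool of qualified
# assertions about one entity, assembled on the fly by a filter.
assertions = [
    {"predicate": "title",   "value": "Gray's Anatomy", "source": "X",
     "language": "en"},
    {"predicate": "title",   "value": "格氏解剖学",      "source": "Y",
     "language": "zh"},
    {"predicate": "subject", "value": "Anatomy",        "source": "X",
     "scheme": "MeSH"},
    {"predicate": "subject", "value": "Human anatomy",  "source": "X",
     "scheme": "LCSH"},
    {"predicate": "subject", "value": "Bodies",         "source": "Z",
     "scheme": "MeSH", "deprecated": True},  # superseded, never destroyed
]

def build_representation(assertions, filter_fn):
    """Assemble a transient Record Representation from matching assertions."""
    return [a for a in assertions if filter_fn(a)]

# The student's filter: Mandarin titles, MeSH subjects, skip deprecated.
student_filter = lambda a: (
    not a.get("deprecated")
    and (a["predicate"] != "title" or a.get("language") == "zh")
    and (a["predicate"] != "subject" or a.get("scheme") == "MeSH")
)

rr = build_representation(assertions, student_filter)
# rr contains the Mandarin title and the current MeSH subject, nothing else.
```

Note that the Record Representation is just the output of a function: store a different filter, or derive one passively from the user’s context, and you get a different representation from exactly the same pool of assertions.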

Key questions: What’s the value added by this model over others? Are there any individual ideas from one model that could be applied to another, even if the model as a whole is too complex?

Cataloguing workflow diagram 3 of 3


14 Responses to “It’s a model and it’s looking good”

  11. [...] Current and hypothetical workflows for cataloguing using distributed sources of bibliographic data [...]
  13. Something that I’m interested in and I think is missing from the last diagram is that multiple cataloguers may make the same assertions about a thing, and potentially you could create a record representation based on the ‘most asserted’ properties?

  14. Chris Leach says:

    Well, it all seemed to make sense when we talked it through last Thursday!