CLOCK and a summary of 2 other Discovery projects

Posted on May 17th, 2012 by Paul Stainthorp

Ed Chamberlain, a researcher on the CLOCK project team, is involved in two other projects under the Discovery strand: OEM-UK and Open Bibliography 2. We’re looking for ways in which CLOCK can re-use data, code, processes and ideas from these projects (and elsewhere), and also at what CLOCK could offer in return.


  • The Open Biblio project has been running over the last few years, aiming to aggregate large amounts of bibliographic data for scientific discovery.
  • Data collected from Cambridge University, the BL and PubMed was held as RDF and used to power an open catalogue called “Bibliographica”.
  • Problems with scaling the data and the system led to the current JISC-funded Open Biblio 2 project (in the meantime, Cambridge and the BL had started to publish their data openly).
  • Open Biblio 2 started by looking at a NoSQL approach (CouchDB, Lucene/Solr), eventually settling on Elasticsearch.
  • Open Biblio’s approach is to build community tools from the bottom up: BibServer and BibSoup (“Like Wikimedia for bib data”). This raises interesting questions about data quality in an open, community-driven system.
  • Also looking at JSON as a lightweight way of sharing bib data: the emerging BibJSON convention represents a bibliographic record as a JSON object (Ed wrote a MARC-to-BibJSON parser in Perl). N.B. BibJSON is not a million miles away from the JSON that Jerome spits out! Three hack days are taking place in London next month to look specifically at BibJSON.
  • Open Biblio 2 is also looking at JSON-LD (JSON for Linking Data), a ‘real’ JSON standard which does a lot of the things that RDF does.
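
To make the MARC-to-BibJSON idea a little more concrete, here is a minimal Python sketch. It is not Ed’s Perl parser. It assumes each MARC record has already been decoded into a simple list of (tag, subfields) pairs (a hypothetical intermediate shape; a real converter would first parse binary MARC or MARCXML), and it maps only a few common fields (020 ISBN, 100 main author, 245 title, 260 publisher and date) onto keys loosely following the BibJSON convention. The sample record is made up.

```python
import json
import re

# Hypothetical intermediate shape: each MARC field as (tag, {subfield_code: value}).
# A real converter would parse binary MARC (ISO 2709) or MARCXML before this step.
MarcField = tuple[str, dict[str, str]]


def _clean(value: str) -> str:
    """Strip trailing ISBD punctuation such as ' /', ' :' or ',' left in MARC subfields."""
    return value.strip().rstrip(" /:;,.").strip()


def marc_to_bibjson(fields: list[MarcField]) -> dict:
    """Map a handful of MARC fields onto a BibJSON-style dict (sketch only)."""
    record: dict = {"type": "book"}  # assumption: every input record is a book
    for tag, sub in fields:
        if tag == "020" and "a" in sub:
            # ISBN: keep the leading digits/X, dropping qualifiers like "(pbk.)"
            isbn = re.match(r"[0-9Xx-]+", sub["a"].strip())
            if isbn:
                record.setdefault("identifier", []).append(
                    {"type": "isbn", "id": isbn.group(0).replace("-", "")}
                )
        elif tag == "100" and "a" in sub:
            # Main entry, personal name -> BibJSON author list
            record.setdefault("author", []).append({"name": _clean(sub["a"])})
        elif tag == "245" and "a" in sub:
            title = _clean(sub["a"])
            if "b" in sub:  # remainder of title (subtitle)
                title += ": " + _clean(sub["b"])
            record["title"] = title
        elif tag == "260":
            if "b" in sub:
                record["publisher"] = _clean(sub["b"])
            if "c" in sub:
                year = re.search(r"\d{4}", sub["c"])  # first four-digit run
                if year:
                    record["year"] = year.group(0)
    return record


# Made-up sample record, for illustration only
example = [
    ("020", {"a": "9780000000000 (pbk.)"}),
    ("100", {"a": "Bloggs, Joe,"}),
    ("245", {"a": "An example book :", "b": "a subtitle /"}),
    ("260", {"a": "London :", "b": "Example Press,", "c": "2012."}),
]
print(json.dumps(marc_to_bibjson(example), indent=2))
```

Fields the sketch doesn’t know about are simply ignored, which is one reason a lossy, lightweight format like this is easy to share but not a full replacement for the MARC record.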

tl;dr = use their JSON standards and BibSoup as a data source.
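
For a sense of how JSON-LD relates to plain JSON such as BibJSON: a JSON-LD document can carry an `@context` that maps its short keys onto full URIs, so an ordinary-looking record becomes linked data. The sketch below attaches a context mapping a few keys to Dublin Core terms (that vocabulary choice is our assumption, not something BibJSON or Open Biblio 2 prescribes) and then performs a toy version of the key-expansion step by hand. A real application should use a conforming JSON-LD processor; this simplification ignores nesting, `@id`, `@type`, prefixes and value objects.

```python
import json

# Illustrative context only: mapping these keys to Dublin Core is an assumption.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "publisher": "http://purl.org/dc/terms/publisher",
    "year": "http://purl.org/dc/terms/issued",
}

# A made-up BibJSON-style record with the context attached
record = {
    "@context": CONTEXT,
    "title": "An example book",
    "publisher": "Example Press",
    "year": "2012",
    "note": "this key has no mapping in the context",
}


def toy_expand(doc: dict) -> dict:
    """Replace top-level keys with the IRIs given in doc['@context'].

    Toy version of JSON-LD expansion: keys without a mapping are dropped
    (as a real processor would drop them), but values are left as plain
    strings instead of being wrapped in value objects.
    """
    ctx = doc.get("@context", {})
    return {
        ctx[key]: value
        for key, value in doc.items()
        if key != "@context" and key in ctx
    }


print(json.dumps(toy_expand(record), indent=2))
```

The attraction is that publishers can keep emitting simple JSON while consumers who want RDF-style triples get them from the context.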

  • The second project, OEM-UK (Open Education Metadata UK), based at the Institute of Education (IoE) in London, is focusing on cataloguing workflows.
  • Data from the IoE’s SirsiDynix catalogue and from EPrints is drawn into a Drupal framework, where forms (with autopopulation) are used to create data: “cataloguing the Drupal way”.
  • A thought from Andrew Beeken: could we replicate this approach, using WordPress custom post types to store and display structured content? Shades of the OPACPress project, which Joss Winn and I proposed several years ago but which was not funded.
  • There is some evidence that this approach can speed up the cataloguing process considerably: the more data you put in, the faster it gets! Ed has some screen-capture videos from OEM-UK showing the workflow, including grabbing data via Zotero.

tl;dr = OEM-UK are also successfully disrupting cataloguing workflows.
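
The “autopopulation of forms” idea can be illustrated in a few lines. This is not OEM-UK’s Drupal code: it is a hypothetical Python sketch in which a new cataloguing form is pre-filled from an existing record that shares an identifier (here an ISBN), so the cataloguer only types what is missing. It also suggests one plausible reading of “the more data you put in, the faster it gets”: the larger the store of existing records, the more often a new form finds a match. All names and the sample record are made up.

```python
from dataclasses import dataclass, field


@dataclass
class RecordStore:
    """Hypothetical store of existing catalogue records, keyed by ISBN."""
    by_isbn: dict[str, dict] = field(default_factory=dict)

    def add(self, record: dict) -> None:
        self.by_isbn[record["isbn"]] = record


def autopopulate(form: dict, store: RecordStore) -> dict:
    """Return a copy of `form` with empty fields filled from a matching record.

    Values the cataloguer has already typed are never overwritten, and the
    original form dict is not modified.
    """
    filled = dict(form)
    match = store.by_isbn.get(form.get("isbn", ""))
    if match is None:
        return filled  # nothing to reuse; the cataloguer fills the form by hand
    for key, value in match.items():
        if not filled.get(key):  # missing or empty field
            filled[key] = value
    return filled


store = RecordStore()
store.add({"isbn": "9780000000000", "title": "An example book",
           "publisher": "Example Press", "year": "2012"})

# A new draft: ISBN entered, title still blank, one local field already typed
draft = {"isbn": "9780000000000", "title": "", "subject": "Education"}
print(autopopulate(draft, store))
```

Keeping anything the cataloguer has typed, rather than overwriting it, is a deliberate choice here: autopopulation should save keystrokes without silently replacing local decisions.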