This is the end (again) – “final” project blog post

Posted on July 31st, 2012 by Paul Stainthorp

Here is a summary of the work undertaken as part of the CLOCK project, February-July 2012. As in previous projects, I don’t expect that this will really be the end of this work. We intend to carry on developing tools for publishing, consuming, and playing with Open Bibliographic Data at the University of Lincoln (via LNCD or elsewhere), and I expect us to take CLOCK ideas, data and code further over the coming months.

Thanks to all the people who have contributed to the CLOCK project.

1. Outputs: what has the project produced?

Our major tangible output is a library of code for interrogating and working with multiple, distributed sources of open bib data.

  1. The PHP source code for all of the prototypes produced by the CLOCK project is available from a GitHub repository - https://github.com/lncd/clock. This is an open, public, “live”, working version of the code which may well be developed and improved over time. (A ‘snapshot’ of the code as it existed on 27 July 2012 has been archived here.) The public code is available to re-use under a GNU Affero General Public Licence. All of the software prototypes have been detailed in the following posts:
  2. A series of modified cognitive interviews were conducted with several university library cataloguers at Cambridge University Library and the University of Lincoln. The interviews have been summarised here and here. We captured narrated screen-capture videos for each of the interviews: these are being held in a private repository and could be mined or further investigated for similar projects.
  3. A number of further blog posts discussing:
  4. While it is not yet publicly accessible, we have taken significant steps toward establishing a permanent data.lincoln.ac.uk service at the University of Lincoln. We expect that by late August 2012 this service will be operational and will provide a gateway to the entirety of the bibliographic data and enhancement/manipulation tools produced by CLOCK and its predecessor project (Jerome).

2. Lessons learned

CLOCK reinforced to us that the constraints of a six-month project (effectively 4½ months of development time) are difficult to reconcile with the needs and possibilities for constant experimentation and development around bib data. We also identified the following points:

  1. The CLOCK model (of presenting different versions of the same bibliographic element for recombination) is feasible and can be modelled in software—albeit with a limited number of sources—even in a limited time; it has potential as a real-world tool for use in libraries. A writeup of this approach can be found here: CLOCK – the localized index model
  2. Software developers and librarians need to be aware of, and realistic about, the limitations of particular bib data formats. For example, real-time querying of RDF can be very resource intensive. Be clear about your needs up front. Not every shiny open data format is right for every shiny open data project!
    • Related: synchronous querying of SPARQL endpoints is not the way forward! (We characterised this blind alley as “federated searching for the 21st century”…)
  3. We are a long way away from consistency of approach in the exchange of non-MARC bibliographic information. Every data source we approached in CLOCK required that we develop a different tactic from first principles to query, retrieve and manipulate/execute functions. Our blog posts on the idea of a ‘universal translator’ explore this problem further.
  4. Available data is often poorly and confusingly documented. We were regularly frustrated merely in understanding what a data source contained and how it was structured. I (PS) have argued throughout that without some form of centralised registry/gateway (“data.ac.uk/library”), whether it be managed through external curation or self-submission to a rigorous documentation standard, developers will waste time interpreting the same data again and again. We believe that a national bib data portal would be a great help to centrally index data sources and catalogue their respective schemas. We understand that not everyone agrees with this approach (“who pays?”) and we invite discussion of the alternatives!
  5. And a couple of practical ‘meta’ lessons about running agile software development projects in libraries:
    • Working with a number of developers based in different locations is hard. On a project like this it would be ideal to have a central location that could be used as a development hub. Other projects under the LNCD umbrella have found the same.
    • There is the risk of losing ground already made if the outputs of previous projects (e.g. Jerome) are not correctly maintained and curated. This can lead to significant delays when previous work is not available. More rigorous use of development tools, including GitHub and Orbital, is helping to mitigate this in future.

3. Opportunities and possibilities

We have identified possible continuations and extensions of the CLOCK work; these are detailed further below.

  1. Continuation of development of the CLOCK software as a tool for (a) faster querying of distributed bib data sources through local indexing of key fields, (b) translating disparate data formats into a common translation standard, (c) meaningfully presenting distributed bib data to the user, (d) allowing a ‘cataloguer’ to select and recombine bibliographic elements to create a new record, which (e) feeds back into a new original data source, making the process iterative, and (f) incorporates social and reputational components in a user’s selection between alternative data elements for the same resource.
  2. Discussion of the business case to libraries. What value does an alternative resource description model offer, and what would libraries gain through relinquishing some of the control over institutionally-owned catalogue data? Related: could we quantify and qualify the time spent on particular cataloguing / discovery activities using traditional LMSs and demonstrate possible efficiencies or increased quality in a new, distributed model? What arguments for the incorporation of open bib data in cataloguing will convince library managers to replace current practice?
  3. A more thorough examination of all the application functions that a cataloguer relies upon through more extensive cognitive interviews and/or functional mapping processes with cataloguers at a range of institutions.
  4. Further investigation of best practice in documenting and describing our own published bib data in JSON/RDF. [Being picked up as part of the data.lincoln.ac.uk work, above]
  5. We also intend to submit for publication articles on a number of topics arising from CLOCK. Suitable journals and conferences have been identified and articles are being prepared on the following broad topics:
    • The approach of the CLOCK project in developing software for working with multiple, distributed sources of open bib data.
    • Expectations of truth and ‘trust’ in bibliographic data (“How has this assertion been derived about a work?”)
    • The potential of new models of resource description to save time and effort for libraries; techniques for analysing the efficiency of search and cataloguing workflows.
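
Steps (b) to (d) of item 1 above (translating disparate formats into a common form, presenting the alternatives, and letting a cataloguer recombine them) could be sketched roughly as follows. This is an illustrative Python sketch only, not the project's PHP code: the source names, field names and the `FIELD_MAPS` table are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-source field mappings onto a small common vocabulary.
FIELD_MAPS = {
    "source_a": {"dc:title": "title", "dc:creator": "author", "bibo:isbn": "isbn"},
    "source_b": {"Title": "title", "MainAuthor": "author", "ISBN": "isbn"},
}


def translate(source, record):
    """Rename a source-specific record's keys to the common field names.

    Fields with no mapping are dropped.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def alternatives(records_by_source):
    """Group the values offered for each common field, with the sources asserting them."""
    options = defaultdict(lambda: defaultdict(list))
    for source, record in records_by_source.items():
        for name, value in translate(source, record).items():
            options[name][value].append(source)
    return {name: dict(values) for name, values in options.items()}


def recombine(options, choices):
    """Build a new record from the cataloguer's chosen value for each field."""
    record = {}
    for name, value in choices.items():
        if value not in options.get(name, {}):
            raise ValueError(f"{value!r} is not an available {name}")
        record[name] = value
    return record


records = {
    "source_a": {"dc:title": "Moby Dick", "dc:creator": "Melville, Herman"},
    "source_b": {"Title": "Moby-Dick; or, The Whale", "MainAuthor": "Melville, Herman"},
}
options = alternatives(records)
new_record = recombine(
    options, {"title": "Moby-Dick; or, The Whale", "author": "Melville, Herman"}
)
```

Keeping the list of sources beside each candidate value is one simple way to leave room for the social and reputational weighting mentioned in (f).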

Note on the cognitive interview process

Posted on July 31st, 2012 by Trevor Jones

When performing usability research, participants are not always particularly forthcoming with absolute facts about an artefact in question. For instance, their preference for a device may be biased by the brand because it is popular, rather than by good usability features; whereas something as mundane as the appearance of the device may be cause for someone to criticise it even though its usability is sound. A person with years of experience with a particular system may find little at fault having become normalised to its quirks – for them “if the system ain’t broke, don’t fix it”.

Providing participants with a questionnaire can leave you asking more questions than you started out with: Why did they answer in such a way? How did they interpret the question? What was their understanding of the question? Is the answer a true representation of the facts? How thorough is the questionnaire process? And many more…

Filming the participant performing a task is useful but lacks context. For instance: What is the point of the task? Why are they doing what they’re doing? What short-cut key was pressed, what does it do, and why was it used at this particular time? Is this the most efficient way of performing the task?

The cognitive interview, proposed by Fisher and Geiselman in 1992, is a technique used in psychology and has been adopted for use in police investigations to enhance information retrieval of eye witnesses. The process follows a line of questioning that takes the eye witness through the events of the crime from different perspectives – what they themselves witnessed, what they think other witnesses would have seen or even the criminals. The questioning may require the witness to recount the events in a different order, coming in at varying time points. The interviewer attempts to recreate the scene: time of day, sounds, weather, feelings… the witness is encouraged to recall every detail no matter how unimportant they might think it is.

We used a modified cognitive interview that attempts to record a participant’s activities with a task and that seeks to understand the activity, the reason behind the activity and the thought process and any accompanying information of interest that relates to the activity. Screen and audio capture software was used to record the participant’s on-screen activity whilst the participant voiced the purpose of the overall task, the sub actions that were performed to complete the task and any usability frustrations/issues encountered. Additionally, the participant was observed performing the task by an experienced usability analyst who asked non-leading questions relating to the activity.

We found this technique to be of great value for:

  • Identifying usability issues with the system.
  • Understanding how experienced users of the system perform their duties, and how different users carry out the same actions differently.
  • Understanding workflows.
  • Understanding how data is processed, to inform the design of a new system.
  • Extracting information from users, providing insights into otherwise hidden knowledge.
  • Recording information for further data mining.

Captured in Lincoln

Posted on July 30th, 2012 by Trevor Jones

Task analysis data was captured from two University of Lincoln Library Cataloguers using Horizon cataloguing software. The task: to input data for a range of publications – some with an ISBN and some without.

Once again the method employed was a ‘cognitive interview’ variation, recorded with screen capture and voiced running commentary of the activity undertaken and accompanying points-of-view about the user interface, processes, functionality, issues encountered and frustrations with the system.

Empirical observations were also undertaken with a further line of enquiry with the participants in relation to key points-of-note arising from the observations.

  • The Time Team may not have found this software too historic, but it may still be worthy of some trench digging and carbon dating. Although software updates have improved the user interface, band aids have a limited amount of sticky.
  • Findings: the participants have approximately eight and ten years’ experience respectively with the software and are therefore normalised to its idiosyncrasies.
    • Entered text does not persist across key fields – cataloguer has to enter text several times for different screens.
    • The classic grey user interface provides a ‘vanilla’ boxy containment for displaying data. Navigation is via MS standard toolbar with additional custom buttons and a retro Windows hierarchical file system (NICE!) ;)
    • Navigation methods change for different screens – editing items uses ‘next’ and ‘previous’ navigation buttons, whereas MARC record editing uses ‘page up’ and ‘page down’ buttons.
    • Once within the MARC editor, the cataloguer is unable to go back to the search screen to locate further information; a new search must be performed.
    • Whilst within a MARC record, there is no option to save the current state of the record – all edits are performed whilst the record is open.
    • Windows do not sit behind Horizon – they have to be minimised.
    • Unable to easily select and copy all text within a text field (some text fields are too small).
    • Redundant data – there are a number of tags that are apparently not required within the MARC record and time is taken to manually strip these out.
    • Macros – although we didn’t get to see this in action – the participants use their customised macros to perform many of the actions that would otherwise be monotonous and time consuming. Requires further investigation.
    • Lincoln’s software shares a number of usability issues with Cambridge’s, alongside further issues particular to each cataloguing system. In both cases the cataloguers have many years of experience with the software and the procedures involved in the decision making of how data is processed.

Both Lincoln and Cambridge cataloguing staff perform the same process, which is to: create records or edit existing records; edit the information within the record, which may require additional resources (such as Dewey, Citrix, Clarify, Library of Congress or the British Library) to provide information to complete the record; and save the file – see diagram below.

Thanks to our Lincoln Cataloguer participants: Bev Jones and Jill Partridge.

CLOCK – the localized index model

Posted on July 27th, 2012 by Andrew Beeken

The Local Index

Where could we go with CLOCK? That’s one of the questions we’re asking as we draw the project to a close. As mentioned in previous posts, our thoughts move towards a localized index. Rather than replicating all the information in the distributed databases, this would hold one entry for each uniquely referenced bibliographic work. In a nutshell, it would contain rudimentary search content (Title, ISBN, Author) and specific location content (URI). This would allow users to search for very specific records without having to span a search across multiple datasets, something our adventures in SPARQL proved to be quite tricky due to the weight of the data coming from these endpoints.

We’ve made a pretty diagram to show how this would work!
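
To make the idea a little more concrete, here is a minimal sketch of such a local index, written in Python for brevity rather than in the project's PHP. The class and function names (`LocalIndex`, `IndexEntry`, `normalise_isbn`) and the example URIs are invented for illustration, and an in-memory dictionary stands in for whatever store a real service would use.

```python
from dataclasses import dataclass, field


def normalise_isbn(isbn):
    """Drop hyphens and spaces so differently formatted ISBNs compare equal."""
    return "".join(ch for ch in isbn if ch.isalnum()).upper()


@dataclass
class IndexEntry:
    """One work: rudimentary search fields plus the URIs of the full records."""
    title: str
    author: str
    isbn: str
    uris: list = field(default_factory=list)


class LocalIndex:
    def __init__(self):
        self._entries = {}

    def add(self, title, author, isbn, uri):
        """Record that the full data for this work can be found at `uri`."""
        clean = normalise_isbn(isbn)
        # Works without an ISBN fall back to a (title, author) key.
        key = clean or (title.lower(), author.lower())
        entry = self._entries.setdefault(key, IndexEntry(title, author, clean))
        if uri not in entry.uris:
            entry.uris.append(uri)
        return entry

    def find_by_isbn(self, isbn):
        """Return the remote URIs known for an ISBN (empty list if none)."""
        entry = self._entries.get(normalise_isbn(isbn))
        return list(entry.uris) if entry else []

    def search(self, title="", author=""):
        """Case-insensitive substring match on title and author."""
        title, author = title.lower(), author.lower()
        return [
            entry for entry in self._entries.values()
            if title in entry.title.lower() and author in entry.author.lower()
        ]


index = LocalIndex()
index.add("Moby Dick", "Herman Melville", "978-0-14-243724-7",
          "http://example.org/source-a/123")
index.add("Moby Dick", "Herman Melville", "9780142437247",
          "http://example.org/source-b/xyz")
print(index.find_by_isbn("978 0 14 243724 7"))
```

The point of the design is that a lookup touches only the small local index; the full record is fetched from a remote source by URI only once a specific work has been picked.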

Captured in Cambridge

Posted on July 27th, 2012 by Trevor Jones

Cambridge CLOCK hack – our next phase in the task analysis process captured data from two University of Cambridge Library Cataloguers. The participants used Ex Libris’s Voyager cataloguing client (different from the software used at the University of Lincoln) to input data for a range of publications, some with ISBNs (an ISBN being a unique identifier and therefore an efficient means of record retrieval) and some without. If a publication has no ISBN, a search (to see if it already exists in the system) is performed using keywords – author and title.

Using the same tried-and-tested ‘cognitive interview’ method as used for previous data capture, participant usage-data was recorded with voiced running commentary of the activity undertaken and accompanying points-of-view about the user interface, processes, functionality, issues encountered and frustrations with the system.

Empirical observations were also undertaken with a further line of enquiry with the participants in relation to key points-of-note arising from the observations.

  • The software in use could well be investigated by Tony Robinson and the Time Team, having an aesthetic of Windows 3.1 and the usability effectiveness of a glove with three fingers… that’s fine if you only have three fingers, ah but which three?
    • However, the participants have approximately ten years apiece of experience with the software and are therefore normalised to its idiosyncrasies.
    • The classic grey user interface provides a ‘vanilla’ boxy containment for displaying data, whilst sporting an interesting array of buttons with ambiguous metaphor and functionality – at least each button has a text description to accompany the button’s picture – phew.
    • The data presentation is structured in such a way that to access it, one is required to make further selections to gain access and drill down into the data, which then appears in a separate dialogue box.
    • Functionality is restricted to out-dated UI controls which prevent users from interacting with data in a way that is expected of modern software applications and operating systems. For instance, no drag and drop, and moving items can only be performed one at a time.
    • Redundant data – there are a number of tags that are apparently not required within the MARC record and time is taken to manually strip these out.
    • Representation of data – when viewing multiple records, each opens in a separate window, for which the participants move around the screen in an attempt to find a good position to scan and compare the records. The more records on the screen, the more cluttered the process.
    • Navigation – it can be difficult to assess where you are within the system – some form of sign posting and bread-crumbing would be useful.
    • Macros – although we didn’t get to see this in action – the participants like to use their customised macros to perform many of the actions that would otherwise be monotonous and time consuming. Requires further investigation.
  • It is evident that issues of usability exist as an artefact of the software as a result of common software design trends of the age; and thus restrictions of a proprietary build: aesthetic, layout, metaphor, representation of data, and functional restriction.
  • It is also evident that the cataloguers have a lot of experience with the software and the procedures involved in the decision making of how data is processed – which records to select, which fields to update, which checks to make etc. It is not unexpected that a certain amount of specialised knowledge is required; however, a new member of staff would appear to be faced with a steep learning curve for these systems.

So it’s back to Lincoln to capture data from cataloguers using the Horizon system – this will provide insight into Cambridge and Lincoln’s processes and how they differ, whilst also highlighting aspects of processes and functions that work well and which we can replicate in our user-friendly Open library system.

Thanks to our Cambridge Cataloguer participants: Celine Carty and David Rushmer.