Some initial work around users and use cases: cognitive interviews

Posted on May 15th, 2012 by Trevor Jones

In preparation for coding our CLOCK Open Data application, Trevor Jones and Andrew Beeken set out to understand how staff and students discover electronic data within the university’s existing library OPAC (online public access catalogue) system, and additionally, their Internet search behaviour. The proposed CLOCK system is to be developed with a range of users of the existing OPAC system in mind, each with their own requirements, though in general the format of returned search data is similar for each person. For instance, a search for a resource will return a title, an author, and an ISBN, though this is not necessarily the case for a department such as ‘Performing Arts’, where a resource such as a ‘Play’ does not return an ISBN. Additionally, the CLOCK system needs to be intuitive and friendly for all users.

Focusing initially on Performing Arts students, we devised a cognitive interview comprising three tasks designed to capture participants’ search behaviour whilst engaged in research activity.

Screen capture software was used to record the on-screen activity of the participants performing research (a task) with the tools of their choice, and an accompanying audio stream was recorded to provide a running commentary of the participant’s train of thought during the exercise.
Performing Arts students love to talk :)

Each task was allocated ten minutes, producing three 10-minute video files for each of the four participants. In addition to the screen capture, each participant was given a questionnaire asking how they would perform the research for the given tasks (without using a computer). Because there may be differences between thinking about a task and actually performing it, we asked two participants to begin with the questionnaire and then do the computer-based task, and the other two to begin with the computer-based task and fill out the questionnaire afterward.

The questionnaire also asked participants to comment on the tools used, to discuss the effectiveness of the toolsets and any issues encountered, and to suggest additional functionality they would like to have.

The three tasks below were written to resemble Performing Arts coursework assignments that prompt the use of the expected tools, i.e. the OPAC and Internet search engines.

Task 1

You have been tasked with the production of a stage play based on silent movie The Cabinet of Dr. Caligari, for which the only sound allowed to be used is the noise associated with cars (engines, car horns, indicators, reversing sonar, wipers, screeching tires, crash noises etc.).

Task 2

Examine the literature on Performance Arts Attendance with a view to analysing trends of socio economic characteristics that influence attendance at performing arts events. What, if any, are the factors and barriers to audience retention?

Task 3

Headlining current affairs, and being of concern with schools and child development agencies for many years now, children’s ‘behaviour’ has devolved in working and middle class families. Write a play that captures the essence of these ‘concerning’ times from one or more of the following perspectives: the school, the parent, the child. In addition, the play must be written in the style of Gilbert and Sullivan that veils the story behind an alternative setting (The Mikado).

Analysis of this data has shown that the main toolsets used are: OPAC, Google, Google Scholar, and International Bibliography of Theatre and Dance with Full Text (a commercial database). The keywords used in searches varied, and not all were successful. Two reasons may account for this: spelling mistakes and poor choice of keyword. In general, only the top six results were scrutinised for relevant information; further results were ignored. This was possibly due to the time constraint imposed on the task, or perhaps to a belief that additional results would not yield the desired information.

Participants followed an expected pattern: launch Google, enter a keyword, open the top hits (usually Wikipedia), get some background information, and then refine the search (e-journals, articles and books) using Google Scholar, OPAC and the commercial database. For video and sound samples, participants used YouTube and a website specialising in sound bites. No search was performed on OPAC for film and sound. NB: Europa Film Treasures is accessible from OPAC, but its search facility is limited and does not appear to provide an option to search by film title, and the Film and Sound link is broken. Google was deemed very supportive for searching, thanks to its automated text suggestions and recommendation function and, of course, because it is simple and intuitive to use. Some kinds of search did not readily return the desired results, for instance ‘theories’ and ‘criticisms’ of a play, ‘previous’ productions, and ‘presently running’ productions.

Improvements were identified as follows:

  • To assist with searching: automated text/suggested words; highlighted key words; a side panel for storing returned searches that can then be used again later or discarded, including an identifier to indicate their source, e.g. Google, OPAC etc.
  • A way of displaying inter-library loans, and relevant information relating to the search, with an indicator for which articles are actually available. The related information could be things such as where a play is presently showing, the ticket office number, past productions by the performance company, or links to video: basically something that provides extra context relevant to the search.
  • Once a search has been performed, a method for jumping directly to the text that appears in the initial search return. For instance, a Google search returns a list of results, each with a sample of text from within that web source; but when you follow the link it is not easy to see where that text appears within the page.

So what next?

We’ve got more users to observe, and we’ll need to see how they conduct their use of the university toolsets and understand their particular requirements. These participants will include academic staff and library cataloguers.

Acknowledgements

For participating in this phase of the CLOCK project, we would like to thank the following Performing Arts students: Gemma Smart, Charlotte Haythorne, Marc Brock, and Kirsty Barnes.

CLOCK notes – 8 May 2012

Posted on May 8th, 2012 by Paul Stainthorp

This is what the CLOCK project team are currently up to (from meetings over the past couple of weeks and from notes made at the recent Discovery: making sure your resources are discovered, used and reused event in Birmingham):

  • Andrew Beeken has been exploring the Cambridge COMET data via its SPARQL endpoints and has already blogged about the process of using SPARQL to “build kind of a ‘Hello World’ of open data querying”. He’s now looking at the recently-released Harvard open bib data and comparing the speed, the use of matching namespaces, and the use of JSON vs RDF/XML.
  • This work is leading up to unified search and presentation of records from several sources (Cambridge/COMET, Harvard, Lincoln/Jerome, OpenLibrary, etc.). Andrew and Trevor Jones are collaborating on drawing up a high-level architecture for CLOCK, and a strategy for expressing Linked Data, which will be shared with the rest of the project team (and publicly) for discussion.
  • To support this, Alex Bilbie in ICT services at Lincoln is helping to get the original Jerome application up and running on the CLOCK server (jerome.library.lincoln.ac.uk), where it can be used as a stable platform for developing and RDF-ifying Lincoln’s own bib data.
  • Trevor Jones and Ed Chamberlain will work together on developing the work with users (in parallel, at the University of Lincoln and the University of Cambridge) to clarify their requirements for bibliographic data:
      • For cataloguers, based around a rethink of copy cataloguing workflows, we will try to tease out requirements from talking to cataloguers (and associated subject librarians) asking to be ‘positively disrupted’: what do they need to do? What is missing from their data?
      • For researchers, we will build on some initial user walkthrough analysis done by Trevor and Andrew in Lincoln, with performing arts students in LPAC (the Lincoln Performing Arts Centre). What are the research questions that users are trying to answer? How does bib data help them answer those questions? What’s missing? Ed and Trevor will agree on a set of questions and tasks.
  • These requirements will be used to feed the remaining cycles of platform development for CLOCK.
  • Ed Chamberlain will act as the conduit between CLOCK and related projects in the Discovery strand, looking for points of shared interest/technology, and blogging (or asking others to blog) about aspects of one project which can inform the others. The other projects in which Ed is involved are: the Open Education Metadata UK (OEM-UK) project at the Institute of Education (shared interest in new user interfaces for cataloguing – possibly use screencasts to demonstrate alternative workflows?) and the Open Bibliography 2 project (lots of potential technical overlap – BibJSON, JSON-LD, BibSoup.net, expression in RDF container formats).
  • Ed and I (Paul Stainthorp) will work on developing the ‘business case’ / sustainability of CLOCK and data.*.ac.uk, following up on themes discussed in the recent Discovery event, and thinking not only about institutional funding / high-level support for open bib data, but also what it takes to move open bib data publishing from a development environment into an institutionally-supported, ICT-run service.
  • Finally, PS is arranging a couple of internal CLOCK ‘hack days’ (to take place on 17th-18th May, in Cambridge) – more details to follow.
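
As a rough illustration of the unified search mentioned in the notes above, records from several sources could be grouped by a shared identifier. The sketch below is in Python and groups records by ISBN; the record layout (`source`, `title`, `isbn` keys), the function names and the sample data are all made up for illustration and are not taken from the CLOCK code or from any of the datasets named above.

```python
from collections import defaultdict


def normalise_isbn(isbn):
    """Strip hyphens and spaces so the same ISBN compares equal across sources.

    Note: this does not reconcile ISBN-10 with ISBN-13 forms.
    """
    return "".join(ch for ch in isbn if ch.isdigit() or ch in "xX").upper()


def group_records(records):
    """Group records by ISBN; records without one are returned separately."""
    grouped = defaultdict(list)
    unmatched = []
    for record in records:
        isbn = record.get("isbn")
        if isbn:
            grouped[normalise_isbn(isbn)].append(record)
        else:
            # e.g. a play script with no ISBN, which needs some other match key
            unmatched.append(record)
    return dict(grouped), unmatched


# Made-up sample data, for illustration only.
records = [
    {"source": "COMET", "title": "Example title", "isbn": "978-0-00-000000-2"},
    {"source": "Jerome", "title": "Example title", "isbn": "9780000000002"},
    {"source": "Jerome", "title": "An unpublished play"},
]
grouped, unmatched = group_records(records)
```

Records that lack an ISBN are kept to one side rather than dropped, since performance resources often have no ISBN to match on.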
    Working with dispersed open data

    Posted on May 4th, 2012 by Andrew Beeken

    A quick and dirty app to perform a basic search on an endpoint

    At its core, CLOCK is a project to link open bibliographic datasets. That’s the philosophy behind it. Where things start to get really exciting  is what we could build with this data. I’m not going to stand up and say the concept of linked open data is new; it’s not. But we’ve certainly got some interesting ideas of where we could take it from the point of view of bibliographic data…

    Paul Stainthorp has already written a post discussing the three target tiers for these applications.

    So, an update! Things have sprinted forward with CLOCK this week and, as well as some interesting theoretical study which my cohort Trevor Jones will shortly be blogging about, we’ve started looking at both potential high level applications (so that we’ve got something to strive for) and started building some basic search apps (so that we’ve got something real to play with).

    The first app we’ve cobbled together is kind of a “Hello World” of open data querying. Using the Cambridge Uni endpoint, we’ve hacked together a really simple app which takes user input in the form of a publication title, author and the number of records to limit the search to. The app takes these criteria, cobbles together a SPARQL query and, using the sparqllib PHP library, we fire that query at the endpoint and display the results. Publication titles then link back to the URI from Cambridge. We’ve wrapped the Skeleton HTML boilerplate round this for pretties.
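
As a rough illustration of the query-building step described above, the sketch below assembles a SPARQL query from a title, an author and a record limit. It is written in Python rather than PHP and does not use sparqllib. The function names, the `dcterms:title`, `dcterms:creator` and `foaf:name` predicates, and the use of the SPARQL 1.1 `CONTAINS` function are assumptions for illustration; they are not taken from the app itself and have not been checked against the Cambridge endpoint.

```python
def escape_literal(text):
    """Escape characters that would break out of a double-quoted SPARQL string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_query(title, author, limit):
    """Build a SELECT query from the three form fields (all names hypothetical)."""
    filters = []
    if title:
        filters.append(
            f'FILTER CONTAINS(LCASE(STR(?title)), LCASE("{escape_literal(title)}"))'
        )
    if author:
        filters.append(
            f'FILTER CONTAINS(LCASE(STR(?author)), LCASE("{escape_literal(author)}"))'
        )
    lines = [
        "PREFIX dcterms: <http://purl.org/dc/terms/>",
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
        "SELECT ?book ?title ?author WHERE {",
        "  ?book dcterms:title ?title .",      # assumed predicate
        "  ?book dcterms:creator ?creator .",  # assumed predicate
        "  ?creator foaf:name ?author .",      # assumed predicate
    ]
    lines.extend("  " + f for f in filters)
    lines.append("}")
    # Coerce the limit to a positive integer so it cannot inject query text.
    lines.append(f"LIMIT {max(1, int(limit))}")
    return "\n".join(lines)


query = build_query("Hamlet", "", 10)
```

Escaping the user input and coercing the limit to an integer keeps form values from altering the structure of the query, which matters once an app like this takes input from the public.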

    While this might seem pretty trivial to some, it’s a good starting point for where we want to go with CLOCK which is, ultimately, delivering “consumer level” applications that run on simple user input rather than relying on users being au fait with complex query languages.

    Where next? Well, we want to expand on this basic app and introduce:

    • Multiple data sources
    • Local data caching
    • Better handling of search criteria
    • Autocompletion
    • Highlighting of search terms
    • User accounts (possibly with cross-login from existing uni accounts) which will determine the default library searched
    • Favourites and saved searches
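
Of the features listed above, highlighting of search terms is perhaps the easiest to sketch. The Python example below wraps each matched term in a tag and HTML-escapes everything else; the function name and the choice of `<mark>` as the wrapper are assumptions for illustration, not a description of how CLOCK does or will do it.

```python
import html
import re


def highlight(text, terms, open_tag="<mark>", close_tag="</mark>"):
    """Wrap case-insensitive matches of any term in a tag, escaping the rest."""
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(text)
    # Try longer terms first so "performing arts" wins over "arts".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        parts.append(open_tag + html.escape(match.group(0)) + close_tag)
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


result = highlight("The Cabinet of Dr. Caligari", ["caligari"])
```

The original capitalisation of the matched text is preserved, so a search for "caligari" still displays "Caligari" in the result.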

    This proof of concept development also sits alongside the theoretical work we’re doing.

    The technical approach: a CLOCK dev stack

    Posted on May 2nd, 2012 by Paul Stainthorp

    A note on technical development:

    We’re beginning to make some progress towards a framework for development in the CLOCK project. Project developers Trevor Jones and Andrew Beeken, with the support of the other developers in LNCD, now have the following at their fingertips:

    That list should give you an idea of LNCD’s approach to development. [N.B. some links may not be publicly accessible.]

    CLOCK implementation: key themes (the Peterborough meeting)

    Posted on May 2nd, 2012 by Paul Stainthorp

    Screengrab of our notes from the CLOCK Peterborough meeting

    This blog post is a comment upon the formal project implementation plan, and gives some more detail about how the CLOCK project intends to meet its project aims.

    In February, 2012, the project team (EC, CL, PS, OS) met at Peterborough Regional College (roughly equidistant between Lincoln and Cambridge!) to discuss the implementation plan and our CLOCK ‘first steps’. We made copious notes using an interactive whiteboard. Here’s what we agreed for CLOCK…

    Most of the day’s discussion was spent attempting to define more clearly the users/audience for CLOCK, narrowing down the field of study a bit as we went along, and looking for potential ways to engage those audiences in the research. We agreed that our users consist of:

    1. Cataloguers and library managers looking to innovate their resource description workflows as well as contribute to the corpus of Open Bib Data, through improving/correcting/augmenting existing records as well as submitting new records, “adding to the story” by allowing libraries to incorporate data elements outside the boundaries of traditional resource description.

    We spent a while discussing how the project might approach the problem of proposing new “…minimal workflows for cataloguing around individual, disaggregated RDF elements” (taken from the project plan). We’ve also since discussed this back at Lincoln with staff in the Library and LNCD – I’ll shortly be blogging some diagrams which illustrate several different possible approaches to cataloguing workflow, as part of the ‘Users and use cases’ thread. We’ll also be speaking to cataloguers at Lincoln and at Cambridge to try and get a clearer picture of the ‘pinch points’ in existing cataloguing, where applications using OBD might make a difference to their work.

    Key quotes:

    “Matching / negotiating of the best available Open bib data through common identifiers; the importance of a social/reputational aspect in identifying authoritative data; [use of] associated social/reputational metadata making explicit the provenance, history, and ‘pagerank’ measurements of each data element. [The phrase ‘a narrative verdict on the catalogue record’ was used…]”

    2. Researchers (qualified as “the ‘serious’ and tech-savvy researcher”), who may be keen to incorporate Open Bib Data in user tools (e.g. citation/reference management software). We agreed to concentrate within the CLOCK project on a specific discipline, that of Drama/Performing Arts, because of the interesting challenges posed by the description of performance resources in existing bibliographic data. (“Almost anything you’d want to know about a play isn’t recorded in the MARC record!”). We identified a number of potentially useful resources and sources of data, including:

    • The play’s the thing
    • TheatreDB
    • Resources in institutional repositories
    • Theatricalia
    • Dutch Culture Link
    • Wikipedia/DBpedia

    We agreed that we’ll set up a series of interviews/structured tasks for researchers in performing arts at Cambridge and Lincoln; also for subject librarians in the discipline (as a proxy to the researchers themselves). CLOCK will look at how well existing catalogue data describes performance and related resources (perhaps by sampling MARC records at both institutions), and how external sources of ‘non-library’ data might complement and enhance those records.

    3. Developers attached to academic libraries, who are looking to build applications exploiting available Open Bib Data, and techniques for interrogating and exploiting that data. The engagement with this audience is probably more at a strategic level than the first two – what are the technology choices and the decisions around the design of APIs and data endpoints – can we make a case study on developing using OBD?

    We also discussed CLOCK’s overlap with other projects (in particular the Open Bibliography 2 and Open Education Metadata UK projects). This work will be picked up by Ed Chamberlain, who is a common factor in all three projects!

    “The project team believe that an important aspect of this innovation will be serious consideration given to the development of an awesome, national, open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).”

    Members of the CLOCK project team have since signed up to the new DATA-AC-UK mailing list and we will use the project as an opportunity to propose first steps in publishing national bibliographic data to data.ac.uk. This will be the topic of a future blog post.

    “CLOCK will explore options for updating and maintaining the shared platform on data.lincoln.ac.uk as an eventual service”

    University of Lincoln developer Alex Bilbie has blogged about the future of 5★ open data publishing at Lincoln: “As part of the Jerome project, we cracked open the university library’s digital catalogues and stored the data in a sane format (i.e. not MARC). Now through the CLOCK project the data will be semantically marked-up and compatible with other institutions bibliographic data”. This will also be the topic of a future blog post.