In our adventures with CLOCK we’ve taken a look at a number of the open data endpoints on offer. The prototype we’re building follows the philosophy of pulling in as many and as varied sources of data as possible, the idea being that we can then take a lateral view of a given record across all of those sources. What does a record for “Watchmen” by “Alan Moore” look like in Lincoln? How similar is it to the record held by Cambridge? What about the British Library? This is the basic goal for our prototype: a tool that can compare and note similarities and differences in bibliographic data.
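A minimal sketch of what that comparison might look like, using two invented records loosely modelled on the “Watchmen” example (the field names and values here are illustrative assumptions, not the actual schemas of any catalogue):

```python
def compare_records(a: dict, b: dict) -> dict:
    """Return a per-field comparison of two flat bibliographic records."""
    fields = sorted(set(a) | set(b))
    return {
        field: {
            "left": a.get(field),
            "right": b.get(field),
            "match": a.get(field) == b.get(field),
        }
        for field in fields
    }

# Hypothetical records -- real catalogue data would be richer and messier.
lincoln = {"title": "Watchmen", "creator": "Alan Moore", "isbn": "9781852860240"}
cambridge = {"title": "Watchmen", "creator": "Moore, Alan", "isbn": "9781852860240"}

diff = compare_records(lincoln, cambridge)
# Here "title" and "isbn" match, while "creator" differs only in name order --
# exactly the sort of similarity-with-variation the tool needs to surface.
```

Even this toy version shows the real problem: the records agree in substance but not in form, so naive equality checks are only a starting point.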
In theory, that should be relatively straightforward; in practice, however, we’re finding that our ideas on how this could be done are possibly somewhat flawed. The initial thought was that we could look at the currently available SPARQL endpoints, including the Cambridge catalogue and the British Library. While we have something that kinda/sorta/maybe works, the process of querying those SPARQL endpoints leaves something to be desired: they’re not quick, there are dissimilarities between the schemas, and it’s not always obvious where the endpoint actually is.
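For the curious, the querying itself is simple enough to sketch. The example below builds a GET request for a Dublin Core title lookup against a placeholder endpoint URL (the URL and the `dcterms` vocabulary choice are assumptions for illustration; each real endpoint has its own location and schema, which is precisely the pain point):

```python
import urllib.parse

# Placeholder endpoint -- the real Cambridge and BL endpoints live elsewhere.
ENDPOINT = "https://example.org/sparql"

def build_query_url(title: str) -> str:
    """Construct a SPARQL GET URL looking up records by Dublin Core title."""
    query = f"""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?record ?creator WHERE {{
        ?record dcterms:title "{title}" ;
                dcterms:creator ?creator .
    }} LIMIT 10
    """
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{ENDPOINT}?{params}"

url = build_query_url("Watchmen")
# The URL could then be fetched with urllib.request.urlopen(url); in practice
# every endpoint needs its own variant of the query to match its schema.
```

The one-query-per-schema problem, plus the round-trip latency per endpoint, is where the “not quick” complaint comes from.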
This has led us to question whether using SPARQL is the right approach at all. If we’re aggregating content, is there a better way? I’m starting to see that the initial simple idea is potentially vastly more complex than simply hitting a number of endpoints and building content ad hoc. We’re now talking about localised indexes, and I’m going to have to take a proper look at linked data approaches.
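One way the localised-index idea might work, sketched under heavy assumptions (the key heuristic and source names are invented for illustration): harvest records from each source ahead of time and file them under a shared key, so comparison becomes a local lookup rather than a round of live SPARQL queries.

```python
from collections import defaultdict

def key_for(record: dict) -> tuple:
    """Build a crude lookup key from title plus creator surname.

    Deliberately naive: handles 'Alan Moore' and 'Moore, Alan' but
    little else -- real name matching is a much harder problem.
    """
    title = record["title"].strip().lower()
    creator = record["creator"]
    if "," in creator:
        surname = creator.split(",")[0]
    else:
        surname = creator.split()[-1]
    return (title, surname.strip().lower())

def build_index(sources: dict) -> dict:
    """Collate records from several named sources under one local key."""
    index = defaultdict(dict)
    for source, records in sources.items():
        for record in records:
            index[key_for(record)][source] = record
    return dict(index)

# Invented sample data standing in for harvested catalogue records.
index = build_index({
    "lincoln": [{"title": "Watchmen", "creator": "Alan Moore"}],
    "cambridge": [{"title": "Watchmen", "creator": "Moore, Alan"}],
})
# Both records land under the same key, ready for side-by-side comparison.
```

The trade-off is the classic one: a local index is fast and schema-uniform, but it is only as fresh as the last harvest.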