January 19, 2015

The First IIPC Technical Training Workshop

It has always been interesting to me how often a chance remark will lead to big things within the IIPC. A stray thought, given voice during a coffee break or dinner at a general assembly, can hit a resonance and lead to something new, something exciting.

So it was with the first IIPC technical training workshop. It started off as an off-the-cuff joke at last year's Paris GA, about having a 'Heritrix DecideRule party'. It struck a nerve, and it quickly snowballed to include an 'OpenWayback upgradathon' and a 'SolrJam'.

The more we talked about this, the more convinced I became that this was a good idea: a hands-on workshop where IIPC members could send staff for practical training in these tools. Fortunately, Helen Hockx-Yu of the British Library shared this conviction. Even more fortunately, the IIPC Steering Committee wholeheartedly supported the idea. Most fortunate of all, the BL was ready, willing and able to host such an event.

So, last week, on a rather dreary January morning, around thirty web archiving professionals, from as far away as New Zealand, gathered outside the British Library in London and waited for the doors to open, everyone eager to learn more about Heritrix, OpenWayback and Solr.

Day one was dedicated to traditional, presentation-oriented dissemination of knowledge. On hand were several invited experts on each topic. In the morning the fundamentals of the three tools were discussed, with more in-depth topics after lunch. Roger Coram (BL) and I were responsible for covering Heritrix. In the morning, Roger discussed the basics of Heritrix DecideRules and I covered other core features, notably sheet overlays. The afternoon focused on Heritrix's REST API, deduplication at crawl time, and writing your own Heritrix modules.

There is no need for me to repeat all of the topics. The entire first day was filmed and made available online, on IIPC's YouTube channel.

Day one went well, but it wasn't radically different from what we have done before at GAs. It was days two and three that made this meeting unique.

For the latter two days only a very loose agenda was provided: a list of tasks related to each tool, varying in complexity. Attendees chose tasks according to their interests and level of technical know-how. Some installed and ran their very first Heritrix crawl or set up their first OpenWayback instance. I set up Solr via the BL's webarchive-discovery and set it to indexing one of our collections.

Others focused on more advanced tasks involving Heritrix sheet overlays and REST API, OpenWayback WAR overlays and CDX generation or ... I really don't know what the advanced Solr tasks were. I was just happy to get the basic indexing up and running.

The 'experts' who did presentations on day one were, of course, on hand during days two and three to assist. I found this to be a very good model. Impromptu presentations were made on specific topics and the specific issues of different attendees could be addressed. I learned a fair amount about how other IIPC members actually conduct their crawls. There is nothing like hands-on knowledge. I think both experts and attendees got a lot out of it.

It was almost sad to see the three-day event come to an end.

So, overall, a success. Definitely meriting an encore.

That isn't to say it was perfect; there is always room for improvement. Given a bit more lead-up time, it would have been possible to get a firmer idea of the actual interests of the attendees. For this workshop there was a bit of guesswork. I think we were in the ballpark, but we can do better next time. It would also have been useful to have better developed tasks for the less experienced attendees.

So, will there be an opportunity to improve? I certainly hope so. We will need to decide where (London again or elsewhere) and when (same time next year or ...). The final decision will then be up to the IIPC Steering Committee. All I can say is that I'm for it and I hope we can make this an annual event. A sort of counterpoint to the GA.

We'll see.

Finally, I'd like to thank Helen and the British Library for hosting, and all of our experts for their contributions.

  1. Very happy to work with you on making an idea the reality and being the host. Hopefully we can do more together and better. Helen