A month ago I posted that I was testing a 'semi-stable' build of Heritrix. The new build is called "Heritrix 3.3.0-LBS-2016-02", as it was built for the 2016-02 domain crawl (i.e. the second one this year) at LBS (the Icelandic acronym for my library).
I can now report that this version has passed all my tests without any regressions showing up. I noted two minor issues. One was fixed immediately (Noisy alerts about 401s without auth challenge). The other has been around since at least Heritrix 3.1.0 and does not affect crawling in any way (Bug in non-fatal-error log).
Additionally, I heard from Netarkivet.dk. They also tested this version with no regressions found.
I think it is safe to say that if you are currently using my previous semi-stable build (LBS-2015-01), upgrading to this version should be entirely straightforward. There are no notable API changes to worry about either, unless, of course, you are using features that are less 'mainstream'.
You can find this version on our GitHub page. You'll have to download the source and build it yourself.
Update: As you can see in the comments below, Netarkivet.dk has put the artifacts into a publicly accessible repository. This is very helpful if you have code with dependencies on Heritrix and you don't have your own repository.
Thanks for the heads-up, Nicholas.
The artifacts are also available from the following Nexus repository:
https://sbforge.org/nexus/index.html#view-repositories;thirdparty~browsestorage
And from Maven:
<repositories>
  <repository>
    <id>sbforge-nexus</id>
    <url>https://sbforge.org/nexus/content/groups/public</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
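With that repository declared in your pom.xml, a dependency on the Heritrix engine might look like the sketch below. The coordinates are an assumption: `org.archive.heritrix` and `heritrix-engine` are the group and artifact IDs used by upstream Heritrix, and the version string is taken from the build name in the post, but check the Nexus listing for what was actually published before relying on it.

```xml
<dependencies>
  <!-- Assumed coordinates; verify against the repository listing -->
  <dependency>
    <groupId>org.archive.heritrix</groupId>
    <artifactId>heritrix-engine</artifactId>
    <!-- Version assumed to match the build name "3.3.0-LBS-2016-02" -->
    <version>3.3.0-LBS-2016-02</version>
  </dependency>
</dependencies>
```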
Thanks to all your efforts, I now have a test Dockerised version that includes our modules: https://github.com/ukwa/docker-heritrix