Heritrix 3.3.0-LBS-2016-02
You can review the full list of changes between my last Heritrix build (2015-01) and this new one here. Here is a list of the main changes:
- Some fixes to how server-not-modified revisit records are written (PR #118)
- Fix outlink hoppath in metadata records (PR #119)
- Allow dots in filenames for known good extensions (PR #120)
- Require Maven 3.3 (PR #126)
- Allow realm to be set by server for basic auth (PR #124)
- Better error handling in StatisticsTracker (PR #130)
- Fix to Java 8 Keytool (PR #129) - I wrote a post about this back in 2014.
- Changes to how cookies are stored in Bdb (PR #133)
- Handle multiple clauses for same user agent in robots.txt (PR #139)
- SourceSeedDecideRule and SeedLimitsEnforcer (PR #137 and #148)
- 'Novel' URL and byte quotas (PR #138)
- Only submit 'checked' checkbox and radio buttons when submitting forms (PR #122)
- Form login improvements (PR #142 and #143)
- Improvements to hosts report (PR #123)
- Handle SNI error better (PR #141)
- Allow some whitespace in URLs extracted by ExtractorJS (PR #145)
- Fix to ExtractorHTML dealing with HTML comments (PR #149)
- Build against Java 7 (PR #152)
I've ignored all pull requests that apply primarily to the contrib package in the list above. There were quite a few, mostly (but not exclusively) relating to AMQP.
I've done some preliminary testing and everything looks good. So far, the only issue I've noted is one I was already aware of: noisy alerts relating to 401 responses.
I'll be testing this version further over the next few weeks and welcome any additional input.