October 2, 2014

Heritrix, Java 8 and sun.security.tools.KeyTool

I ran into this issue and I figured if I don't write it up, I'll be sure to have forgotten all the details when it occurs again.

The issue is that Heritrix (which is still built against Java 6) uses sun.security.tools.KeyTool on startup to generate a self-signed certificate for its HTTPS connection. However, in Java 8, Oracle moved this class to sun.security.tools.keytool.Main.

As Heritrix only generates the certificate once, I only ran into this issue when installing a new build of Heritrix, not when I upgraded Java to 8 on my crawl server. You can run Heritrix with Java 8 just fine as long as you launch it once with Java 7 (or, presumably, older).

It should be noted that Java warns against using anything from the sun.* packages. They are not considered part of the Java API. But I believe that the only alternative is to have people manually set up the certificates.

This does mean two things:

1. You need a version of Java prior to 8 to work on Heritrix. A newer JDK can compile in compatibility mode for a prior version, but that doesn't help here because KeyTool isn't part of Java proper. If you only have Java 8 installed, you will not have the necessary dependency available on the classpath. Your IDE will complain incessantly.

2. Building Heritrix on machines with only Java 8 is not possible.

I've also seen at least one unit test using KeyTool (this may only be in a pending pull request; I haven't looked into it deeply).

This isn't an immediate problem as Java 7 is still supported and available from Oracle. However, if they discontinue Java 7 it will quickly become a problem (just try to get Java 6 to install from Oracle).

If you want to run Heritrix with Java 8 your options are:

1. First run it once with Java 7 or prior.
2. Use the -s option to specify a custom keystore location and passwords. You can build that keystore using external tools.
3. Manually create the adhoc.keystore file (in Heritrix's working directory) that Heritrix usually generates automatically. This can be done using Java 8 tools with the following command (assumes Java's bin directory is on the path):
  $ keytool -keystore adhoc.keystore -storepass password \
    -keypass password -alias adhoc -genkey -keyalg RSA \
    -dname "CN=Heritrix Ad-Hoc HTTPS Certificate" -validity 3650

Number 3 rather points at a possible solution to this: just move the generation of the ad hoc keystore to the shell script that launches Heritrix.
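To make that idea concrete, here is a rough sketch of what such a launcher step might look like. It is not taken from the actual Heritrix launch script. The function name ensure_adhoc_keystore and the HERITRIX_HOME variable in the usage comment are my own placeholders. It assumes keytool is on the path, and it reuses the same keytool arguments as the command above.

```shell
#!/bin/sh
# Sketch only: create adhoc.keystore before Heritrix starts, so Heritrix
# finds an existing keystore and never has to generate one itself.

# ensure_adhoc_keystore DIR
#   DIR is assumed to be Heritrix's working directory (the parent of bin).
ensure_adhoc_keystore() {
    keystore="$1/adhoc.keystore"
    # Generate only when the file is missing; an existing keystore is untouched.
    if [ ! -f "$keystore" ]; then
        keytool -keystore "$keystore" -storepass password \
            -keypass password -alias adhoc -genkey -keyalg RSA \
            -dname "CN=Heritrix Ad-Hoc HTTPS Certificate" -validity 3650
    fi
}

# Hypothetical usage inside a launcher (HERITRIX_HOME is an assumed variable):
# ensure_adhoc_keystore "${HERITRIX_HOME:-.}" && exec "${HERITRIX_HOME:-.}/bin/heritrix" "$@"
```

Because the check is a plain file test, running the launcher repeatedly is harmless, and a keystore copied over from an older install is respected as-is.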

Edited to add #4: Copy an adhoc.keystore from a previous Heritrix install, if you have one lying about.

5 comments:

  1. I have some Java code to generate certs that uses Bouncy Castle rather than Sun's libraries. Should probably create an H3 patch some day... :)

  2. I used option 3 to generate the adhoc.keystore and placed it in the bin folder of Heritrix, but the same problem still exists. Could you give me some advice? Thanks

  3. Charlie, Heritrix expects the adhoc keystore in its working directory. That is usually the root directory of the Heritrix installation, i.e. the parent directory of bin.

  4. Here's a pull request to address the issue by loading the class dynamically, trying both old and new names. That means it still uses a class that it's not supposed to. But I think this is the easiest solution to implement, in part because it also takes care of the unit tests that use KeyTool.
    https://github.com/internetarchive/heritrix3/pull/129

  5. Thanks, Kristinn, for documenting your solutions. I think Noah's fix should be accepted ASAP; it looks fine.
