I've made several projects that add on to Heritrix. Typically, these build a tarball (or zip file) that you can explode into Heritrix's root directory, so that all the necessary JAR files, job configurations and shell scripts wind up where they are supposed to be. This works well enough, but it does impose an extra step, a double install if you will.
So I decided to see if I could improve on this and have the add-on project actually bake itself into the Heritrix distribution. Turns out, this is easy!
Step one, update the project POM to have a dependency on the root Heritrix project distribution. Like so:
<dependency>
  <groupId>org.archive.heritrix</groupId>
  <artifactId>heritrix</artifactId>
  <version>${org.archive.heritrix.version}</version>
  <classifier>dist</classifier>
  <type>tar.gz</type>
  <scope>test</scope>
</dependency>
The key parts there are the classifier and type: together they select the packaged distribution tarball rather than the plain JAR artifact.
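The dependency refers to a `${org.archive.heritrix.version}` property, which has to be defined somewhere in the POM. A minimal sketch, assuming it lives in the project's own `properties` section; the version number shown is only a placeholder, so substitute the Heritrix release you actually build against:

```xml
<properties>
  <!-- Placeholder value (an assumption): use the Heritrix version your add-on targets -->
  <org.archive.heritrix.version>3.1.0</org.archive.heritrix.version>
</properties>
```

Keeping the version in a single property means the dependency and the assembly output path stay in step when you move to a newer Heritrix.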
Next, add instructions for unpacking the above to the plugins section of the POM. Make sure this comes before the assembly plugin.
<!-- Unpack the Heritrix distribution tarball -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>unpack-heritrix</id>
      <goals>
        <goal>unpack-dependencies</goal>
      </goals>
      <phase>package</phase>
      <configuration>
        <outputDirectory>${project.build.directory}/heritrix</outputDirectory>
        <includeGroupIds>org.archive.heritrix</includeGroupIds>
        <excludeTransitive>true</excludeTransitive>
        <excludeTypes>pom</excludeTypes>
        <scope>test</scope>
      </configuration>
    </execution>
  </executions>
</plugin>
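The ordering matters because both plugins run in the package phase, and Maven runs executions bound to the same phase in the order they appear in the POM. As a rough sketch, the assembly plugin declared after the block above might look like this; the execution id and the descriptor path are hypothetical names, not taken from any particular project:

```xml
<!-- Build the combined distribution; declared AFTER the dependency plugin -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <!-- Hypothetical execution id -->
      <id>make-dist</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptors>
          <!-- Hypothetical location of the assembly descriptor -->
          <descriptor>src/main/assembly/dist.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
  </executions>
</plugin>
```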
Now all you need to do is ensure that the assembly plugin puts the necessary files into the correct directories. This can be done by specifying the outputDirectory in the assembly descriptor as follows:
<outputDirectory>/heritrix-${org.archive.heritrix.version}/lib</outputDirectory>
Also make sure that there is a fileSet that includes the entire exploded Heritrix distribution. E.g.:
<fileSet>
  <directory>target/heritrix</directory>
  <outputDirectory>/</outputDirectory>
  <includes>
    <include>**</include>
  </includes>
</fileSet>
And done. The assembly will now unpack the Heritrix distribution, add your files and pack it back up, ready for install like any other Heritrix distro.
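For reference, here is how the two assembly fragments might fit together in one descriptor. This is only a sketch under a few assumptions: the descriptor id and output format are my own choices, the add-on's JARs are collected with a runtime-scoped dependencySet (which also keeps the test-scoped Heritrix tarball itself out of lib), and includeBaseDirectory is switched off so the paths line up with the top-level directory inside the unpacked Heritrix tarball.

```xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <!-- Hypothetical id; it becomes the classifier of the built archive -->
  <id>dist</id>
  <formats>
    <format>tar.gz</format>
  </formats>
  <!-- Assumption: no extra wrapping directory, so paths match the Heritrix layout -->
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <!-- The entire exploded Heritrix distribution -->
    <fileSet>
      <directory>target/heritrix</directory>
      <outputDirectory>/</outputDirectory>
      <includes>
        <include>**</include>
      </includes>
    </fileSet>
  </fileSets>
  <dependencySets>
    <!-- The add-on's own JAR and its runtime dependencies go into Heritrix's lib -->
    <dependencySet>
      <outputDirectory>/heritrix-${org.archive.heritrix.version}/lib</outputDirectory>
      <useProjectArtifact>true</useProjectArtifact>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
```

Job configurations and shell scripts would need their own fileSet entries pointing at the matching directories under the same heritrix-${org.archive.heritrix.version} root.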