February 15, 2016

OpenWayback - Developing an API for a CDX Server

OpenWayback 2.3.0 was released about a month ago. It was a modest release aimed to fixing a number of bugs and introducing a few minor features. Currently, work is focused on version 3.0.0, which is meant to be a much more impactful release.

Notably, the indexing function is being moved into a separate (bundled) entitiy, called the CDX Server. This will provide a clean separation of concerns between resource resolution and the user interface.

The CDX Server had already been partially implemented. But as work begin on refining that implementation it became clear that we would also need to examine very close the API between these two discreet parts. The API that currently exists is incomplete and at times vague and even, on occasion, contradictory. Existing documentation (or code review) may tell you what can be done, but it fails to shed much light on why you'd do that.

This isn't really surprising for an API that was developed "bottom-up", guided by personal experience and existing (but undocumented!) code requirements. This is pretty typical of what happens when delivering something functional is the first priority. It is a technical debt, but one you may feel is acceptable when delivering a single customer solution.

The problem we face, is that once we push for the CDX server to become the norm, it wont be a single costumer solution. Changes to the API will be painful and, consequently, rare.

So, we've had to step back and examine the API with a critical eye. To do this we've begun to compile a list of use cases that the CDX Server API needs to meet. Some are fairly obvious, others are perhaps more wishful thinking. In between there are a large number of corner cases and essential but non-obvious uses cases that must be addressed.

We welcome and encourage any input on this list. You can do so other be editing the wiki page, by commenting on the relevant issue in our tracker or sending an e-mail to the OpenWayback-dev mailing list.

To be clear, the purpose of this is primarily to ensure that we fully support existing use cases. While we will consider use cases that the current OpenWayback can not handle, they will, necessarily be ascribed a much lower priority.

Hopefully, this will lead to a fairly robust API for a CDX Server. The use cases may also allow us to firm up the API of the wayback URL structure itself. Not only will this serve the OpenWayback project, getting this API rights is very important to facilitate alternative replay tools!

As I said before, we welcome any input you may have on the subject. If you feel unsure of how to become engaged, please feel free to contact me directly.

No comments:

Post a Comment