Developing and deploying Java on middleware and in the cloud: rise of the Virtual Appliance?
From Java EE to Google App Engine to GigaSpaces, the idea of developing against a middleware or "infrastructure" API is well established in the Java world.
But these are fixed environments. With the (re-)advent of virtualization, it is now becoming feasible to package and rapidly provision your own environment, custom-designed to meet your application's needs.
As the big middleware vendors are realizing, it is not just possible to create such Virtual Appliances, but necessary: a production app's setup inevitably includes more than just a couple of EARs.
Here, we'll look at the current state of cloud and middleware deployment tooling, examine possible future developments and draw parallels between deployment and related processes.
Cloud: The end of the beginning
The initial wave of IaaS cloud providers resulted in a bunch of infrastructure services with deceptively similar specs but frustratingly different, proprietary APIs, parameters, concepts and formats. The first wave of cloud tooling has aimed to tackle this portability nightmare by attempting to introduce common APIs and terminologies.
At the same time, there have been proposals for open virtual image and hypervisor API formats, but as yet there is no sign that any of them will be widely adopted in the short term.
One difficulty for the current "meta API" approaches1 is the fact that generic support, i.e. providing features supported by all providers, leads to a lowest common denominator that is too small to be of much practical use.
jclouds' TemplateBuilder is an interesting attempt to tackle this problem, but ultimately only the adoption of an IaaS standard will make it go away.
From a developer's perspective, a more fundamental issue is that a "raw" machine, such as is represented by a virtual image profile, is just too low-level to be useful. IaaS vendors tend to come from the provisioning world, which deals in servers, racks and cores. But developers are used to developing against "environment services"2, be that a Java EE cluster, a LAMP stack or a Hadoop installation3.
Middleware: Java EE and then some
After the large initial effort of putting together "corporate" Java EE implementations, the large enterprise vendors quickly moved to expanding the feature set of resources available to applications, such as persistent queues or clustered datasources.
As developers discovered and made use of these features, applications became progressively more tied to specific, often proprietary, middleware platforms. As a result, the development "deliverable" quickly grew to include not just the EARs envisaged by Java EE, but a variety of configuration options and resource definitions for the middleware container.
Keeping track of and deploying these loose packages, whilst remembering to apply appropriate modifications specific to the target environment, has proven to be a difficult, time-consuming and error-prone task. Indeed, users regard the automatic management of these packages as a major benefit of XebiaLabs' deployment automation product Deployit.
Slowly, though, the Big Vendors are also realizing that an effective "management unit" for today's Java applications goes beyond Java EE artifacts, and also needs to contain information about the middleware configuration: a "Virtual Appliance", you might say. Both IBM's WebSphere CloudBurst and Oracle's upcoming Assembly Builder attempt to address this issue by making it possible to capture WebSphere and WebLogic configurations, modify them and apply them to different installations.
Both tools run only in the context of virtualized environments - in fact, they are largely built on top of the vendors' virtualization offerings - and still appear to be limited to the WebSphere and WebLogic stacks, respectively4.
Towards configurable Virtual Appliances
Current virtual image formats are essentially a long list of bits and bytes of memory and disk storage. I might be able to get some low-level infrastructure details such as IP addresses or number of cores from the API, but beyond that the best source of information about what this image actually contains is a free-text "description" field, intended mainly for human readability. So even if I have configured a WebSphere ND 7.1 installation with a database and a cluster, all according to company policy, most of the details of the installation are lost.
Certainly, it is very hard to programmatically check dependencies, enforce policies or even find an image matching certain software requirements (Java installed, version >1.5, latest OS patches etc.).
One interesting possibility is to define an image as a base OS together with a list of yum, rpm or Conary packages5. This gives a much more developer-level view of what is in the image and avoids reinventing this particular packaging and installation wheel.
Unfortunately, though, these package managers are all OS-specific and don't leave much room for customization of package installation and configuration.
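To make the idea of structured image metadata a little more concrete, here is a minimal sketch of what "find an image with Java newer than 1.5" could look like if an image were described as a base OS plus a list of versioned packages. Everything in it (the ImageDescriptor and ImageCatalogue classes, the image names, the package names) is hypothetical and invented for illustration; it is not the API of any existing image format or cloud library, and it assumes simple dotted numeric version strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical structured description of an image: a base OS plus named, versioned packages. */
class ImageDescriptor {
    final String name;
    final String baseOs;
    final Map<String, String> packages = new HashMap<>();

    ImageDescriptor(String name, String baseOs) {
        this.name = name;
        this.baseOs = baseOs;
    }

    /** Records that a package is installed at the given version; returns this for chaining. */
    ImageDescriptor with(String pkg, String version) {
        packages.put(pkg, version);
        return this;
    }
}

class ImageCatalogue {
    /**
     * Compares dotted numeric versions such as "1.6.0" and "1.5"; missing parts count as 0.
     * Assumption: versions contain only digits and dots (no "_17" or "-beta" suffixes).
     */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Returns the images that contain the package at a version strictly greater than minExclusive. */
    static List<ImageDescriptor> findNewerThan(List<ImageDescriptor> images, String pkg, String minExclusive) {
        List<ImageDescriptor> result = new ArrayList<>();
        for (ImageDescriptor image : images) {
            String version = image.packages.get(pkg);
            if (version != null && compareVersions(version, minExclusive) > 0) {
                result.add(image);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up catalogue entries, purely for illustration.
        List<ImageDescriptor> catalogue = Arrays.asList(
                new ImageDescriptor("legacy-app", "centos-5").with("java", "1.4.2"),
                new ImageDescriptor("was-cluster", "centos-5").with("java", "1.6.0").with("websphere", "6.1"),
                new ImageDescriptor("plain-os", "ubuntu-9.10"));
        for (ImageDescriptor image : findNewerThan(catalogue, "java", "1.5")) {
            System.out.println(image.name);
        }
    }
}
```

The point of the sketch is only that, once the contents of an image are data rather than a free-text description, dependency checks and policy enforcement become ordinary queries.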
Templates, not clones
Bit-for-bit descriptions of Virtual Appliances make it easy to instantiate many clones of a machine. Most of the time, though, we need to instantiate an appliance that is just slightly different from the original - think data source credentials, naming conventions or port numbers.
In fact, most of the time our "image" is more of a template to be instantiated than a master that just needs to be cloned. Unfortunately6, the current virtual machine formats don't support this.
Cloudlets are an interesting approach in this direction, allowing you to specify the file system tree of the image, using template files where appropriate. This can get you a long way in Unix environments, but still doesn't solve all the problems: parameterizing a WebLogic domain by modifying the XML configuration files is tricky and, incidentally, not supported. Or think binary configurations, tablespaces in a database installation or the Windows Registry.
Furthermore, from a developer's perspective, it's usually the service delivered by the virtual appliance that is interesting; whether your GigaSpaces XAP grid needs one or 100 images to run doesn't really matter. jclouds' NodeSet, which makes it possible to transparently manage and scale multiple images, looks like an important conceptual advance here.
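For plain-text configuration, the "template, not clone" idea boils down to something like the following sketch: a file with placeholders that is filled in with environment-specific values when the appliance is instantiated. The ConfigTemplate class, the ${...} placeholder syntax and the parameter names are all assumptions made up for this example, not how Cloudlets or any particular tool actually does it, and the approach obviously does nothing for the binary or unsupported cases mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that fills ${name} placeholders in a text configuration template. */
class ConfigTemplate {
    // Placeholder names may contain letters, digits, dots and underscores, e.g. ${db.host}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    /** Replaces every placeholder with its value; fails loudly if a value is missing. */
    static String instantiate(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for parameter: " + m.group(1));
            }
            // quoteReplacement stops '$' and '\' in the value from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template =
                "jdbc.url=jdbc:oracle:thin:@${db.host}:${db.port}:${db.sid}\n"
              + "jdbc.user=${db.user}\n";

        // Illustrative values for a hypothetical test environment.
        Map<String, String> test = new HashMap<>();
        test.put("db.host", "db01.example.com");
        test.put("db.port", "1521");
        test.put("db.sid", "TEST");
        test.put("db.user", "app_test");

        System.out.print(instantiate(template, test));
    }
}
```

Failing on a missing parameter, rather than silently leaving the placeholder in place, is a deliberate choice here: a half-instantiated configuration is exactly the kind of error that is painful to track down after deployment.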
Virtual Appliances: a feature wishlist
Using the current generation of tools, we can put together a catalogue of possibly related images that are portable enough not to be tied to a vendor. With some trickery and a bit of blood, sweat and tears we can even wire up some kind of dynamic configuration, even if that's usually specific to the contents of the image.
At the developer level, there are already a handful of projects out there that focus on deploying platforms, such as Crane, or that add a "platform slice" on top of the infrastructure, such as webappVM7. Again, though, they tend to be tailored to one or a limited number of specific services.
What next? Here are a few items from my "wishlist" of features for the next generation of Virtual Appliances:
You've taken the company-approved base OS image and spent a lot of time installing your middleware, getting the configuration just right and even making it parameterizable. A few months later a new base image is released, and your old one is no longer supported in production.
Currently, we treat images as an atomic unit, so there is no way to separate the "base" part, which presumably comes from the Operations team, from the "platform" part, which has probably been installed and configured by a developer.8
You're trying to reproduce a problem you've come across in production and would quickly like to spin up a WebSphere cluster on your production operating system to investigate.
Today, you'll probably end up browsing through various image libraries looking for suitable templates. "WebSphere AS 6.1 on Ubuntu" looks promising, but is it the right version of the OS? What's the patch level of WebSphere? Which version of Java is it running?
Hopefully, it'll soon be possible to formulate these kinds of requirements accurately (and plenty of other interesting ones, such as maximum spend or required SLA) and have the correct images chosen, provisioned and configured automatically, perhaps even across multiple clouds.
You've read interesting things about Terracotta and want to see if it might help with session sharing. So you'd quickly like to spin up a Java EE container and wire in Terracotta.
How many images are involved, and what the exact version of the operating system is, isn't even interesting - it's the configured service you're after as a developer. You can already ask Crane to give you a certain type of Hadoop installation, and WebSphere CloudBurst supports this at a limited level for certain IBM products. I'd hope that soon this'll be possible for many more types of service.
Haven't we been here before..?
Structured metadata, dependencies between images, management of composites, parameterizable configuration...all these themes sound terribly familiar to most developers. In fact, these are precisely the issues tools like Maven were designed to tackle during the build process.
Certainly, the spectrum of possible dependencies, features and configuration options is far greater when instantiating and growing a multi-image platform than during a software build, but there are plenty of wheels here that hopefully don't need to be reinvented.
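As one example of a wheel that build tools have already invented, here is a sketch of resolving declared dependencies between the parts of an appliance into a provisioning order, much as a build tool orders modules. The ProvisioningOrder class and the component names are hypothetical and exist only for this illustration; a real tool would also have to deal with versions, conflicts and optional dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical resolver: orders components so that each one comes after its dependencies. */
class ProvisioningOrder {

    /** Depth-first topological sort over a "component -> things it depends on" map. */
    static List<String> resolve(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String component : dependsOn.keySet()) {
            visit(component, dependsOn, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String component, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(component)) {
            return;
        }
        // Seeing a component again while it is still being resolved means there is a cycle.
        if (!inProgress.add(component)) {
            throw new IllegalStateException("Circular dependency involving " + component);
        }
        List<String> deps = dependsOn.get(component);
        if (deps != null) { // components that are only ever depended upon have no entry
            for (String dep : deps) {
                visit(dep, dependsOn, done, inProgress, order);
            }
        }
        inProgress.remove(component);
        done.add(component);
        order.add(component);
    }

    public static void main(String[] args) {
        // Made-up appliance definition; LinkedHashMap keeps the output deterministic.
        Map<String, List<String>> appliance = new LinkedHashMap<>();
        appliance.put("application", Arrays.asList("app-server", "database"));
        appliance.put("app-server", Arrays.asList("java"));
        appliance.put("java", Arrays.asList("base-os"));
        appliance.put("database", Arrays.asList("base-os"));
        appliance.put("base-os", Collections.<String>emptyList());

        for (String component : resolve(appliance)) {
            System.out.println(component);
        }
    }
}
```

Nothing here is specific to virtual images, which is rather the point: the algorithm is the same one a build tool applies to module dependencies.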
Perhaps sticking to certain conventions, and forgoing the opportunity to apply all manner of arbitrary changes (or, at least, not worrying about making them easy), will be a key lesson to learn.