Developing and deploying Java on middleware and in the cloud: rise of the Virtual Appliance?

Andrew Phillips

From Java EE to Google App Engine to GigaSpaces, the idea of developing against a middleware or "infrastructure" API is well established in the Java world.
But these are fixed environments. With the (re-)advent of virtualization, it is now becoming feasible to package and rapidly provision your own environment, custom-designed to meet your application's needs.
As the big middleware vendors are realizing, it is not just possible to create such Virtual Appliances, but necessary: a production app's setup inevitably includes more than just a couple of EARs.

Here, we'll look at the current state of cloud and middleware deployment tooling, examine possible future developments and draw parallels between deployment and related processes.

Cloud: The end of the beginning

The initial wave of IaaS cloud providers resulted in a bunch of infrastructure services with deceptively similar specs but frustratingly different, proprietary APIs, parameters, concepts and formats. The first wave of cloud tooling has aimed to tackle this portability nightmare by attempting to introduce common APIs and terminologies.
At the same time, there have been proposals for open virtual image and hypervisor API formats, but as yet there is no sign of widespread adoption.

One difficulty for the current "meta API" approaches1 is the fact that generic support, i.e. providing features supported by all providers, leads to a lowest common denominator that is too small to be of much practical use.
jclouds' TemplateBuilder is an interesting attempt to tackle this problem, but ultimately only the adoption of an IaaS standard will make this one go away.
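
To make the lowest-common-denominator problem concrete, here is a minimal Java sketch. The class name and the feature labels used with it are hypothetical and not taken from jclouds or any real provider API; the only point is that intersecting feature sets shrinks quickly as providers are added.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Illustration only: the feature sets passed in are made-up labels, not a real API. */
final class CommonFeatures {

    /** Returns the features supported by every provider (the "lowest common denominator"). */
    static Set<String> supportedByAll(Collection<Set<String>> providerFeatures) {
        Set<String> common = null;
        for (Set<String> features : providerFeatures) {
            if (common == null) {
                // Start from the first provider's features...
                common = new TreeSet<>(features);
            } else {
                // ...and drop anything the next provider does not offer.
                common.retainAll(features);
            }
        }
        return common == null ? new TreeSet<>() : common;
    }
}
```

With three imaginary providers offering {start, stop, snapshot, elastic-ip}, {start, stop, snapshot} and {start, stop, load-balancer}, only start and stop survive.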

From a developer's perspective, a more fundamental issue is that a "raw" machine, such as that represented by a virtual image profile, is just too low-level to be useful. IaaS vendors tend to come from the provisioning world, which deals in servers, racks and cores. But developers are used to developing against "environment services"2, be that a Java EE cluster, a LAMP stack or a Hadoop installation3.

Middleware: Java EE and then some

After the large initial effort of putting together "corporate" Java EE implementations, the large enterprise vendors quickly moved to expanding the feature set of resources available to applications, such as persistent queues or clustered datasources.
As developers discovered and made use of these features, applications became progressively more tied to specific, often proprietary, middleware platforms. As a result, the development "deliverable" quickly grew to include not just the EARs envisaged by Java EE, but a variety of configuration options and resource definitions for the middleware container.

Keeping track of and deploying these loose packages, whilst remembering to apply appropriate modifications specific to the target environment, has proven to be a difficult, time-consuming and error-prone task. Indeed, users regard the automatic management of these packages as a major benefit of XebiaLabs' deployment automation product Deployit.

Slowly, though, the Big Vendors are also realizing that an effective "management unit" for today's Java applications goes beyond Java EE artifacts, and also needs to contain information about the middleware configuration: a "Virtual Appliance", you might say. Both IBM's WebSphere CloudBurst and Oracle's upcoming Assembly Builder attempt to address this issue by making it possible to capture WebSphere and WebLogic configurations, modify them and apply them to different installations.
Both tools run only in the context of virtualized environments - in fact, they are largely built on top of the vendors' virtualization offerings - and still appear to be limited to the IBM and WebLogic stacks, respectively4.

Towards configurable Virtual Appliances

Current virtual image formats are essentially a long list of bits and bytes of memory and disk storage. I might be able to get some low-level infrastructure details such as IP addresses or number of cores from the API, but beyond that the best source of information about what this image actually contains is a free-text "description" field, intended mainly for human readability. So even if I have configured a WebSphere ND 7.1 installation with a database and a cluster, all according to company policy, most of the details of the installation are lost.
Certainly, it is very hard to programmatically check dependencies, enforce policies or even find an image matching certain software requirements (Java installed, version >1.5, latest OS patches etc.).
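
As a sketch of what such a programmatic check could look like if images carried structured metadata, consider the Java below. Everything here is an assumption for illustration: the ImageCatalogue and Image types, the software map and the purely numeric dotted version scheme do not correspond to any existing image format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical structured image metadata: a name plus installed software and versions. */
final class ImageCatalogue {

    static final class Image {
        final String name;
        final Map<String, String> software; // e.g. "java" -> "1.6.0"

        Image(String name, Map<String, String> software) {
            this.name = name;
            this.software = software;
        }
    }

    /** Compares dotted numeric versions such as "1.5" and "1.6.0"; missing parts count as 0. */
    static int compareVersions(String left, String right) {
        String[] l = left.split("\\.");
        String[] r = right.split("\\.");
        for (int i = 0; i < Math.max(l.length, r.length); i++) {
            int a = i < l.length ? Integer.parseInt(l[i]) : 0;
            int b = i < r.length ? Integer.parseInt(r[i]) : 0;
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        return 0;
    }

    /** Names of images where the software is installed in a version strictly newer than the given one. */
    static List<String> findNewerThan(List<Image> images, String software, String version) {
        List<String> matches = new ArrayList<>();
        for (Image image : images) {
            String installed = image.software.get(software);
            if (installed != null && compareVersions(installed, version) > 0) {
                matches.add(image.name);
            }
        }
        return matches;
    }
}
```

A call such as findNewerThan(images, "java", "1.5") then answers the "Java installed, version >1.5" question without anybody having to read a description field.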

One interesting possibility is to define an image as a base OS together with a list of yum, rpm or Conary packages5. This gives a much more developer-level view of what is in the image and avoids reinventing this particular packaging and installation wheel.
Unfortunately, though, these package managers are all OS-specific and don't leave much room for customization of package installation and configuration.

Templates, not clones

Bit-for-bit descriptions of Virtual Appliances make it easy to instantiate many clones of a machine. Most of the time, though, we need to instantiate an appliance that is just slightly different from the original - think data source credentials, naming conventions or port numbers.
In fact, most of the time our "image" is more of a template to be instantiated than a master that just needs to be cloned. Unfortunately6, the current virtual machine formats don't support this.

Cloudlets are an interesting approach in this direction, allowing you to specify the file system tree of the image, using template files where appropriate. This can get you a long way in Unix environments, but still doesn't solve all the problems: parameterizing a WebLogic domain by modifying the XML configuration files is tricky and, incidentally, not supported. Or think binary configurations, tablespaces in a database installation or the Windows Registry.
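
For plain-text configuration files, the templating idea is simple enough to sketch in a few lines of Java. The ${key} placeholder syntax and the ConfigTemplate class are assumptions made for this example; they are not the syntax Cloudlets or any other tool actually uses.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal text templating: replaces ${key} placeholders with per-environment values. */
final class ConfigTemplate {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = values.get(key);
            if (value == null) {
                // Fail fast rather than produce a half-configured appliance.
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in values from being treated as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Rendering "jdbc.url=jdbc:db://${db.host}:${db.port}/app" with db.host and db.port supplied per environment gives each instance its own settings from one shared template. It also shows the limitation: this only works where the configuration is a text file you are allowed to edit.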

Furthermore, from a developer's perspective, it's usually the service delivered by the virtual appliance that is interesting; whether your GigaSpaces XAP grid needs one or 100 images to run doesn't really matter. jclouds' NodeSet, which makes it possible to transparently manage and scale multiple images, looks like an important conceptual advance here.

Virtual Appliances: a feature wishlist

Using the current generation of tools, we can put together a catalogue of possibly related images that are portable enough not to be tied to a vendor. With some trickery and a bit of blood, sweat and tears we can even wire up some kind of dynamic configuration, even if that's usually specific to the contents of the image.
At the developer level, there are already a handful of projects out there that focus on deploying platforms, such as Crane, or that add a "platform slice" on top of the infrastructure, such as webappVM7. Again, though, they tend to be tailored to one or a limited number of specific services.

What next? Here are a few items from my "wishlist" of features for the next generation of Virtual Appliances:

Round-trip templates

You've taken the company-approved base OS image and spent a lot of time installing your middleware, getting the configuration just right and even making it parameterizable. A few months later a new base image is released, and your old one is no longer supported in production.
Currently, we treat images as an atomic unit, so there is no way to separate the "base" part, which presumably comes from the Operations team, from the "platform" part, which has probably been installed and configured by a developer.8
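
One way to picture the separation is to model an appliance as a reference to a base image plus an ordered list of platform steps, so that the platform part can be replayed on a new base. The Java sketch below is purely hypothetical: the LayeredAppliance type and its step descriptions are mine, and no current image format works this way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model: an appliance is a base image reference plus ordered platform steps. */
final class LayeredAppliance {
    final String baseImage;
    final List<String> platformSteps;

    LayeredAppliance(String baseImage, List<String> platformSteps) {
        this.baseImage = baseImage;
        // Defensive copy so the platform layer cannot change behind our back.
        this.platformSteps = Collections.unmodifiableList(new ArrayList<>(platformSteps));
    }

    /** Returns a new appliance on a different base, keeping the platform layer untouched. */
    LayeredAppliance rebase(String newBaseImage) {
        return new LayeredAppliance(newBaseImage, platformSteps);
    }

    /** The full recipe: the base first, then every platform step in order. */
    List<String> buildPlan() {
        List<String> plan = new ArrayList<>();
        plan.add("start from " + baseImage);
        plan.addAll(platformSteps);
        return plan;
    }
}
```

When Operations release a new base image, rebase("corp-os-2.0") yields a build plan that reuses the developer's middleware steps unchanged, which is exactly what an atomic image cannot offer.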

Structured metadata

You're trying to reproduce a problem you've come across in production and would quickly like to spin up a WebSphere cluster on your production operating system to investigate.
Today, you'll probably end up browsing through various image libraries looking for suitable templates. "WebSphere AS 6.1 on Ubuntu" looks promising, but is it the right version of the OS? What's the patch level of WebSphere? Which version of Java is it running?
Hopefully, it'll soon be possible to formulate these kinds of requirements accurately (and plenty of other interesting ones, such as maximum spend or required SLA) and have the correct images chosen, provisioned and configured automatically, perhaps even across multiple clouds.
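
A rough Java sketch of such a declarative selection, including a spending cap, might look as follows. The Offer type, its fields and the cloud names used with it are invented for the example; real providers expose nothing this uniform today.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical image offers from several clouds, selected by declarative requirements. */
final class ImageSelector {

    static final class Offer {
        final String cloud;
        final String os;
        final String middleware;
        final double hourlyPrice;

        Offer(String cloud, String os, String middleware, double hourlyPrice) {
            this.cloud = cloud;
            this.os = os;
            this.middleware = middleware;
            this.hourlyPrice = hourlyPrice;
        }
    }

    /** Picks the cheapest offer with the exact OS and middleware, within the spending cap. */
    static Optional<Offer> cheapestMatch(List<Offer> offers, String os, String middleware,
                                         double maxHourlyPrice) {
        return offers.stream()
                .filter(o -> o.os.equals(os))
                .filter(o -> o.middleware.equals(middleware))
                .filter(o -> o.hourlyPrice <= maxHourlyPrice)
                .min(Comparator.comparingDouble((Offer o) -> o.hourlyPrice));
    }
}
```

An empty result is a useful answer too: it tells you that no cloud in the catalogue can satisfy the requirements at that price, rather than leaving you to discover it by browsing.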

Service-based provisioning

You've read interesting things about Terracotta and want to see if it might help with session sharing. So you'd quickly like to spin up a Java EE container and wire in Terracotta.
How many images are involved, and what the exact version of the operating system is, isn't even interesting - it's the configured service you're after as a developer. You can already ask Crane to give you a certain type of Hadoop installation, and WebSphere CloudBurst supports this at a limited level for certain IBM products. I'd hope that soon this'll be possible for many more types of service.
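
From the caller's side, a service-based request could be as small as "this service, this much capacity". The Java sketch below assumes a hypothetical ServicePlanner with made-up capacity units and node naming; it does not reflect how Crane or CloudBurst work internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner: the caller asks for a service and capacity, never for individual images. */
final class ServicePlanner {

    /** Plans enough nodes of a service to reach the requested capacity. */
    static List<String> plan(String service, int requiredCapacity, int capacityPerNode) {
        if (requiredCapacity <= 0 || capacityPerNode <= 0) {
            throw new IllegalArgumentException("Capacities must be positive");
        }
        // Ceiling division: 250 units at 100 per node needs 3 nodes.
        int nodes = (requiredCapacity + capacityPerNode - 1) / capacityPerNode;
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= nodes; i++) {
            names.add(service + "-node-" + i);
        }
        return names;
    }
}
```

The number of images becomes an implementation detail of the planner, which is the point: the developer states what is needed and the tooling works out how many machines that takes.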

Haven't we been here before..?

Structured metadata, dependencies between images, management of composites, parameterizable configuration...all these themes sound terribly familiar to most developers. In fact, these are precisely the issues tools like Maven were designed to tackle during the build process.

Certainly, the spectrum of possible dependencies, features and configuration options is far greater when instantiating and growing a multi-image platform than during a software build, but there are plenty of wheels here that hopefully don't need to be reinvented.
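
One of those wheels is dependency ordering. As a minimal sketch, assuming images declare which other images they depend on (the image names and the ProvisioningOrder class are hypothetical), a depth-first traversal gives an order in which everything is provisioned after its dependencies, much as a build tool orders modules:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Orders images so that each one is provisioned after the images it depends on. */
final class ProvisioningOrder {

    static List<String> resolve(Map<String, List<String>> dependencies, String target) {
        List<String> order = new ArrayList<>();
        visit(target, dependencies, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private static void visit(String image, Map<String, List<String>> dependencies,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(image)) {
            return; // already scheduled via another path
        }
        if (!inProgress.add(image)) {
            throw new IllegalStateException("Dependency cycle at: " + image);
        }
        for (String dependency : dependencies.getOrDefault(image, List.of())) {
            visit(dependency, dependencies, done, inProgress, order);
        }
        inProgress.remove(image);
        done.add(image);
        order.add(image); // all dependencies are already in the list at this point
    }
}
```

Shared dependencies are scheduled once, and a cycle is reported instead of silently looping, which are both behaviours developers already expect from their build tooling.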

Perhaps sticking to certain conventions, and forgoing the opportunity to apply all manner of arbitrary changes (or, at least, not worrying about making it easy), will be a key lesson to learn.

DTAP in a virtualized setup

It's worth noting that a lot of the complexity of current provisioning lies in the fact that both platforms and deployment packages still need to be tailored to their target environment.
Indeed, these are often the most error-prone aspects of typical deployment scenarios, and one of the main reasons why people are looking towards deployment automation products such as Deployit, which are designed to tackle these issues.

However...in a virtualized setup where a new machine doesn't immediately mean lots of expensive hardware, is there really any need for the Development environment to run on a single server and the Test environment on a cluster?
In the days when all machines were connected to the same network, environment-specific naming conventions, port ranges and credentials formed an important safety net. But now that environments can be effectively isolated at the hypervisor level, is this still necessary?

Certainly, looking at the errors caused and general overhead imposed by the current practice of maintaining similar but different Development, Test, Acceptance and Production9 environments, the benefits of moving to a "single image" approach are self-evident. And even if, in many industries, we may never see fully virtualized Production environments, we can certainly have clones of this environment running in the clouds.

Footnotes

  1. Which, indeed, are also proprietary in the sense that all the "common" APIs are, for the most part, incompatible with each other.
  2. I would have used "middleware" but that term is taken ;-)
  3. I'm curious as to what the Spring/VMware combo will come up with in this space.
  4. Based on my knowledge of these products - corrections welcome!
  5. Optionally wired up by Puppet, Chef, cfengine, SmartFrog or any of the like.
  6. I say "unfortunately", but I should be grateful: automating these environment-specific settings and making sure you can deploy the same application package to all your environments is what Deployit, our deployment automation product, is all about.
  7. Now "Makara".
  8. Well, clever things can be done by taking and merging snapshots, but that's essentially a scary hack.
  9. We could throw in QA, UAT and a bunch of others, for good measure.

Comments (5)

  1. Adrian Cole

    March 9, 2010 at 6:30 pm

    Great summary of what's going on today, and what's addressing the concerns.

    What do you think of these "cloud templating" developments?

    1. pallet - flexible cloud bootstrapping
    http://github.com/hugoduncan/pallet

    2. chef rest api - cookbooks in the sky ;)
    http://wiki.opscode.com/display/chef/Chef+Server+REST+API

    -Adrian
    founder jclouds

  2. Andrew Phillips

    March 11, 2010 at 12:39 pm

    Update: I wrote that I was curious about Spring/VMware - well, it looks like something is on its way...

  3. Solomon Hykes

    March 11, 2010 at 4:23 pm

    Andrew, thanks for moving this conversation forward. A few comments:

    1. You mention Cloudlets and its lack of support for XML and binary templating. We want to support this. I'm emailing you separately to get more details and start working on it.

    2. I agree that service-driven deployment is the goal. I think smart images such as Cloudlets are a *necessary* building block to reach that goal.

    To cleanly address higher-level considerations such as SLAs, scale or multi-node deployments, you need to reference objects representing the behaviour of a configured node. These objects need to be reusable and self-contained, regardless of *how* they were built and where they will run. That's the roadmap for Cloudlets, in a nutshell.

    Solomon Hykes
    @solomonstre

  4. Andrew Phillips

    March 11, 2010 at 5:23 pm

    @Adrian: Chef Server REST API - I think this could be useful if you're interested in using a Chef Server for configuration without wanting to have Chef clients on your images. I would imagine this is more likely to be an issue if you're in control of your image catalogue, but I'm not sure how common this scenario will be.
    I'd assume that you'd want to use Chef Client (or Chef Solo) if you're interested in actually executing recipes. But I guess you could use the API to talk to the server if you were focussing more on using the Server as a repository, potentially interpreting the data in a different way.
    Pallet - as far as I can gather, the main focus here is still on portability, i.e. running the same image (set) on different infrastructures. The "hook" to run shell scripts or recipes via Chef Solo is essentially what I had in mind when talking about "Optionally wired up by Puppet, Chef, cfengine, SmartFrog or any of the like.". Still, this is largely about the how of configuration, rather than the what.

  5. Andrew Phillips

    March 25, 2010 at 5:04 pm

    @Solomon:

    > Cloudlets and its lack of support for XML and binary templating

    If it reads as though I'm implying that Cloudlets does not support XML that was certainly not the intention. After all, they're text files, so you can use placeholders and template expressions perfectly well.

    What I was trying to point out is that "customization by config file templating" isn't always suitable, e.g. for systems with many config files which are not supported as a public API, such as WebSphere or WebLogic. Here, you'd presumably want to use the public administrative interfaces instead.

    Apart from binary files, which aren't usually common in a Unix environment, one thing that's also not easy to do with file-level templating is specifying target-specific packages, e.g. a 32-bit version in Development vs. a 64-bit version in Test.

    Whilst these kinds of configurations are regrettably still common, I think this is mainly due to "legacy" thinking. Having seen far too many "But it works in Dev! - Oh yes, but that's a different JVM version." problems, I strongly believe we should aim to be moving towards using the same environment setup everywhere, as I wrote. In that case, target-specific packages are thankfully not an issue ;-)
