Refactoring a monolith to Microservices
For a training on Microservices that is currently under development at Xebia, we've created implementations of a web shop in both a monolithic and Microservices architecture. We then used these examples in a couple of workshops to explain a number of Microservices concepts (see here and here). In this post we will describe the process we followed to move from a monolith to services, and what we learned along the way.
First we built ourselves a monolithic web shop. It's a simple application that offers a set of HTTP interfaces that accept and produce JSON documents. You can find the result here on GitHub. The class com.xebia.msa.rest.ScenarioTest shows how the REST calls can be used to support a user interface (we didn't actually build a user interface and hope you can imagine one based on the ScenarioTest code).
So this was our starting point. How should we split this monolith into smaller pieces? To answer this question we started out by defining our services in a Post-it-saturated design session. We used event storming to find out where the boundaries of our services should be. A good place to start reading on event storming is Ziobrando's blog. The result is shown in the picture below.
We came up with seven aggregates that were translated into software in four services: Catalog, Shop, Payment and Fulfillment. We decided those four services would be enough to start with. One service calls another using REST, exchanging JSON documents over HTTP. The code can be found here. Our process to move from a monolith to services was to copy the monolith code four times and then strip the unnecessary code out of each copy. This resulted in some initial duplication of classes, but gradually the domain models of the services started drifting apart. An Order in the context of Payment is really very simple compared to an Order in the context of Shop, so lots of detail could be removed. Spring's JSON support helps a great deal here because it allows you to simply ignore JSON that doesn't fit the target class model: the complex Order in Shop can be parsed by the bare-bones Order in Payment.
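The idea of a bare-bones Order reading a richer document can be sketched in plain Java. This is not the code from our repository and it doesn't use Spring: a Map stands in for the parsed JSON document, and the class and field names (PaymentOrder, totalInCents, lineItems, shippingAddress) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TolerantOrderSketch {

    /** Bare-bones Order as Payment might see it (hypothetical fields). */
    static final class PaymentOrder {
        final String id;
        final long totalInCents;

        PaymentOrder(String id, long totalInCents) {
            this.id = id;
            this.totalInCents = totalInCents;
        }

        /** Copies only the fields Payment cares about; every other key is ignored. */
        static PaymentOrder fromDocument(Map<String, Object> document) {
            String id = (String) document.get("id");
            long total = ((Number) document.get("totalInCents")).longValue();
            return new PaymentOrder(id, total);
        }
    }

    /** Stand-in for the rich Order document that Shop would send as JSON. */
    static Map<String, Object> shopOrderDocument() {
        Map<String, Object> document = new HashMap<>();
        document.put("id", "order-1");
        document.put("totalInCents", 1999);
        document.put("lineItems", "2 x book, 1 x pen");    // irrelevant to Payment
        document.put("shippingAddress", "Some Street 1");  // irrelevant to Payment
        return document;
    }

    public static void main(String[] args) {
        PaymentOrder order = PaymentOrder.fromDocument(shopOrderDocument());
        System.out.println(order.id + " costs " + order.totalInCents + " cents");
    }
}
```

Shop can keep adding fields to its Order without breaking Payment, as long as the few fields Payment reads stay in place.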
Though the new solution will probably work well, we weren't quite satisfied. What would happen, for instance, if one of the services became unavailable? In our implementation this would mean that the site would be down: no Payment -> no Order. This is a pity because in practice a shop may want to accept an Order and send an invoice to be paid later. For inspiration on integration patterns, refer to Gero's blog. Our solution was still tightly coupled, not at design time, but at runtime.
To fix this problem we decided to place queues between services: one service may not call another service but only send out messages on queues. The result can be found here. This solution looks more like the diagram below.
Events in the diagram correspond to events in the code: a service informs the world that something interesting has happened, such as a new Order being completed or a Payment being received. Other services register their interest in certain types of events and pick up processing when needed, corresponding to pattern #3 in Gero's blog.
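To make the decoupling concrete, here is a minimal in-memory sketch. It assumes a java.util.concurrent queue in place of a real message broker, and the names (OrderCompleted, completeOrder, handlePending) are hypothetical, not taken from our repository. Shop only knows the queue, so it can keep accepting orders while Payment is away.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EventQueueSketch {

    /** Hypothetical event: Shop announces that an order was completed. */
    static final class OrderCompleted {
        final String orderId;
        final long amountInCents;

        OrderCompleted(String orderId, long amountInCents) {
            this.orderId = orderId;
            this.amountInCents = amountInCents;
        }
    }

    /** Shop publishes events on a queue and never calls Payment directly. */
    static final class Shop {
        private final BlockingQueue<OrderCompleted> outbox;

        Shop(BlockingQueue<OrderCompleted> outbox) {
            this.outbox = outbox;
        }

        void completeOrder(String orderId, long amountInCents) {
            outbox.add(new OrderCompleted(orderId, amountInCents));
        }
    }

    /** Payment picks up whatever events are waiting whenever it is running. */
    static final class Payment {
        private final List<String> invoiced = new ArrayList<>();

        /** Handles all queued events without blocking; returns how many were handled. */
        int handlePending(BlockingQueue<OrderCompleted> inbox) {
            int handled = 0;
            OrderCompleted event;
            while ((event = inbox.poll()) != null) {
                invoiced.add(event.orderId);
                handled++;
            }
            return handled;
        }

        List<String> invoicedOrders() {
            return invoiced;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<OrderCompleted> queue = new LinkedBlockingQueue<>();
        Shop shop = new Shop(queue);
        Payment payment = new Payment();

        // Payment is "down": Shop still accepts orders, events wait on the queue.
        shop.completeOrder("order-1", 1999);
        shop.completeOrder("order-2", 500);
        System.out.println("Waiting events: " + queue.size());

        // Payment comes back and catches up.
        int handled = payment.handlePending(queue);
        System.out.println("Handled " + handled + " events: " + payment.invoicedOrders());
    }
}
```

A real setup would put a durable broker where the LinkedBlockingQueue is, so that events also survive a restart of the sending service.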
This architecture is more robust in the sense that it can handle delays and unavailability of parts of the infrastructure. This comes at a price though:
- A user interface becomes more complex. Customers will expect a complete view of their order including payment information, but this is based on UI parts under control of different services. The question now becomes how to create a UI that looks consistent to end users while it is still robust and respects service boundaries.
- What happens to data about orders that is stored by Shop if something goes wrong later in the process? Imagine that a customer completes an order, is presented with a payment interface and then fails to actually pay. This means Shop could be left with stale Orders, so we may need some reporting on that to allow a sales rep to follow up with the customer, or a batch job to just archive old orders.
- We often got lost while refactoring. The picture showing the main events really helped us stay on track. While this was hard enough in baby systems like our example, it seems really complex in real-life software. Having a monolith makes it easier to see what happens because you can use your IDE to follow the path through the code. How to stay on track in larger systems is still an open question.
We plan to explore these issues later and hope to report our findings.