Life Beyond Distributed Transactions
In an attempt to better understand the ideas behind 'Life Beyond Distributed Transactions: an Apostate's Opinion' by Pat Helland, I'm going to try to explain how the concept would work out for a time-honored example: the good old transfer of money from one account to another. It is the archetype of all distributed transactions, because we want to make absolutely sure we don't lose money.
I always took for granted that you really need a distributed transaction to solve the money transfer problem: money disappears at one end of the deal and shows up at the other. Executing only one of the two actions leaves someone with a problem. So you need a transaction to safeguard the operation, and you need a distributed transaction because the accounts may be owned by different banking systems. That's the point where reality sneaks up from behind your back and hits you over the head: there is never going to be such a thing as a transaction that spans two banks. At least, not in the sense that a debit action by Bank A becomes visible only once the corresponding credit action by Bank B has completed successfully. It would simply take too much time to complete anything at all, because it may take (quite literally) days for the money to travel from A to B. All this time every read operation of the balance at Bank A would show the amount before deduction.
Working with distributed transactions across two banking systems would create a brittle system, in which the application or infrastructure would have to remember a possibly large set of operations that haven't yet completed and may be rolled back at any point in time. So that's not how it works: there are no distributed transactions between banks.
Given the ideas in 'Life...', we can solve the transfer problem in a generic way that works for transfers between banks. It also works for transfers between accounts stored in the same database, because that is a far less complex version of the same problem. Applying the concepts described below makes sense even if only a single banking system is involved, because it allows you to scale the application beyond the boundaries of a single server.
The first thing we need is an entity. In terms of 'Life...' an entity is a set of data that is persisted as a unit. This might be my checking account at Bank A with all the transactions against it, or the data the bank needs to store about me as a customer. Note that I wrote 'my checking account' and 'me as a customer'. The 'my' and 'me' are important, because in this context the term entity is used to mean a specific collection of data. In the traditional sense of entity relationship diagrams, an entity would be the prototype used to create many instances of database records, rather like a class is used to create many objects. Pat Helland redefines the term entity to mean 'a specific instance'. In size, an entity would be larger than an object. An account entity would probably contain many objects, like all transactions against the account for the last two years.
Now for the money transfer. In a traditional non-distributed application something like this would happen: lock account A; lock account B; check the balance and make sure the deduction is OK; deduct the amount from A; add the amount to B; write to the store; commit. Even though the accounts are stored in a single database, this would be classified as a distributed transaction in the sense of the article, because it spans two entities: account A and account B.
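The traditional sequence can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the accounts live in an in-memory SQLite database, and the names open_bank, transfer and balances are made up for the example. SQLite locks the whole database rather than the two account rows, so BEGIN IMMEDIATE stands in for 'lock account A; lock account B'.

```python
import sqlite3


def open_bank():
    """Create a toy bank with two accounts (hypothetical schema)."""
    # isolation_level=None means we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 20)])
    return conn


def transfer(conn, source, target, amount):
    """Debit and credit inside ONE transaction that spans both accounts."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (source,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, source)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, target)
        )
        conn.execute("COMMIT")  # both changes become visible together
    except Exception:
        conn.execute("ROLLBACK")  # neither change survives
        raise


def balances(conn):
    """Return the current balance of every account as a dict."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

The point of the sketch is the shape of the transaction: one commit covers both accounts, which only works as long as both accounts sit behind the same lock and the same store.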
To solve the problem you might buy a humongous IBM beast of a server, but you could also reconsider your software architecture and come up with a solution like the one below (and then you could still buy a humongous IBM server, that doesn't really matter).
The solution relies on three concepts: entities, messages and activities. We've seen entities above, so next in line are messages. Once we've deducted an amount from account A, the entity that manages account A has to inform the entity that manages account B that it should add the amount to account B. This is done with a message that is handed to the infrastructure. Entities have two layers: a business layer that is blissfully unaware of technology (it is scale-agnostic) and an infrastructure layer that is scale-aware. The infrastructure layer sends the message to entity B. It starts trying to deliver the message as soon as it receives it from the business layer of entity A. The scale-aware layer of entity A may have to hunt down entity B in order to deliver the message. The transfer may fail for technical reasons (entity B is not reachable due to a network failure) or for business reasons (the account in entity B is not valid).
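A minimal Python sketch of the two layers might look like this. Everything in it is an assumption made for illustration: the class names Message, AccountEntity and Infrastructure, the in-memory outbox, and the dictionary that plays the role of 'hunting down' an entity. The business code only changes its own entity and queues a message; the infrastructure code is the only part that knows where other entities are.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    recipient: str  # key of the entity that should receive the message
    amount: int     # amount to credit


@dataclass
class AccountEntity:
    """Scale-agnostic business layer: touches only its own data."""
    key: str
    balance: int
    outbox: deque = field(default_factory=deque)

    def debit_and_notify(self, recipient, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        # The debit and the outgoing message belong to one local transaction.
        self.balance -= amount
        self.outbox.append(Message(recipient, amount))

    def receive(self, message):
        self.balance += message.amount


class Infrastructure:
    """Scale-aware layer: locates entities and delivers messages."""

    def __init__(self):
        self.reachable = {}  # key -> entity; a missing key means 'unreachable'

    def register(self, entity):
        self.reachable[entity.key] = entity

    def deliver_pending(self, sender):
        """Try each queued message once; keep undelivered ones for a retry."""
        undelivered = deque()
        while sender.outbox:
            message = sender.outbox.popleft()
            target = self.reachable.get(message.recipient)
            if target is None:
                undelivered.append(message)  # technical failure, retry later
            else:
                target.receive(message)
        sender.outbox = undelivered
        return len(undelivered)
```

Note that debit_and_notify never talks to entity B. If B is unreachable, the message simply stays in the outbox of A until a later call to deliver_pending succeeds.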
The code in entity A must deal with failure in the future: failure may occur some time after the message is sent. This is where the third concept, the activity, comes into play. An activity is data about the conversation entity A has with entity B. Once entity A has placed the message on its out-queue, the activity stores this fact. If a success message is received from entity B, the activity is closed. If, on the other hand, a failure occurred, entity A has all the information it needs to act. A possible action could be to add the amount back to the account and send an email to the customer, or to log a failure message to the in-box of a bank employee. These actions may require the cooperation of other entities and thus lead to new messages and new activities, each working in its own transactional scope.
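As a rough sketch of how an activity could be used, assume each outgoing message gets an identifier and entity A keeps one activity record per identifier. The names Activity, start_transfer and on_reply, the three status values and the message dictionary are all inventions for this example, not something taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """What entity A remembers about one conversation with a partner."""
    partner: str
    amount: int
    status: str = "pending"  # pending -> closed | compensated


@dataclass
class AccountEntity:
    key: str
    balance: int
    activities: dict = field(default_factory=dict)  # message id -> Activity
    outbox: list = field(default_factory=list)

    def start_transfer(self, message_id, partner, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        # One local transaction: debit, record the activity, queue the message.
        self.balance -= amount
        self.activities[message_id] = Activity(partner, amount)
        self.outbox.append({"id": message_id, "to": partner, "amount": amount})

    def on_reply(self, message_id, ok):
        """Handle the success or failure message sent back by the partner."""
        activity = self.activities[message_id]
        if activity.status != "pending":
            return  # this reply was already handled; do not compensate twice
        if ok:
            activity.status = "closed"
        else:
            # Compensate: put the money back on the account.
            self.balance += activity.amount
            activity.status = "compensated"
```

The compensation here is just the refund. Notifying the customer or a bank employee would mean queuing further messages, each with an activity of its own.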
To summarize: an entity is a collection of data or objects that makes business sense as a unit. An entity has a scale-agnostic layer and a scale-aware layer. The scale-aware layer deals with the delivery of messages to other entities. The scale-agnostic layer stores information on messages sent to other entities, in order to handle failures that cannot be resolved automatically by the scale-aware layer. A transaction ends with the creation of a message. Interaction with other entities is never part of a transaction, because other entities may live on a different server that might be unavailable. Activities store information on the exchange of messages between two entities and can be used by the scale-agnostic business layer to handle failure.