Providing high availability to stateless applications is pretty trivial, as was shown in the previous blog posts "A High Available Docker Container Platform" and "Rolling upgrade of Docker applications using CoreOS and Consul". But how does this work when you have a persistent service like Redis?

In this blog post we will show you how a persistent service like Redis can be moved between machines in the cluster while preserving its state. The key is to deploy a fleet mount configuration into the cluster and mount the storage in the Docker container that has persistent data.

To support persistence, we have added storage to our platform architecture in the form of three independent NFS servers that act as our NAS, as shown in the picture below.

Figure: CoreOS platform architecture with fake NAS

The applications are still deployed in the CoreOS cluster as Docker containers. Even our Redis instance is running in a Docker container. Our application is configured using the following three Fleet unit files:

  • mnt-data.mount
  • app-redis.service
  • app-hellodb@.service

The unit file of the Redis server is the most interesting one because it is our persistent service. In the unit section of the file, it first declares that it requires a mount for '/mnt/data' on which it will persist its data.

[Unit]
Description=app-redis
Requires=mnt-data.mount
After=mnt-data.mount
RequiresMountsFor=/mnt/data

In the start clause of the redis service, a specific subdirectory of /mnt/data is mounted into the container.

...
ExecStart=/usr/bin/docker run --rm \
    --name app-redis \
    -v /mnt/data/app-redis-data:/data \
    -p 6379:6379 \
    redis
...

The mnt-data.mount unit file is quite simple: it defines an NFS mount with the option 'noauto', indicating that the device should not be mounted automatically at boot time. The unit file has the option 'Global=true' so that the mount is distributed to all the nodes in the cluster. The mount is only activated when another unit requests it.

[Mount]
What=172.17.8.200:/mnt/default/data
Where=/mnt/data
Type=nfs
Options=vers=3,sec=sys,noauto

[X-Fleet]
Global=true

Please note that the NFS mount specifies system security (sec=sys) and uses the NFS version 3 protocol, to avoid all sorts of errors caused by mismatches in user and group IDs between the client and the server.
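The mount unit is called mnt-data.mount for a reason: systemd requires the name of a mount unit to match the path it mounts, with the slashes turned into dashes. On a systemd host, `systemd-escape --path --suffix=mount /mnt/data` prints the authoritative name. The sketch below mimics that for simple paths only; the function name `mount_unit_name` is our own invention, and it assumes path components made of plain letters and digits (it does not handle the root path or characters that systemd would escape).

```shell
# Hypothetical helper: derive the systemd mount unit name for a simple path.
# Assumption: the path is not "/" and contains no characters that need
# escaping; use systemd-escape for anything beyond this simple case.
mount_unit_name() {
    path="${1#/}"     # strip the leading slash
    path="${path%/}"  # strip a trailing slash, if any
    # replace the remaining slashes with dashes and append the suffix
    printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

mount_unit_name /mnt/data   # prints: mnt-data.mount
```

If you change the Where= path in the mount unit, the file name of the unit has to change along with it.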

Preparing the application

To see the failover in action, you need to start the platform and deploy the application:

git clone https://github.com/mvanholsteijn/coreos-container-platform-as-a-service.git
cd coreos-container-platform-as-a-service/vagrant
vagrant up
./is_platform_ready.sh

This will start 3 NFS servers and our 3-node CoreOS cluster. Once that is done, you can deploy the application by first loading the mount unit file:

export FLEETCTL_TUNNEL=127.0.0.1:2222
cd ../fleet-units/app
fleetctl load mnt-data.mount

starting the redis service:

fleetctl start app-redis.service

and finally starting a number of instances of the application:

fleetctl submit app-hellodb@.service
fleetctl load app-hellodb@{1..3}.service
fleetctl start app-hellodb@{1..3}.service

You can check that everything is running by issuing the fleetctl list-units command. It should show something like this:

fleetctl list-units
UNIT			MACHINE				ACTIVE		SUB
app-hellodb@1.service	8f7472a6.../172.17.8.102	active		running
app-hellodb@2.service	b44a7261.../172.17.8.103	active		running
app-hellodb@3.service	2c19d884.../172.17.8.101	active		running
app-redis.service	2c19d884.../172.17.8.101	active		running
mnt-data.mount		2c19d884.../172.17.8.101	active		mounted
mnt-data.mount		8f7472a6.../172.17.8.102	inactive	dead
mnt-data.mount		b44a7261.../172.17.8.103	inactive	dead

As you can see, three app-hellodb instances are running and the redis service is running on 172.17.8.101, which is the only host that has /mnt/data mounted. The other two machines have this mount in the status 'dead', which is an unfriendly name for stopped.
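If you want to look up where a unit ended up without reading the table by eye, a small filter over the list-units output will do. This is only a sketch: the function name `unit_host` is hypothetical, and it assumes the column layout shown above (unit, machine, active state, sub state). The sample data is an excerpt of the listing above so that the snippet runs without a cluster.

```shell
# Hypothetical helper: print the IP of the machine on which a unit is active.
# Reads `fleetctl list-units` output on stdin; assumes the default columns
# UNIT, MACHINE, ACTIVE, SUB as shown in this post.
unit_host() {
    awk -v unit="$1" '
        $1 == unit && $3 == "active" {
            split($2, machine, "/")   # "2c19d884.../172.17.8.101"
            print machine[2]          # keep the part after the slash
        }'
}

# Excerpt of the listing above, used as sample input.
sample='UNIT                   MACHINE                   ACTIVE    SUB
app-hellodb@1.service  8f7472a6.../172.17.8.102  active    running
app-redis.service      2c19d884.../172.17.8.101  active    running
mnt-data.mount         2c19d884.../172.17.8.101  active    mounted
mnt-data.mount         8f7472a6.../172.17.8.102  inactive  dead'

printf '%s\n' "$sample" | unit_host app-redis.service   # prints: 172.17.8.101
```

Against a live cluster you would pipe the real output into the function instead, for example `fleetctl list-units | unit_host mnt-data.mount`, which should report the same host as app-redis.service.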

Now you can access the app:

yes 'curl hellodb.127.0.0.1.xip.io:8080; echo ' | head -10 | bash
..
Hello World! I have been seen 20 times.
Hello World! I have been seen 21 times.
Hello World! I have been seen 22 times.
Hello World! I have been seen 23 times.
Hello World! I have been seen 24 times.
Hello World! I have been seen 25 times.
Hello World! I have been seen 26 times.
Hello World! I have been seen 27 times.
Hello World! I have been seen 28 times.
Hello World! I have been seen 29 times.

Redis Fail-over in Action

To see the fail-over in action, start a monitor on a machine that is not running Redis; in our case, the machine running app-hellodb@1.

vagrant ssh -c \
   "yes 'curl --max-time 2 hellodb.127.0.0.1.xip.io; sleep 1 ' | \
    bash" \
    app-hellodb@1.service

Now restart the redis machine:

vagrant ssh -c "sudo shutdown -r now" app-redis.service

After you have restarted the machine running Redis, the output should look something like this:

...
Hello World! I have been seen 1442 times.
Hello World! I have been seen 1443 times.
Hello World! I have been seen 1444 times.
Hello World! Cannot tell you how many times I have been seen.
	(Error 111 connecting to redis:6379. Connection refused.)
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
curl: (28) Operation timed out after 2007 milliseconds with 0 out of -1 bytes received
Hello World! I have been seen 1445 times.
Hello World! I have been seen 1446 times.
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
Hello World! I have been seen 1447 times.
Hello World! I have been seen 1448 times.
..
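The important detail in this output is that the counter continues at 1445 after the outage instead of starting over, which shows that the Redis data survived the move. If you capture the monitor output, a check along the following lines can confirm that automatically. It is a sketch under assumptions: the function name `check_counter` is hypothetical, and it relies on the exact "I have been seen N times." wording that the application prints above.

```shell
# Hypothetical helper: verify that the "seen N times" counter only increases.
# Reads monitor output on stdin; error lines and curl timeouts are ignored.
# Exits with status 1 if the counter ever repeats or goes backwards.
check_counter() {
    awk '
        /have been seen [0-9]+ times/ {
            n = $(NF - 1) + 0          # the number just before "times."
            if (seen && n <= last) {
                printf "counter went from %d to %d\n", last, n
                bad = 1
            }
            last = n
            seen = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Excerpt of the monitor output above, used as sample input.
sample='Hello World! I have been seen 1444 times.
Hello World! Cannot tell you how many times I have been seen.
curl: (28) Operation timed out after 2004 milliseconds with 0 out of -1 bytes received
Hello World! I have been seen 1445 times.'

if printf '%s\n' "$sample" | check_counter; then
    echo "counter preserved"
fi
```

A counter that restarted at 1 after the reboot would make the function report the jump and return a non-zero status, which would indicate that the new Redis container did not pick up the data from the NFS mount.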

Notice that the distribution of your units has changed after the reboot.

fleetctl list-units
...
UNIT			MACHINE				ACTIVE		SUB
app-hellodb@1.service	3376bf5c.../172.17.8.103	active		running
app-hellodb@2.service	ff0e7fd5.../172.17.8.102	active		running
app-hellodb@3.service	3376bf5c.../172.17.8.103	active		running
app-redis.service	ff0e7fd5.../172.17.8.102	active		running
mnt-data.mount		309daa5a.../172.17.8.101	inactive	dead
mnt-data.mount		3376bf5c.../172.17.8.103	inactive	dead
mnt-data.mount		ff0e7fd5.../172.17.8.102	active		mounted

Conclusion

We now have the basis for a truly immutable infrastructure setup: the entire CoreOS cluster including the application can be destroyed and a completely identical environment can be resurrected within a few minutes!

  • Once you have a reliable external persistent store, CoreOS can help you migrate persistent services just as easily as stateless services. We chose an NFS server for ease of use in this setup, but nothing prevents you from mounting other kinds of storage systems for your application.
  • Consul excels in providing fast and dynamic service discovery for services, allowing the Redis service to migrate to a different machine and the application instances to find the new address of the Redis service through a simple DNS lookup!