We’re running a few (virtual) servers, nothing special. It is rather easy to turn those machines into snowflakes. To counter this we introduced Salt. Salt does a nice job of deploying software to servers and of upgrading them when run regularly. But how do we catch issues when changing the Salt configuration itself? The solution is simple: Test!

I do not plan to test my changes directly in our live environment, nor do I want to set up and maintain such a dynamic environment locally. I want to put as much configuration as possible under version control (Git).

What I want to check is whether provisioning an environment works and whether the key services come online. I’m not so much interested in the individual states; it’s the overall result I want to check for. That’s what will run in production, so that’s the important part.

For example, say I want to set up a Jenkins master. I’d like to build and test my configuration locally as much as possible, maybe even test provisioning different operating systems. I might even want to validate my configuration on our CI server. You can find an example in my Salt formula testing repository on GitHub.

I created a small top.sls file for the salt environment:

base:
  'jenkins.*':
      - git
      - java
      - jenkins
      - node

And added the required Salt formula. So far so good. From this point onwards there are two things I can do:

  1. Spin up a VM, hook it up to our Salt environment as a minion and start the provisioning.
  2. Test it locally.

You can probably see that strategy 1 has some downsides. If I need to tweak the formula, I have to re-provision the VM, which in itself can already lead to configuration drift. That means that when I’m finished with the configuration I need to remove the VM and create a fresh one (and then hope I did not miss anything). Even worse: I’d be testing in a live environment. I can’t imagine what could happen if the environment gets re-provisioned with my intermediate work.

Instead, I create a Salt test environment locally with Vagrant (a blessing when you’re not running Linux on your laptop). I want to deploy the configurations themselves to “the simplest thing possible”: a Docker container. I considered using only Vagrant images, but Docker containers are much faster, and it’s all about feedback in the end. Finally, I want to ensure that the right services are running. In the case of this example that is Jenkins, listening on port 8080. For this I use a tool called Testinfra, which provides a nice interface for testing infrastructure and is built on top of Pytest. My checks are simple to start with:

import pytest

HOST_ID = "jenkins"

@pytest.fixture(scope="module", autouse=True)
def provision(Docker):
    Docker.provision_as(HOST_ID)

def test_service_running(Docker):
    Service = Docker.get_module("Service")
    assert Service("jenkins").is_running

def test_service_listening_on_port_8080(Docker, Slow):
    Socket = Docker.get_module("Socket")
    # Jenkins takes a while to boot, so retry until the port is open.
    Slow(lambda: Socket("tcp://:::8080").is_listening)

The heavy lifting is done in conftest.py, a test setup file that Pytest loads automatically.
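The real conftest.py lives in the repository; a minimal sketch of the idea could look like the following. The image names, the timeout values, and the salt-call invocation are assumptions for illustration, not the repo’s actual code:

```python
import subprocess
import time

import pytest


def wait_until(check, timeout=90, interval=5):
    """Retry check() until it returns truthy or the timeout expires."""
    deadline = time.time() + timeout
    while True:
        if check():
            return True
        if time.time() > deadline:
            raise AssertionError("condition not met within %ss" % timeout)
        time.sleep(interval)


class DockerHost(object):
    """A running container that can be provisioned and inspected."""

    def __init__(self, image):
        # Keep the container alive so salt-call and checks can exec into it.
        self.container_id = subprocess.check_output(
            ["docker", "run", "-d", image, "sleep", "infinity"]
        ).strip().decode()

    def provision_as(self, host_id):
        # Masterless salt-call: apply the states matched by this minion id.
        subprocess.check_call(
            ["docker", "exec", self.container_id,
             "salt-call", "--local", "--id", host_id, "state.highstate"]
        )

    def get_module(self, name):
        # Lazy import: Testinfra is only needed on the test host itself.
        import testinfra
        host = testinfra.get_host("docker://" + self.container_id)
        return getattr(host, name.lower())


@pytest.fixture(scope="module",
                params=["centos7-salt-local", "ubuntu15-salt-local"])
def Docker(request):
    host = DockerHost(request.param)
    yield host
    subprocess.check_call(["docker", "rm", "-f", host.container_id])


@pytest.fixture
def Slow():
    return wait_until
```

Because the Docker fixture is parametrized over the images, each test runs once per image, which is why the test names in the output below carry a suffix like [centos7-salt-local].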

Since I’m all into Salt anyway, the VM can be provisioned by Salt as well. Let’s add our test config to the top.sls:

  'salt-dev':
      - docker
      - testinfra

In this setup, I want to make sure I have the required tooling (Docker and Testinfra) and I want to have a few Docker images ready. Those images mimic the configuration found on the real VMs.
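Sketched as Salt states, that could look something like the hypothetical docker.sls below. The image names and build paths are assumptions for illustration, and the state module name depends on your Salt version (newer releases use docker_image, older ones dockerng):

```yaml
docker:
  pkg.installed: []

docker-service:
  service.running:
    - name: docker
    - require:
      - pkg: docker

# Pre-build the images the tests run against (names are illustrative).
centos7-salt-local:
  docker_image.present:
    - build: /srv/images/centos7-salt-local

ubuntu15-salt-local:
  docker_image.present:
    - build: /srv/images/ubuntu15-salt-local
```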

Running the tests becomes as simple as:

[vagrant@salt-dev test]$ testinfra -v
============================= test session starts ==============================
platform linux2 -- Python 2.7.5, pytest-2.8.7, py-1.4.31, pluggy-0.3.1 -- /usr/bin/python
cachedir: .cache
rootdir: /srv/test, inifile:
plugins: testinfra-1.0.1
collected 4 items

test_jenkins.py::test_service_running[centos7-salt-local] PASSED
test_jenkins.py::test_service_listening_on_port_8080[centos7-salt-local] PASSED
test_jenkins.py::test_service_running[ubuntu15-salt-local] PASSED
test_jenkins.py::test_service_listening_on_port_8080[ubuntu15-salt-local] PASSED

========================== 4 passed in 228.02 seconds ==========================

Ain’t that nice? With little hassle I can check complete Salt rollouts and verify that they install correctly. This approach has already caught a few regression bugs.

Considerations

You may want to run the tests on your CI server. That’s a nice idea, and it will definitely catch regressions in the end. As you can see, though, you still have to wait a couple of minutes to find out whether all tests pass. Depending on your CI infrastructure, some tweaks may be required. (Open question: how to deal with pillar data then?)

You may want to decouple the scripts from the Docker containers and use them to check the production infrastructure as well. Testinfra can output in a format understood by Nagios, which is nice if you do service monitoring.

Source repository: Salt formula testing.