Deep dive into Windows Server Containers and Docker – Part 3 – Underlying implementation of Hyper-V Containers
Last April I visited DockerCon 2017. Among the many great announcements, such as LinuxKit and the Moby Project, the most appealing one for me was John Gossman's announcement that Microsoft and Docker have made it possible to run Linux containers natively on Windows hosts by using the same Hyper-V isolation layer that Hyper-V containers use. So it is time for a blog post about Hyper-V containers that explains how this virtualization layer works.
In the previous blog post of this series, we learned about the difference between containers and VMs, and between Windows and Linux containers. Whereas Linux hosts support only one type of container, Linux containers, Windows hosts support multiple container types: Windows Server Containers, Hyper-V containers and even Linux containers. Let’s dive into the Hyper-V container technology.
Normal Windows Server Containers
Even though normal Windows Server Containers have great benefits, such as near-instant startup times and a small footprint compared to VMs, there are some risks in using them, especially in scenarios where the container hosts are shared by multiple tenants or managed by a third party.
Because Windows Server Containers share the kernel of the container host, a containerized application could in theory gain access to the container host by exploiting implementation flaws in the kernel. While this may sound like a purely theoretical possibility, it is a real threat when you consider that Windows kernel exploits are reported multiple times a year. The risk is that an attacker who manages to compromise the kernel could impact the container host or even other containers running on that host.
Not only implementation flaws but also the design of Windows Server Containers can put a multi-tenant application landscape at risk. In the previous post of this series I mentioned that containers provide file system, environment variable, registry and process isolation between applications. That level of isolation does not fully apply between Windows Server Containers and the container host. The environment variable and registry isolation work in much the same way (you cannot see container-specific environment variables and registry values from the container host), but the file system and process isolation behave differently.
Difference in inter-container and host-container file system isolation
File system changes within a running container are stored in a .vhdx file on the container host. From within a container you cannot see any of the file system content of another container, but from the container host you can locate those .vhdx files in the same way as with VMs.
To find the location of this .vhdx file for a given container, use the docker inspect [container_name] command and look at the GraphDriver – Data attribute, as shown in Figure 1. Notice the difference in size between a container and a VM hard disk image file, as shown in Figures 2 and 3.
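The lookup described above can also be scripted. The following is a minimal Python sketch, not an official tool: it reads the JSON that docker inspect prints and returns the GraphDriver Data entry. The "dir" key and the sample path are assumptions based on what the windowsfilter driver typically reports, and the SAMPLE value is made up for illustration.

```python
import json
import subprocess


def layer_dir_from_inspect(inspect_json: str) -> str:
    """Return the GraphDriver Data directory from `docker inspect` JSON output."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    data = containers[0]["GraphDriver"]["Data"]
    # Assumption: on Windows hosts the Data attribute carries a "dir" key.
    return data["dir"]


def inspect_container(name: str) -> str:
    """Run `docker inspect <name>` and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical, heavily trimmed inspect output, for illustration only.
SAMPLE = json.dumps([
    {
        "Name": "/demo",
        "GraphDriver": {
            "Name": "windowsfilter",
            "Data": {"dir": "C:\\ProgramData\\docker\\windowsfilter\\abc123"},
        },
    }
])

if __name__ == "__main__":
    # On a real container host: layer_dir_from_inspect(inspect_container("demo"))
    print(layer_dir_from_inspect(SAMPLE))
```

On a real host you would pass the output of inspect_container to layer_dir_from_inspect and then browse that directory to find the .vhdx file.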
Difference in inter-container and host-container process isolation
At the process level the difference is bigger. While the processes of one Windows Server Container are fully isolated from those of another, there is no process isolation between the container host and the Windows Server Containers running on it. This means that anyone with access to the container host can see and kill the processes of Windows Server Containers running on that host, as shown in Figure 4. Although Linux containers behave the same way, this might not be secure enough for all situations.
Because of the risk of implementation flaws in the kernel and the limited process isolation between host and containers, Microsoft introduced a new container type called Hyper-V containers. Hyper-V containers offer a secure solution for situations where container hosts are shared across different tenants, and they are 100% compatible with normal Windows Server Containers.
The isolation of Hyper-V containers is not based on a kernel-level construct. Instead, it uses the Hyper-V hypervisor to offer real kernel-level isolation: each container gets its own kernel rather than sharing the kernel of the host. Under the hood a Hyper-V container is actually a Hyper-V utility VM that hosts a single Windows Server Container, which creates an extra security boundary. Each Hyper-V utility VM contains its own clone of the Windows Server 2016 kernel and hosts a separate Guest Compute Service for communication with the Compute Services abstraction of the host user mode.
The Hyper-V utility VMs that are used as part of Hyper-V containers are optimized to serve as container hosts and contain only the functionality that is really needed to run containers. To achieve near-instant startup times for Hyper-V containers, Microsoft built a specific implementation of an operating system that has a very short startup time and a small footprint. You might expect to find the utility VMs of Hyper-V containers with the normal Get-VM PowerShell cmdlet, but that is not the case. Although we call it a utility VM, it differs from a normal VM in some remarkable ways. For example, it is a read-only version of a VM, so writes are not persisted. It also does not need its own hostname, because the hosted containers have their own hostname and identity.
To make the startup of Hyper-V containers even faster, Microsoft implemented a way of cloning the utility VMs. The idea behind this cloning technology is that the first part of the boot process is the same for every Hyper-V container that has to be started. Things only start to differ once you actually start the Windows Server Container inside the utility VM. So the cloning technology forks the VM state just before a specific Windows Server Container is put into the utility VM. When you start a new Hyper-V container on a fresh host, a VM is booted and the exact point in the boot process is identified where it is about to start running differentiated code (the Windows Server Container). Just before that point, a quick snapshot and copy of the VM state is made by freezing its memory state. The next time a Hyper-V container is started, it is started from that snapshot.
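To make the cloning idea a bit more tangible, here is a toy Python model of it. This is purely a conceptual sketch and says nothing about how Microsoft actually implemented it: the class names, the boot step names and the use of a deep copy as a stand-in for a frozen memory state are all assumptions made up for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UtilityVMState:
    """Toy stand-in for the memory state of a utility VM."""
    boot_steps: List[str] = field(default_factory=list)
    container: Optional[str] = None  # set only after the fork point


class ToyHost:
    """Hypothetical host that caches the VM state reached just before container start."""

    def __init__(self) -> None:
        self.snapshot: Optional[UtilityVMState] = None
        self.full_boots = 0  # how often the shared boot work was actually done

    def _boot_to_fork_point(self) -> UtilityVMState:
        # The shared part of the boot process, identical for every container.
        self.full_boots += 1
        return UtilityVMState(
            boot_steps=["load kernel", "start guest compute service"]
        )

    def start_hyperv_container(self, image: str) -> UtilityVMState:
        if self.snapshot is None:
            # First start on a fresh host: boot once and freeze the state.
            self.snapshot = self._boot_to_fork_point()
        # Every start (including the first) continues from a copy of the snapshot.
        vm = copy.deepcopy(self.snapshot)
        vm.container = image  # the differentiated part begins here
        return vm


if __name__ == "__main__":
    host = ToyHost()
    host.start_hyperv_container("image-a")
    host.start_hyperv_container("image-b")
    print(host.full_boots)
```

However many containers are started on this toy host, the shared boot work runs only once. Every later start pays only for the copy and for the container-specific part.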
Creating a Hyper-V container is a runtime decision. At design time the difference between Windows Server Containers and Hyper-V containers does not exist. You define your container images in the same way as for normal Windows Server Containers, and in the case of Docker this means that you use exactly the same images to start both Windows Server Containers and Hyper-V containers. To run Hyper-V containers, make sure that you have enabled the Windows Hyper-V and Containers features on your container host.
To create a new Hyper-V container based on an existing Docker image, add the --isolation=hyperv switch to the docker run command, as shown below.
docker run -it --isolation=hyperv microsoft/nanoserver cmd
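Because the isolation mode is only a runtime switch, it is easy to script. The following Python sketch builds the same docker run command for either isolation mode. The function names are my own and not part of any Docker SDK. The only Docker detail it relies on is the --isolation flag with the values process and hyperv.

```python
import subprocess
from typing import List, Sequence

# Isolation modes this sketch accepts for Windows containers.
VALID_ISOLATION = ("process", "hyperv")


def docker_run_args(image: str, command: Sequence[str],
                    isolation: str = "process") -> List[str]:
    """Build the argument list for an interactive `docker run` call."""
    if isolation not in VALID_ISOLATION:
        raise ValueError(f"unsupported isolation mode: {isolation!r}")
    return ["docker", "run", "-it", f"--isolation={isolation}", image, *command]


def run_container(image: str, command: Sequence[str],
                  isolation: str = "process") -> int:
    """Start the container in the foreground and return docker's exit code."""
    return subprocess.run(docker_run_args(image, command, isolation)).returncode


if __name__ == "__main__":
    # Same image, two isolation modes; only the switch differs.
    print(" ".join(docker_run_args("microsoft/nanoserver", ["cmd"], "process")))
    print(" ".join(docker_run_args("microsoft/nanoserver", ["cmd"], "hyperv")))
```

Note that the image name and the command stay the same in both cases, which is exactly the point: the choice between a Windows Server Container and a Hyper-V container is made when the container is started, not when the image is built.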
Hyper-V Containers vs VMs
We’ve discussed that Hyper-V containers are designed for scenarios where you need an extra security boundary around your containerized workloads. But why not use a separate Hyper-V VM instead of a Hyper-V container?
Although Hyper-V VMs might be an alternative, using VMs to create an extra level of isolation has a negative impact on the flexibility, cost and scalability of your workload in production. The great benefit of using Hyper-V containers instead of VMs is that containers are designed for today’s requirements: they are fast to deploy, fast to destroy and have a small footprint. Hyper-V containers offer the benefits of containers, such as speed, scalability and efficiency, while also offering the same kernel-level isolation as Hyper-V VMs.