Deep dive into Windows Server Containers and Docker – Part 2 – Underlying implementation of Windows Server Containers
With the introduction of Windows Server 2016 Technical Preview 3 in August 2015, Microsoft enabled container technology on the Windows platform. While Linux has had container technology since August 2008, such functionality was not supported on Microsoft operating systems before. Thanks to the success of Docker on Linux, Microsoft decided almost 3 years ago to start working on a container implementation for Windows. Since September 2016 we have been able to work with a publicly released version of this new container technology in Windows Server 2016 and Windows 10. But what is the difference between containers and VMs? And how are Windows containers implemented internally within the Windows architecture? In this blog post we’ll dive into the underlying implementation of containers on Windows.
Containers vs VMs
Many container introductions start with the phrase “Containers are lightweight VMs”. Although this may help people get a conceptual understanding of what containers are, it is important to note that this statement is 100% wrong and can be very confusing. Containers are different from VMs, which is why I always introduce containers as “something different than VMs” or even say that “containers are NOT VMs”. But what is this difference? And why is it so important?
What containers and VMs have in common
While containers are NOT VMs, they both share three important characteristics:
- Isolated environment: like VMs, containers isolate the file system, environment variables, registry and processes of applications from each other. This means that, as with VMs, each container creates an isolated environment for all the applications inside it. When moved, both containers and VMs package not only the internal applications but also the context around these applications.
- Moved between hosts: a huge advantage of working with VMs is that we can move VM snapshots between different host hypervisors without having to change any of their content. The same is true for containers. Where VMs can be “moved” across different host hypervisors, containers can be “moved” across different container hosts. When either artifact is “moved” to another host, the content of the VM/container stays exactly the same as it was on the previous host.
- Resource governance: another shared characteristic is that the resources available to both a container and a VM (CPU, RAM, network bandwidth) can be limited to a given value. In both cases this resource governance can only be set from the container host or hypervisor. Resource governance ensures that a container gets limited resources, to minimize the risk that it will impact the performance of other containers running on the same host. A container can, for example, be constrained so that it cannot use more than 10% of the CPU.
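As a rough illustration of resource governance being set from the container host, the following Python sketch builds a `docker run` command line that caps a container at 10% CPU. It relies on the `--cpu-percent` (Windows only) and `--memory` options of `docker run`; the helper name `build_run_command` and the image name `microsoft/windowsservercore` are assumptions made for this example. The command is only constructed, not executed.

```python
from typing import List, Optional


def build_run_command(image: str,
                      cpu_percent: Optional[int] = None,
                      memory: Optional[str] = None,
                      command: Optional[List[str]] = None) -> List[str]:
    """Build (but do not execute) a `docker run` command with resource limits.

    build_run_command is a hypothetical helper for illustration only.
    """
    args = ["docker", "run", "--rm"]
    if cpu_percent is not None:
        if not 1 <= cpu_percent <= 100:
            raise ValueError("cpu_percent must be between 1 and 100")
        # --cpu-percent is the Windows-only CPU cap of `docker run`.
        args += ["--cpu-percent", str(cpu_percent)]
    if memory is not None:
        # --memory limits the RAM available to the container, e.g. "512m".
        args += ["--memory", memory]
    args.append(image)
    args += command or []
    return args


if __name__ == "__main__":
    # The image name is an assumption; substitute the Windows base image you use.
    cmd = build_run_command("microsoft/windowsservercore",
                            cpu_percent=10, memory="512m",
                            command=["cmd", "/c", "echo", "hello"])
    print(" ".join(cmd))
```

Note that the limits are passed when the container is started from the host; nothing inside the container sets them.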
Differences between containers and VMs
While containers and VMs have some characteristics in common, there are also some important differences between containers and VMs.
- Level of virtualization: containers are a new level of virtualization. The history of virtualization started with concepts like virtual memory and virtual machines, and containers are the next step in this trend. Where VMs are a result of hardware virtualization, containers are a result of OS virtualization. This means that where hardware virtualization lets the VM believe that its hardware resources are dedicated to that instance, OS virtualization lets the container believe that the OS instance is dedicated to that container. This difference in virtualization is important to note. Containers do not, for example, have their own kernel mode. For this reason containers can’t be seen as VMs and will also not be recognized as VMs within the operating system (you can try it yourself with the PowerShell Get-VM cmdlet). A good analogy to explain this difference is that of houses (VMs) and apartment buildings (containers). Houses (the VMs) are fully self-contained and offer protection from unwanted guests. They also each possess their own infrastructure: plumbing, heating, electrical, etc. Apartments (the containers) also offer protection from unwanted guests, but they are built around shared infrastructure. The apartment building (Docker Host) shares plumbing, heating, electrical, etc. While they may share some characteristics, they are different entities.
- Dealing with the OS: another important difference between containers and VMs is the way they deal with the kernel mode. Where VMs have a full OS (and a dedicated kernel mode) available, containers share the “OS” (actually the kernel mode) with other containers and the container host. As a result, containers have to align with the OS of the container host, while VMs can pick any OS (version and type) they like. Where a VM can run a Linux OS on top of a Windows hypervisor, with container technology it is not possible to run a Linux container on a Windows container host and vice versa.
- Growth model: containers share the underlying resources of the container host, and you build an image that contains exactly what you need to run your application. You start with the basics and add what you need. VMs are built in the opposite direction: most of the time we start with a full operating system and, depending on the application, strip out the things we don’t want.
Windows Server Containers
Now that we know about the differences between VMs and containers, let’s dive a little deeper into the underlying architecture of Windows Server Containers. To explain how containers are implemented internally within the Windows operating system, you have to know about two important concepts: User Mode and Kernel Mode. These are two different modes between which a processor continuously switches, depending on what type of code it has to run.
The Kernel Mode of an operating system exists for drivers that need unrestricted access to the underlying hardware. Normal programs (User Mode) have to use the operating system APIs to access hardware or memory. Code running in Kernel Mode has direct access to those resources and shares the same memory locations/virtual address space as the operating system and other kernel drivers. Running code in Kernel Mode is therefore very risky, because data that belongs to the operating system or another driver could be compromised if your kernel-mode code accidentally writes data to a wrong virtual address. If a kernel-mode driver crashes, the entire operating system crashes. Code should therefore run in kernel space as little as possible. This is exactly the reason why the User Mode was introduced.
In the User Mode, the code always runs in a separate process (user space), which has its own dedicated set of memory locations (private virtual address space). Because each application’s virtual address space is private, one application cannot alter data that belongs to another application. Each application runs in isolation, and if an application crashes, the crash is limited to that one application. In addition to being private, the virtual address space of a user-mode application is limited. A processor running in user mode cannot access virtual addresses that are reserved for the operating system. Limiting the virtual address space of a user-mode application prevents the application from altering, and possibly damaging, critical operating system data.
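The isolation between user-mode processes can be observed with a small Python sketch. It starts a second Python interpreter as a child process that sets its own `counter` variable and then exits with an error code; the parent's data is untouched and the parent keeps running. The helper name `run_in_child` and the variable `counter` are made up for this illustration, and it shows ordinary process isolation, not container isolation.

```python
import subprocess
import sys


def run_in_child(code: str) -> int:
    """Run a Python snippet in a separate process and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode


counter = 1  # lives in the parent's private virtual address space

# The child has its own address space: its `counter` is a different variable,
# and its failure (non-zero exit code) does not take the parent down with it.
CHILD_CODE = "counter = 99\nraise SystemExit(3)"

if __name__ == "__main__":
    exit_code = run_in_child(CHILD_CODE)
    print("child exit code:", exit_code)
    print("parent counter:", counter)
```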
Technical implementation of Windows containers
But what do these processor modes have to do with containers? Each container is just a processor “User Mode” with a couple of additional features such as namespace isolation, resource governance and the concept of a union file system. This means that Microsoft had to adapt the Windows operating system to allow it to support multiple User Modes. That was very tough, considering the high level of integration between both modes in earlier Windows versions.
Before the launch of Windows Server 2016, each Windows operating system we used consisted of a single “User Mode” and “Kernel Mode”. Since the introduction of Windows Server 2016 it is possible to have multiple User Modes running within the same operating system. The following diagram gives a global idea of this new multi-User Mode architecture.
Looking at the User Modes of Windows Server 2016, we can identify two different types: the Host User Mode and the Container User Modes (green blocks in the diagram). The Host User Mode is identical to the normal User Mode that we are familiar with from earlier versions of Windows. The goal of this User Mode is to host the core Windows services and processes like the Session Manager, Event Manager and networking. Moreover, in the case of the Windows Server Core implementation, this User Mode facilitates user interaction with Windows Server 2016 via a user interface.
A new feature of Windows Server 2016 is that, once you enable the Containers feature, this Host User Mode contains some additional container management technologies, which ensure that containers work on Windows. The core of this container technology is the Compute Services (orange block) abstraction, which exposes the low-level container capabilities provided by the kernel via a public API. In fact, these services only contain functionality to launch Windows containers, keep track of them while they are running and manage the functionality required for restarting. The rest of the container management functionality, like keeping track of all containers and storing container images, volumes etc., is implemented in the Docker Engine. This engine communicates directly with the Compute Services container APIs and makes use of the “Go language binding” offered by Microsoft to do so.
Difference between Windows and Linux containers
Although the same Docker client tools (Docker Compose, Docker Swarm) can manage both native Windows and Linux containers, there are some important differences between the implementation of containers on Windows and Linux.
Where Linux exposes its kernel-level functionality via syscalls, Microsoft decided to expose its kernel-mode functionality via DLLs (this is also the reason why Microsoft has not really documented its syscalls). Although this way of abstracting syscalls has some benefits, it has led to a highly integrated Windows operating system with a lot of inter-dependencies between different Win32 DLLs, and with many DLLs expecting that some (in)directly referenced system services are running. As a result, it is not very realistic for a Windows container to run only the application processes that we’ve started from our Dockerfile. Within Windows containers you’ll therefore see a bunch of extra system processes running, while Linux containers only have to run the specified application processes. To ensure that those necessary system processes and services are running within the Windows container, a so-called smss process is launched within each container. The goal of this smss process is to start the necessary system processes and services.
Not only in the way kernel-level functionality is exposed, but also at an architectural level there is an important difference in how both operating systems deliver container functionality to the Docker client tools. In the current Windows Server 2016 container implementation, a so-called Compute Services abstraction layer hides the low-level container capabilities from the outside. The reason for this is that Microsoft can change the low-level container APIs without having to change the public APIs that are called by the Docker Engine or other container client tools. You can program your own container management tools against these Compute Services APIs by making use of the C# or Go language bindings, which are available at https://github.com/Microsoft/dotnet-computevirtualization and https://github.com/Microsoft/hcsshim.
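The value of such an abstraction layer can be illustrated with a toy Python model. None of the names below (`ComputeService`, `KernelBackendV1`, `KernelBackendV2`, `create_container`, `engine_start`) correspond to real Microsoft or Docker APIs; they are hypothetical stand-ins that only show how a stable public interface lets the low-level implementation change without affecting callers such as a container engine.

```python
from typing import Dict, Protocol


class KernelBackend(Protocol):
    """Hypothetical low-level interface; free to change between OS releases."""

    def spawn(self, container_id: str) -> str: ...


class KernelBackendV1:
    def spawn(self, container_id: str) -> str:
        return f"v1-handle-{container_id}"


class KernelBackendV2:
    # A reworked low-level implementation with a different handle format.
    def spawn(self, container_id: str) -> str:
        return f"v2:{container_id}:handle"


class ComputeService:
    """Hypothetical stable public API that management tools program against."""

    def __init__(self, backend: KernelBackend) -> None:
        self._backend = backend
        self._running: Dict[str, str] = {}

    def create_container(self, container_id: str) -> None:
        # Callers never see the backend-specific handle.
        self._running[container_id] = self._backend.spawn(container_id)

    def is_running(self, container_id: str) -> bool:
        return container_id in self._running


def engine_start(service: ComputeService, container_id: str) -> bool:
    """Stand-in for a container engine: it only uses the public API."""
    service.create_container(container_id)
    return service.is_running(container_id)


if __name__ == "__main__":
    for backend in (KernelBackendV1(), KernelBackendV2()):
        print(type(backend).__name__, engine_start(ComputeService(backend), "web"))
```

In this sketch `engine_start` is identical for both backends, which is the property the abstraction layer is meant to provide.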
Union File System?
A third important difference between the Linux and Windows container implementations is the way both operating systems deal with the "copy-on-write" technology of Docker. Because a lot of Windows applications expect NTFS semantics, it was hard for the Microsoft team to create a full Union File System implementation on Windows. Features like USN journals and transactions, for example, would require a whole new implementation. The current Windows Server 2016 container implementation therefore does not contain a true Union File System. Instead, it creates a new virtual NTFS disk per container. To keep these virtual disks small, the file system entries within a virtual NTFS disk are essentially symlinks to real files on the file system of the container host. Once you change files within a container, those files are persisted to a virtual block device, and at the moment you want to commit this layer of changes, the changes are pulled out of the virtual block device and persisted in the right file system location on the container host.
A last difference between the Linux and Windows container implementations is the concept of Hyper-V Containers. I’ll blog about this interesting type of container in the next blog post of this series. Stay tuned…
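The copy-on-write behaviour described above can be sketched with a small Python toy model. It is not the NTFS-based implementation: the class `ContainerDisk`, its methods, and the use of dictionaries in place of a virtual disk and the host file system are all assumptions for illustration. Reads fall through to shared, read-only base files; writes land in a per-container scratch area; a commit extracts those changes as a layer.

```python
from types import MappingProxyType
from typing import Dict, Mapping


class ContainerDisk:
    """Toy copy-on-write view over shared, read-only base files."""

    def __init__(self, base: Mapping[str, str]) -> None:
        self._base = base                    # shared with the host, never modified
        self._scratch: Dict[str, str] = {}   # per-container changes

    def read(self, path: str) -> str:
        # Prefer the container's own copy; otherwise fall through to the base.
        if path in self._scratch:
            return self._scratch[path]
        return self._base[path]

    def write(self, path: str, data: str) -> None:
        # Writes never touch the base files.
        self._scratch[path] = data

    def commit(self) -> Dict[str, str]:
        """Return a copy of the accumulated changes as a layer."""
        return dict(self._scratch)


if __name__ == "__main__":
    # A read-only mapping stands in for files on the container host.
    base = MappingProxyType({"C:/Windows/win.ini": "original"})
    a, b = ContainerDisk(base), ContainerDisk(base)
    a.write("C:/Windows/win.ini", "changed in a")
    print(a.read("C:/Windows/win.ini"))
    print(b.read("C:/Windows/win.ini"))
    print(a.commit())
```

Two containers created from the same base see each other's files only through the shared base, so a change made in one is invisible to the other until it is committed as a layer.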