This week I'm beginning a new blog series around OpenShift, a product that I have a lot of experience with and a great fondness for. I'm going to start by talking about containers, which are at the core of what OpenShift does.
What are containers? Why would I want to use them? These are the two main questions I get when I tell people that I'm a container expert. There are a number of good reasons to use them, including security, small footprint, and rapid development. I believe, however, that their biggest benefit is the advantage they have over the traditional deployment of software on virtual machines.
Most people in the IT industry are familiar with virtual machine tools like VirtualBox or VMware. These tools enable us to run applications that are written for different operating systems on our computers. Over time, hardware and software developments made these virtual machines run faster and require fewer resources. You can easily run many different applications on the same server, even if they were built for different operating systems.
Hardware and software platforms appeared that allowed companies to spin up a lot of virtual machines quickly. This allowed teams to start deploying large numbers of applications without having to buy individual servers for each one. When load increased, more virtual machines could be spun up to add capacity.
There is a problem with this approach, however. Consider the typical stack for applications deployed on virtual machines.
Below the application and its library dependencies is a fat guest operating system layer that is often duplicated for each application, and that can be wasteful. Below that is a layer called a hypervisor, which translates the calls made in the guest OS into calls that are executed in the host OS. Thus operations like opening files or making network requests require duplicated effort between the guest and host OS (platforms like VMware install special drivers that optimize this, but it is still additional work above what the host OS does).
Containers help address this problem by eliminating the guest OS from the picture altogether. Consider the typical container stack.
The app and binaries sit on top in what is called an image (I'll go into more detail about images in the next blog post). Below that is a thin layer that is the container engine, Docker in this example. Container engines like Docker are not hypervisors, and they do not intercept system calls made by the app. Instead, these engines manage the applications and use features of the host operating system (particularly cgroups and namespaces) that provide resource isolation. In effect, the applications are boxed (i.e., containerized) so that they have limited visibility of the host they are running on. The filesystem the app can access is a small subset of the whole filesystem, and the app cannot interact with or see any other apps unless we want it to. System calls made in the app are executed in the host operating system kernel, just as they are for non-containerized processes. In a lot of ways, it can seem like the app is running on a virtual machine, but it actually isn't.
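To make the namespace idea a little more concrete, here is a minimal Python sketch, assuming a Linux host where /proc is mounted. It lists the namespaces the current process belongs to by reading the symbolic links under /proc/self/ns. The function name list_namespaces is my own and not part of Docker or any library; on systems without /proc it simply returns an empty result.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc.

    Assumes a Linux host; returns an empty dict if /proc is unavailable
    or the process does not exist.
    """
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, no such process, or /proc is not mounted.
        return namespaces
    for name in entries:
        try:
            # Each entry is a symlink whose target names the namespace type
            # and an inode number identifying the namespace instance.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Skip entries we are not permitted to read.
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20} {ident}")
```

If you run this once directly on a host and once inside a container on that same host, you should generally see different identifiers for namespaces such as pid, mnt, and net, even though both processes are served by the same kernel. That difference is the "boxing" described above.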
When companies need to allocate resources to run applications on virtual machines, they generally need to provision more resources than the average load requires, because those resources need to be available when load spikes. This leads to a large amount of computing resources sitting idle the majority of the time. It is possible to scale virtual machines to be larger when needed, but it often cannot be done quickly enough to handle unexpected traffic spikes. Containers scale differently. They are just small applications, so to meet demand you typically just create additional containers to handle the additional load. This can be done quickly and effectively, and when the load goes back down, you can easily stop and remove the containers that are no longer needed.
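The scaling arithmetic here is simple enough to sketch in a few lines of Python. This is only an illustration: the function name containers_needed, the per-container capacity, and the load figures are all made-up assumptions, not measurements or the behavior of any particular platform.

```python
import math


def containers_needed(requests_per_second, capacity_per_container, minimum=1):
    """Return how many containers are needed to serve the given load.

    capacity_per_container is a hypothetical figure for how many requests
    per second one container can handle; minimum keeps at least that many
    containers running even when there is no load.
    """
    if capacity_per_container <= 0:
        raise ValueError("capacity_per_container must be positive")
    return max(minimum, math.ceil(requests_per_second / capacity_per_container))


if __name__ == "__main__":
    # Hypothetical load samples (requests per second) over time.
    load_samples = [120, 450, 900, 300]
    for load in load_samples:
        print(load, "req/s ->", containers_needed(load, 100), "containers")
```

With these made-up numbers the container count follows the load up and back down again, which is the behavior that is hard to get from resizing virtual machines.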
You don't have to choose between virtual machines and containers, however. These two technologies often complement each other well. The physical servers that you put into datacenters can still run virtual machines, which lets you keep using the hardware and software tools for large-scale deployments that already exist. On top of the virtual machines, you add container engines like Docker and then deploy your applications in containers. You won't need to scale the virtual machines as often, because when you need more processing done, you will just scale up the number of containers.
Using a mixed virtual machine and container approach has the added benefit of separating the management of the real operating system from the management of the applications running on those servers. Operations staff who are in charge of applying vulnerability patches can do so with far less worry about whether those patches will affect the applications running in the containers. Since containers usually bundle their own library dependencies, the applications are insulated from dependency version changes on the host. You still have to address vulnerabilities in the libraries included in the image, but because of how container images are built, this turns out to be a lot easier as well. I'll talk about that next week when I dive into the topic of container images.
In this article I have talked at a high level about what containers are and why they have big advantages over virtual machines. Next week in part two of this series, I'll be talking about container images, which are the archives that you use to deploy an application onto a container platform like Docker.