Recently I came across an informative article by Matt Williams about Java in Docker and its memory constraints. The author brings up an interesting topic: a hidden issue with memory limits that people can face when working with containers.
The high number of shares and likes shows that this topic resonates with Java developers.
So I would like to analyze the issue in more depth and explore possible solutions.
Problem
Matt describes his overnight “journey” with the default JVM heap behaviour in a Docker container. He figured out that RAM limits are not correctly displayed inside a container. As a result, any Java or other application sees the total amount of RAM allocated to the whole host machine, and the JVM cannot detect how much memory was actually provided to the parent container it’s running in. This leads to OutOfMemoryError caused by incorrect JVM heap sizing inside the container.
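The problem is easy to reproduce on a plain Docker host. Below is a minimal sketch (the image tag and the 512 MB limit are just illustrative choices): the container is capped well below the host's RAM, yet a JVM without container awareness still sizes its default heap against the host total.

    # Start a JVM in a container capped at 512 MB and ask it which
    # maximum heap size it picked by default
    docker run --rm -m 512m openjdk:8 \
        java -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize

    # On JVMs without container awareness (the case described in the article),
    # the reported MaxHeapSize is ~1/4 of the *host* RAM, far above the
    # 512 MB cgroup limit, so the heap can grow until the kernel OOM killer
    # terminates the container.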
Fabio Kung from Heroku has described the main reasons for this problem in detail in his recent article “Memory inside Linux containers. Or why don’t free and top work in a Linux container?”:
Most of the Linux tools providing system resource metrics were created before cgroups even existed (e.g.: free and top, both from procps). They usually read memory metrics from the proc filesystem: /proc/meminfo, /proc/vmstat, /proc/PID/smaps and others.
Unfortunately /proc/meminfo, /proc/vmstat and friends are not containerized. Meaning that they are not cgroup-aware. They will always display memory numbers from the host system (physical or virtual machine) as a whole, which is useless for modern Linux containers (Heroku, Docker, etc.). Processes inside a container can not rely on free, top and others to determine how much memory they have to work with; they are subject to limits imposed by their cgroups and can’t use all the memory available in the host system.
The author highlights the importance of making the real memory limits visible. It allows developers to optimize applications and troubleshoot problems inside containers: memory leaks, swap usage, performance degradation, etc. In addition, some use cases rely on vertical scaling to optimize resource usage inside containers by automatically changing the number of workers, processes or threads. Such vertical scaling usually depends on how much memory is available to a specific container, so the limits need to be visible inside it.
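Until that visibility exists, the only reliable place to read the real limit from inside a plain Docker container is the cgroup filesystem itself rather than /proc. A minimal sketch, assuming cgroup v1 mounted at its default path and a container started with a 512 MB limit:

    # free reads /proc/meminfo, which is not containerized,
    # so it reports the memory of the whole host
    free -m

    # The actual per-container limit is exposed by the memory cgroup
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # 536870912 for a 512 MB limit

    # Current memory usage accounted to this container
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes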
Solution
The Open Container Initiative is working on runC improvements to implement a userspace filesystem override of /proc files. And the LXC project provides the lxcfs filesystem, which allows containers to have virtualized cgroup filesystems and virtualized views of /proc files. So this issue is clearly in the focus of container maintainers, and I believe the mentioned enhancements can help to solve the problem at the lowest level.
We faced the same issue at Jelastic and have already solved it for our customers, so we would like to show how it works.
First of all, let’s go to the Jelastic wizard, choose a service provider for a test account and create a Java Docker container with a predefined memory limit - for example, 8 cloudlets, which equals 1 GB of RAM.
Jump to the Jelastic SSH gate (1), select the previously created test environment (2), and choose the container (3). Now we are inside and can check the available memory with the free tool (4).
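The check itself is just the standard command; the total should match the container limit rather than the host RAM (figures other than the limit will of course vary):

    # Inside the container: free reflects the container limit, not the host
    free -m
    # Mem: total ~1024 MB -> the 8-cloudlet (1 GB) limit defined above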
As we can see, the reported memory equals the 1 GB limit defined before. Let’s check the top tool as well.
Everything works properly. And to double-check, we repeat Matt’s test related to the Java heap heuristics issue described in his article.
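The test boils down to asking the JVM which maximum heap size its ergonomics picked by default (a sketch of the check; the exact Java binary inside the container may differ):

    # Print the ergonomically chosen maximum heap size
    java -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize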
As expected, we get MaxHeapSize = 268435456 (256 MiB), which equals 1/4 of the container’s RAM, in line with the default JVM heap sizing behaviour.
What is the secret of our solution? Of course, it is the right combination of “ingredients”. In our case, this is a mix of OpenVZ and Docker technologies that gives more control in terms of security and isolation, as well as lets us use desired features like live migration and hibernation of containers. Below you can see a high-level scheme of a Docker container in Jelastic.
In OpenVZ, every container has a virtualized view of the /proc pseudo-filesystem. In particular, /proc/meminfo inside a container is a "bespoke" version showing per-container information, not the host’s. So when tools such as top and free run inside a container, they show RAM and swap usage against the limits of this particular container.
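This is easy to verify from inside such a container; the figures simply mirror whatever limits the container was given:

    # /proc/meminfo is virtualized per container, so MemTotal and SwapTotal
    # reflect the container limits rather than the host hardware
    grep -E 'MemTotal|SwapTotal' /proc/meminfo

    # free and top read the same virtualized data, so they agree with it
    free -m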
It is worth noting that swap inside containers is not real swap, but a virtual one (thus the whole technology is called VSwap). The main idea is that once a container with VSwap enabled goes over its configured RAM limit, some of its memory goes into the so-called swap cache. No real swap-out occurs, meaning there is no unnecessary I/O unless there is a global (not per-container) RAM shortage. Also, a container that uses VSwap is penalized for overusing RAM by being slowed down a bit, so from inside the container it feels as if real swapping were happening. This technology provides control over per-container memory and swap usage.
Such an implementation makes it possible to run Java and other runtimes without having to adapt applications for Jelastic PaaS. But if you are not using Jelastic, a possible workaround, following Matt’s tips, is to always specify the JVM heap sizes explicitly and never depend on heuristics. For the rest of the languages, deeper research is required. Please let us know if you can share your expertise in this direction, and we will gladly extend the article.
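For the JVM, that workaround boils down to passing explicit heap bounds sized against the container limit rather than trusting the defaults, for example (the jar name is just a placeholder):

    # Pin the heap explicitly so the container limit is respected, leaving
    # headroom for non-heap memory (metaspace, thread stacks, native buffers)
    java -Xms256m -Xmx512m -jar app.jar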