Sunday, 26 December 2010

Our ESXi hosts are running out of memory - oh no they're not!

Scenario: 

An ESXi 4.1 host with 60GB memory is running 17 x Windows 7 guests with 2GB memory each, 3 x Windows 2008 guests with 4GB memory each, and 1 x Windows 2008 guest with 8GB memory. The host reports a “Host memory usage” warning. How come VMware's superb memory management systems have not kicked in?



Answer:

Adding up the memory given to the guests in the scenario above comes to 54GB, and the host was recording nearly 56GB of memory usage (57344MB).
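As a quick sanity check on those figures, here is a minimal Python sketch that adds up the configured guest memory and compares it with the reported host usage. The dictionary layout and variable names are purely illustrative; only the counts and sizes come from the scenario.

```python
# Guest inventory from the scenario: label -> (number of guests, memory per guest in GB)
guests = {
    "Windows 7": (17, 2),
    "Windows 2008 (4GB)": (3, 4),
    "Windows 2008 (8GB)": (1, 8),
}

# Total memory configured across all guests
configured_gb = sum(count * mem_gb for count, mem_gb in guests.values())

host_total_gb = 60      # physical memory in the host
host_used_mb = 57344    # usage figure reported by the host
host_used_gb = host_used_mb / 1024

print(f"Configured guest memory: {configured_gb} GB")
print(f"Host memory usage: {host_used_gb:.0f} GB of {host_total_gb} GB "
      f"({host_used_gb / host_total_gb:.0%})")
```

With usage sitting above the total configured for the guests, it looks as if no memory is being shared at all.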


Any VMware veteran would look at this situation and think “How odd, how come TPS (Transparent Page Sharing) has not kicked in?”

The image below shows host memory claimed by the guests and configured memory size.


To see that TPS is not working, the quickest way is to click on a guest in the vSphere Client and go to the Resource Allocation tab. In the image below, notice there is no shared memory (this was for one of the Windows 7 guests).


To see shared memory in action, one would expect to see something more like the example in the image below, where 1.64GB is shared, or greater than 50% (this was for a Windows 7 guest configured with 3GB memory).


What is going on?

The answer is found in Matt Liebowitz's excellent article - VMware KB Clarifies Page Sharing on Nehalem Processors:


And specifically the paragraph:

“VMware has published a KB article that gives more information on TPS with Nehalem processors and why it appears TPS isn’t working (this affects modern AMD processors also). The short version is that TPS uses small pages (4K), and Nehalem processors utilize large pages (2MB). The ESX/ESXi host keeps track of what pages could be shared, and once memory is over-committed it breaks the large pages into small pages and begins sharing memory.”
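To put the two page sizes from that paragraph side by side, here is a small Python sketch of the arithmetic. The 4K and 2MB sizes come from the quote and the 2GB guest from the scenario; the variable names are just for illustration.

```python
SMALL_PAGE_BYTES = 4 * 1024          # 4K small page, the size TPS works with
LARGE_PAGE_BYTES = 2 * 1024 * 1024   # 2MB large page

# How many small pages one large page breaks into
small_per_large = LARGE_PAGE_BYTES // SMALL_PAGE_BYTES

# One 2GB Windows 7 guest from the scenario
guest_bytes = 2 * 1024**3
large_pages = guest_bytes // LARGE_PAGE_BYTES
small_pages = guest_bytes // SMALL_PAGE_BYTES

print(f"Small pages per large page: {small_per_large}")
print(f"A 2GB guest spans {large_pages} large pages or {small_pages} small pages")
```

So each large page the host breaks up turns into 512 small pages that TPS can then consider for sharing.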

And yes, the host in question had a Nehalem processor (easy to check on Wikipedia - http://en.wikipedia.org/wiki/Nehalem_(microarchitecture) )

There is a solution if this behaviour is proving to be unsettling (I will not say fix, as technically everything is fine).

You can force the use of small pages on all guests all the time by changing the value of the advanced option Mem.AllocGuestLargePage to 0 on your hosts, and then either vMotion the VMs off and back on to the host or cold boot them.

In the scenario above, with TPS in effect, very roughly 50% of the memory consumed by guests would be reclaimed, and the 60GB host would be showing only around 30GB memory usage.
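That estimate can be sketched in a few lines of Python. The 50% sharing figure is only the very rough guess used in this post, not a measured value, and applying it to the 54GB of configured guest memory is an assumption made to keep the sum simple.

```python
configured_gb = 54     # total memory configured for the guests in the scenario
host_used_gb = 56      # host usage before any sharing (57344MB)
share_fraction = 0.5   # very rough assumption: about half of guest memory is shareable

# Memory TPS would hand back under that assumption
reclaimed_gb = configured_gb * share_fraction

# What the host would then report, roughly
estimated_usage_gb = host_used_gb - reclaimed_gb

print(f"Estimated memory reclaimed: {reclaimed_gb:.0f} GB")
print(f"Estimated host usage with TPS: {estimated_usage_gb:.0f} GB")
```

That lands at about 29GB, in line with the "around 30GB" figure; real savings depend on how similar the guests actually are.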


THE END


A bit of further reading semi-related to the above from Matt Liebowitz's blog - Does ASLR really hurt memory sharing in VMware vSphere?


And the reg key to disable ASLR (Address Space Layout Randomization):

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"MoveImages"=dword:00000000


1 comment:

  1. That's a fast way to check on the shared memory. Thanks for explaining this.