Sunday, 26 December 2010

Our ESXi hosts are running out of memory - oh no they're not!

Scenario: 

ESXi 4.1 host with 60GB memory, 17 x Windows 7 guests with 2GB memory, 3 x Windows 2008 guests with 4GB memory, and 1 x Windows 2008 guest with 8GB memory. The host reports a “Host memory usage” warning. How come VMware's superb memory management systems have not kicked in?



Answer:

Adding up the memory given to the guests in the scenario above comes to 54GB, and the host itself was recording roughly 56GB of memory usage (57344MB).
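The sum is easy to check. A minimal sketch using only the figures from the scenario; the reading that the 2GB gap is hypervisor and per-VM overhead is an assumption, not something measured here:

```python
# Guest counts and configured memory (GB), taken from the scenario above.
guests = [
    ("Windows 7", 17, 2),
    ("Windows 2008", 3, 4),
    ("Windows 2008", 1, 8),
]

configured_gb = sum(count * mem_gb for _, count, mem_gb in guests)
host_usage_gb = 57344 / 1024  # reported host memory usage, MB -> GB

print(f"Configured guest memory: {configured_gb} GB")      # 54 GB
print(f"Reported host usage:     {host_usage_gb:.0f} GB")  # 56 GB
# Assumption: the remainder is overhead rather than guest memory.
print(f"Difference:              {host_usage_gb - configured_gb:.0f} GB")
```

In other words, practically every megabyte configured for the guests is backed by host memory, which is what you would see if nothing at all were being shared.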


Any VMware veteran would look at this situation and think “How odd, how come TPS (Transparent Page Sharing) has not kicked in?”

The image below shows host memory claimed by the guests and configured memory size.


The quickest way to see that TPS is not working is to click on a guest in the vSphere client and go to the Resource Allocation tab. In the image below, notice there is no shared memory (this was for one of the Windows 7 guests).


To see shared memory in action, one would expect to see something more like the example in the image below, where 1.64GB is shared, or greater than 50% (this was for a Windows 7 guest configured with 3GB memory).


What is going on?

The answer is found in Matt Liebowitz's excellent article - VMware KB Clarifies Page Sharing on Nehalem Processors:


And specifically the paragraph:

“VMware has published a KB article that gives more information on TPS with Nehalem processors and why it appears TPS isn’t working (this affects modern AMD processors also). The short version is that TPS uses small pages (4K), and Nehalem processors utilize large pages (2MB). The ESX/ESXi host keeps track of what pages could be shared, and once memory is over-committed it breaks the large pages into small pages and begins sharing memory.”

And yes, the host in question had a Nehalem processor (easy to check in Wikipedia - http://en.wikipedia.org/wiki/Nehalem_(microarchitecture) )
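To put the two page sizes from the quote in perspective, here is a small arithmetic sketch. The only inputs are the 4K and 2MB sizes quoted above and the 2GB Windows 7 guest from the scenario:

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

SMALL_PAGE = 4 * KB   # page size TPS shares at
LARGE_PAGE = 2 * MB   # page size used for guests on Nehalem-class CPUs

small_per_large = LARGE_PAGE // SMALL_PAGE
guest_mem = 2 * GB    # one of the Windows 7 guests

print(small_per_large)            # 512 small pages per large page
print(guest_mem // SMALL_PAGE)    # 524288 small pages in the guest
print(guest_mem // LARGE_PAGE)    # 1024 large pages in the guest
```

A large page only matches another if all 512 of its 4K pieces match at once, which goes some way to explaining why the host waits until it is over-committed and then breaks large pages down before sharing.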

There is a solution if this behaviour proves unsettling (I will not say fix, as technically everything is fine).

You can force the use of small pages on all guests all the time by changing the value of the advanced option Mem.AllocGuestLargePage to 0 on your hosts and then VMotion the VMs off and back on to the host, or cold boot them.

With TPS in effect in the scenario above, very roughly 50% of the memory consumed by guests would be reclaimed, and the 60GB host would show only around 30GB memory usage.
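As a back-of-envelope check of that figure, a sketch that treats the 50% sharing ratio as the rough estimate it is. The function name is made up for illustration:

```python
def estimated_usage_gb(consumed_gb, sharing_ratio):
    """Host memory still consumed after TPS reclaims a fraction of it."""
    return consumed_gb * (1 - sharing_ratio)

configured_gb = 54   # total memory configured for the guests in the scenario
print(estimated_usage_gb(configured_gb, 0.5))  # 27.0
```

Allowing for hypervisor and per-VM overhead on top of the 27GB, "around 30GB" is a fair round number.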


THE END


A bit of further reading semi-related to the above from Matt Liebowitz's blog - Does ASLR really hurt memory sharing in VMware vSphere?


And the reg key to disable ASLR (Address Space Layout Randomization):

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"MoveImages"=dword:00000000


Tuesday, 21 December 2010

Manually Provisioning 50 Windows 7 Virtual Desktops

Scenario:

A company has purchased Citrix XenDesktop 4 Standard Edition, and someone is given the task of manually provisioning 50 Windows 7 virtual desktops on a vSphere infrastructure (their license did not include the excellent Citrix Provisioning Services).

Of course this was not an optimal solution. It would have been very easy to provision 50 desktops in no time at all with Citrix Provisioning Services; alas, the $125+ per user saving was desired...


Solution:

If anyone is considering/tasked with doing this, the main thing you will be interested in knowing is how long it will take. The answer is that you could get it down to just over

2 minutes per desktop

(basically as long as it takes to type out a few characters, do a few clicks of the mouse, login, activate and reboot)

The procedure is very simple:

Part 1:

Create your gold image and turn it into a template

Part 2:

Create a customization specification for your Windows 7 machine

Part 3:

Repeat the procedure in Stage 1 below for each desktop you want to create. When the desktops have finished going through the sysprep process (sysprep is built into the Windows 7 O/S, and it is this that the vCenter customization uses) and are powered up at the login prompt, complete Stages 2 and 3.

Stage 1:

Right-click Gold-Image template and choose “Deploy Virtual Machine from this Template”
Give it a “Name:” , and choose an “Inventory Location”
Next >
Choose on which host or cluster you want to place the virtual machine
Next >
Choose which resource pool you want to run the virtual machine within
Next >
Choose a datastore in which to store the virtual machine files
Next >
Choose disk format
Next >
Choose “Customize using an existing customization specification”
Next >
Tick “Power on this virtual machine after creation”
Finish

Note: Each virtual desktop actually takes around 30 minutes to complete Stage 1, but that time is unattended: the machine boots and runs through sysprep using the vCenter customization specification. It will reboot a few times and automatically be renamed and domain joined (if the customization specification is correctly configured).
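For planning purposes, the two figures above (just over 2 minutes of hands-on work, around 30 minutes of unattended clone and sysprep time) can be turned into a rough budget. This is only a sketch. How many deployments your hosts and storage can run at once is an assumption you supply, and the batches are assumed to run in lockstep:

```python
import math

DESKTOPS = 50
HANDS_ON_MIN = 2      # "just over 2 minutes" of operator time per desktop
UNATTENDED_MIN = 30   # clone + sysprep time per desktop (from the note above)

def elapsed_unattended_min(desktops, concurrent):
    """Elapsed clone/sysprep time if `concurrent` deployments run at once."""
    return math.ceil(desktops / concurrent) * UNATTENDED_MIN

print(DESKTOPS * HANDS_ON_MIN)              # 100 minutes of hands-on work
print(elapsed_unattended_min(DESKTOPS, 1))  # 1500 minutes, one at a time
print(elapsed_unattended_min(DESKTOPS, 5))  # 300 minutes, five at a time (hypothetical)
```

The hands-on total is what the 2 minutes per desktop refers to. The unattended time only matters for deciding when to come back and do Stages 2 and 3.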

Stage 2:

Move the desktop's computer account into the correct OU for XenDesktops
Login to the desktop
Activate Windows
Reboot

Stage 3:

Add machines to the Citrix Desktop Delivery Controller and assign users

Friday, 10 December 2010

Nexenta Community 3.0.3 / 3.0.4 Web UI Stops Working!

Scenario:

Running five Nexenta (Community Version 3.0.3 and 3.0.4) storage boxes (installed on reclaimed HP DL380 G4s), and on all five the Web UI has stopped working.

Below follows a fix to get the Web UI working again, some additional commands, and two examples.


Fix:

Part 1/2

Use PuTTY or similar to SSH to the Nexenta box and log in as root

From the UNIX shell (#) run these commands:

1 root@nexentabox:/volumes# svcadm enable -rs apache2
2 root@nexentabox:/volumes# svcadm restart nmv
3 root@nexentabox:/volumes# svcadm restart nms
4 root@nexentabox:/volumes#

Note 1: Command from line 1 only needs to be run one time
Note 2: These commands are perfectly safe to run during the working day
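With five boxes to fix, the three commands can be wrapped in a small loop. The sketch below is untested against Nexenta and rests on a loud assumption: that `ssh root@host '<command>'` runs the command in the UNIX shell. On boxes where the root login lands in the NMC shell this may well not hold, in which case run the commands by hand as shown. The hostnames are the ones from the examples further down; the function names are made up for illustration.

```python
import subprocess

# Commands from the fix above; 'enable' only needs to run once per box.
FIX_COMMANDS = [
    "svcadm enable -rs apache2",
    "svcadm restart nmv",
    "svcadm restart nms",
]

def build_ssh_command(host, user="root"):
    """Return the argv for running the fix on one box over SSH."""
    return ["ssh", f"{user}@{host}", " && ".join(FIX_COMMANDS)]

def fix_web_ui(hosts, dry_run=True):
    """Print (dry run) or execute the fix on each host in turn."""
    for host in hosts:
        argv = build_ssh_command(host)
        if dry_run:
            print(" ".join(argv))
        else:
            # Assumption: the remote side executes this in the UNIX shell.
            subprocess.run(argv, check=True)

# Dry run only: prints what would be executed.
fix_web_ui(["flake", "ripple"], dry_run=True)
```

Leaving `dry_run=True` as the default means nothing touches the storage boxes until you have read the printed commands.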

If in the NMC shell ($), to get to the UNIX shell (#) run these commands:

nmc@nexentabox:/$ option expert_mode=1 -s
nmc@nexentabox:/$ !bash
You are about to enter the Unix ("raw") shell and execute low-level Unix command(s). Warning: using low-level Unix commands is not recommended! Execute? Yes
root@nexentabox:/volumes#


Part 2/2



A promising fix (courtesy of my colleague Alfredo) to stop the Web UI failing in the future (or at least reduce the rate of it happening) is to change the 'Seconds between Retrieves' time on the Status → General → General Status and Details pane to 100 or more (default is 5)


Note 3: The HP DL380 G4s used are not on the HCL for OpenSolaris - http://www.sun.com/bigadmin/hcl/data/os/ - which may be a factor. Apart from the problems with the Web UI, Nexenta runs better than the old Openfiler installs it has replaced, with a much greater feature set.


Some other commands used in Nexenta

UNIX shell (#)

Reboot # shutdown -y -i6 -g0
Reboot (older command that still works)  # sync; sync; init 6
Shutdown # shutdown -y -i5 -g0
Shutdown (older command that still works) # sync; sync; init 5

NMC shell ($)

$ setup appliance upgrade nms (to upgrade Web UI)
$ setup appliance upgrade (to upgrade base OS s/ware in Nexenta Community Edition)
$ setup appliance init (re-run through the network setup)


Example 1 where SSH enters into the NMC shell

login as: root
Using keyboard-interactive authentication.
Password:
Last login: Thu Dec 9 07:00:13 2010 from 172.23.123.234
*****************************************************************
* Management Console *
* Version 3.0.3-4 *
* *
* press TAB-TAB to list and complete available options *
* *
* type help for help *
* exit to exit local NMC, remote NMC, or group mode *
* q[uit] or Ctrl-C exit NMC dialogs *
* q[uit] or Ctrl-C exit NMC text viewer *
* *
* option -h help on NMC options *
* -h help on any command *
* ? brief summary *
* help keyword [-q] locate NMC commands *
* help -k [-q] same as above *
* setup usage combined 'setup' man pages *
* show usage combined 'show' man pages *
* *
* type help and press TAB-TAB *
* *
* Management GUI: https://10.11.12.13:2000/ *
* *
*****************************************************************
nmc@flake:/$ option expert_mode=1 -s

nmc@flake:/$ !bash
You are about to enter the Unix ("raw") shell and execute low-level Unix command(s). Warning: using low-level Unix commands is not recommended! Execute? Yes

root@flake:/volumes# svcadm enable -rs apache2
root@flake:/volumes# svcadm restart nmv
root@flake:/volumes# svcadm restart nms
root@flake:/volumes#


Example 2: where SSH connection appears to be unresponsive after logging in - wait a few minutes and the SYSTEM NOTICE “Failed to initialize NMC” pops up and the prompt enters the UNIX shell

login as: root
Using keyboard-interactive authentication.
Password:
Last login: Fri Dec 10 08:30:42 2010 from 172.23.117.160

* * *
SYSTEM NOTICE

Failed to initialize NMC:
no introspection data available for method 'get_props' in object '/Root/Appliance', and object is not cast to any interface

Suggested possible recovery actions:
- Reboot into a known working system checkpoint
- Run 'svcadm clear nms'; then try to re-login
Suggested troubleshooting actions:
- Run 'svcs -vx' and collect output for further analysis
- Run 'dmesg' and look for error messages
- View "/var/log/nms.log" for error messages
- View "/var/svc/log/application-nms:default.log" for error messages

Entering UNIX shell. Type 'exit' to go back to NMC login...
root@ripple:~# svcadm enable -rs apache2
root@ripple:~# svcadm restart nmv
root@ripple:~# svcadm restart nms
root@ripple:~#

Thursday, 9 December 2010

Replacing failed disk on Compaq MA8000

A bit of a blast from the past this!

The Compaq MA8000 went end of life around 2004, but there are still some out there, and a few CLI commands must be run before pulling a failed disk.


Procedure:

Part 1: Establish CLI access

Method A

Use the serial cable provided with the MA8000 (Serial Cable Part # 17-04074-01) to connect from the serial port of a system running the HyperTerminal application, to the HSG80 controller port. HyperTerminal settings:

Baud Rate = 9600
Data Bits = 8
Parity = NONE
Stop Bits = 1
Flow Control = Hardware

Method B

Use the StorageWorks Command Console installed on an NT system and open the CLI Window (you will need to know an authorization password)


Part 2: Disk removal

From the CLI:

Check that the failed disk is part of the failedset. The failedset contains disk drives that were removed from service either by the controller or by the user.

HSG80> SHOW FAILEDSET

Enter the DELETE FAILEDSET and DELETE DISKnnnnn commands before physically removing failed members from the storage shelf for testing, repair, or replacement.

HSG80> DELETE FAILEDSET DISKnnnnn
HSG80> DELETE DISKnnnnn

Then replace the failed disk.


Part 3: Add replacement disk

From the CLI:

HSG80> RUN CONFIG

Add the new disk drive to the spareset.
The spareset is a pool of drives available to the controller to replace failing members of storagesets.

HSG80> ADD SPARESET DISKnnnnn

If the raidset that the failed disk was part of is running in a reduced state, the controller automatically removes the new disk from the spareset and adds it to the raidset.
If the controller had a spareset when the disk failed, the controller will already have added a disk to the raidset and the state will be normal.


Note i: HSG80 is the Array Controller
Note ii: In DISKnnnnn the nnnnn corresponds to the SCSI channel and target ID of the disk (example: DISK50400 is in Channel 5 and Target ID 4)
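The naming rule in Note ii can be sketched as a small parser. The split into one digit for the channel and two for the target follows the DISK50400 example. Treating the last two digits as the LUN is an assumption not stated in this post, and the function name is made up for illustration:

```python
import re

def parse_disk_name(name):
    """Split an HSG80 disk name such as DISK50400 into (channel, target, lun)."""
    match = re.fullmatch(r"DISK(\d)(\d{2})(\d{2})", name.upper())
    if match is None:
        raise ValueError(f"not a DISKnnnnn name: {name!r}")
    # Assumption: the final two digits are the LUN.
    channel, target, lun = (int(part) for part in match.groups())
    return channel, target, lun

print(parse_disk_name("DISK50400"))  # (5, 4, 0)
```

This is handy for matching a name from SHOW FAILEDSET to a physical slot before pulling anything.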


Credits:


Monday, 6 December 2010

Setting up Syslogging with VMware vCenter, Free Kiwi Syslog Server, and ESXi

Part 1: Download, install Kiwi Syslog Server on the Virtual Center server


i: On the Virtual Center server, download and install Kiwi Syslog which is currently freely available from:  http://www.solarwinds.com/products/freetools/kiwi_syslog_server/
ii: Extract files from the zip and then run the setup.exe

iii: Agree to the End User License Agreement
iv: Choose to 'Install Kiwi Syslog Server as a Service' and click Next


v: Accept the default -> Install the Service using: The LocalSystem Account, and click Next
vi: Untick 'Install Kiwi Syslog Web Access' (feature not available in the free version), and click Next
vii: Choose Components - can leave on type = Normal, and click Next
viii: Choose Install Location and click Install
ix: Run Kiwi Syslog Server when the install completes, and click Finish



Part 2: Configure Kiwi Syslog Server

It will work fine with default settings, but there is one thing we might want to do:

From File -> Setup -> Rules -> Default -> Actions -> Log to file

Change the default log file path and file name
It would also be nice to 'Enable Log File Rotation'; alas, this feature requires the licensed version

Note i: default location is C:\Program Files (x86)\Syslogd\Logs\SyslogCatchAll.txt
Note ii: default UDP listen port of 514 is used
Note iii: The paid for version of Kiwi Syslog Server costs £215 and would be worth buying for the extra features


Part 3: Enable syslog on the ESXi hosts

i: Via the vSphere client - click on an ESXi host and select Configuration tab -> Advanced Settings (under Software)
ii: From Advanced Settings window - in Syslog -> Syslog.Remote.Hostname, enter the DNS name of your Virtual Center Server and click OK


iii: Verify that messages are being received in Kiwi Syslog Server and, if so, enable this for all your ESXi hosts
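If nothing shows up in Kiwi, it helps to rule out the listener and any firewall in between before blaming the ESXi setting. A minimal sketch that sends one test datagram to UDP 514 from any machine with Python; it sends only a priority prefix and text, not a full syslog header, which most servers will still display. The function name and the server name in the comment are hypothetical:

```python
import socket

def send_test_syslog(server, port=514, message="syslog test from Python"):
    """Send one minimal BSD-syslog-style UDP datagram to a syslog server."""
    # <14> = facility 1 (user-level) * 8 + severity 6 (informational)
    payload = f"<14>{message}".encode("ascii", "replace")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
    return payload

# Hypothetical server name; use the DNS name entered in Syslog.Remote.Hostname.
# send_test_syslog("vcenter.example.local")
```

UDP gives no delivery confirmation, so the only check is whether the message appears in the Kiwi Syslog Server display.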