We Can Health Check It for You Wholesale – How to do a Full VMware vSphere Healthcheck!

Introduction

I was going to create a simple checklist, with loads of check boxes, to arm the savvy but forgetful Technical Consultant with a full arsenal of checks to throw at a VMware infrastructure. Then I thought I’d do something a bit more eccentric!

Image: Arnie from the original Total Recall based on Philip K. Dick’s “We Can Remember It for You Wholesale”

Things we will check for you!

1. Is your networking design to best practice?

With access to your network switches, routers, monitoring tools, topology diagrams and, where possible, your server room and/or datacentre, we will do a thorough analysis of your networking infrastructure against best practice recommendations for supporting your SAN and VMware vSphere infrastructure. This will include some, if not all, of the following (and is not limited to them):

Inventory/audit for make and model, switches with Non-Blocking Backplane design, switching backbone with sufficient bandwidth (twice the combined Gbps speed of the utilized ports, to allow for full duplex traffic), network patching, standard of network cabling, support for and use of Inter-Switch Linking (ISL) or dedicated Stacking (recommended) architecture, support for and use of Link Aggregation Groups (LAG), configuration of ports as Trunk ports or Access ports, support for and use of Flow Control (802.3x) on all ports (essential with 10GbE), support for Rapid Spanning Tree Protocol (RSTP), support for and correct use of Jumbo Frames, switches with adequate Buffer Space per switch port (at least 512KB per port), the iSCSI enable setting turned off on PowerConnect 54xx switches in a SAN with more than one iSCSI array, PortFast configured on STP Edge Ports (where STP is in use), sufficient packet buffering from 10GbE to 1GbE ports, Speed and Duplex settings hard coded, Storm-Control disabled for iSCSI, MTU set correctly for Jumbo Frames (9000 or 9014) and/or packet fragmentation prevention across all devices (standard MTU 1500), use of VLANs to correctly segregate traffic, requirements for/use of QoS, resilience of network, goal to minimize number of switches, use of private iSCSI network where appropriate, analysis of switch logs to include packet loss/drop, switch firmware, switch software, support on Hardware Compatibility List (HCL) and Software Compatibility List (SCL), manufacturers’ best practices, research for known issues, …
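As a rough illustration of the backbone bandwidth check above, here is a minimal Python sketch. It assumes the rule means "twice the sum of the speeds of the ports in use"; the function names and the example switch are made up for illustration and do not come from any vendor tool.

```python
def required_backplane_gbps(port_speeds_gbps):
    """Capacity needed to carry full duplex traffic on every utilized port.

    Assumption: the rule of thumb is 2 x the combined speed of the ports in use.
    """
    return 2 * sum(port_speeds_gbps)


def is_non_blocking(port_speeds_gbps, backplane_gbps):
    """True if the quoted backplane capacity covers the utilized ports."""
    return backplane_gbps >= required_backplane_gbps(port_speeds_gbps)


if __name__ == "__main__":
    # Hypothetical switch: 24 utilized 1GbE ports plus 2 utilized 10GbE uplinks.
    ports = [1] * 24 + [10] * 2
    print("Required Gbps:", required_backplane_gbps(ports))
    print("Non-blocking at 128 Gbps:", is_non_blocking(ports, 128))
```

The backplane figure itself would come from the switch data sheet; the sketch only does the arithmetic.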

2. Is your SAN design to best practice?

With access to your SAN Management consoles, fabric switches, monitoring tools, topology diagrams and, where possible, your server room and/or datacentre, we will do a thorough analysis of your SAN against known VMware vSphere best practices. This will include some, if not all, of the following (and is not limited to them):

Inventory/audit for make and model, check SAN cabling, provision for dual power supplies, RAID configuration optimized to support the hosted applications, array firmware (not mixed in groups or clusters), disk firmware, resilience of SAN design, utilisation (sufficient free space, i.e. the smaller of 5% or 100GB free on EqualLogic), distribution of load across SAN, disk health and availability of hot-spares, volume access (no read or write access where it should not be permitted), volume naming conventions and matching with hosted datastores, initiators being used, flow-control settings, Jumbo Frames settings, utilization of available front-end interfaces (to the network), utilization of available back-end interfaces (to disk shelves), verify connections running at full bandwidth and duplex, check for load-balancing completion, check for RAID build completion, check for inter-switch link congestion, check for management connectivity, check storage latency, use of enhancing software packages/features (examples: SAN HeadQuarters, Multi-pathing Extension Module, Host Integration Tools, AutoSnapshot Manager, VAAI plugins, vSphere vCenter integration), configuration of alerts, logs analysis, support on Hardware Compatibility List (HCL) and Software Compatibility List (SCL), manufacturers’ best practices, research for known issues, …
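The free-space rule above ("the smaller of 5% or 100GB") is easy to get backwards, so here is a small Python sketch of it. The function names and the sample capacities are hypothetical; real figures would come from your SAN management console.

```python
def min_free_gb(capacity_gb, percent=5.0, cap_gb=100.0):
    """Minimum free space to keep: the smaller of `percent` of capacity or `cap_gb`."""
    return min(capacity_gb * percent / 100.0, cap_gb)


def has_sufficient_free_space(capacity_gb, free_gb):
    """True if the reported free space meets the 5% / 100GB rule."""
    return free_gb >= min_free_gb(capacity_gb)


if __name__ == "__main__":
    # Hypothetical pools: (capacity in GB, free in GB)
    for capacity, free in [(1000, 60), (10000, 80)]:
        status = "OK" if has_sufficient_free_space(capacity, free) else "LOW"
        print(f"{capacity} GB pool, {free} GB free, need {min_free_gb(capacity)} GB: {status}")
```

On a small pool the 5% figure applies; on a large one the 100GB ceiling takes over.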

3. Is your VMware design to best practice?

With access to your vSphere vCenter and vSphere Client, SSH access to hosts, monitoring tools, topology diagrams and, where possible, your server room and/or datacentre, we will do a thorough analysis of your VMware vSphere implementation against known best practices. This will include some, if not all, of the following (and is not limited to them):

Inventory/audit for make and model, use of hardware or software HBAs, multi-pathing configuration and load-balancing, VMware Host configuration, NTP configuration, host BIOS settings including unnecessary devices disabled, virtual networking configuration (vSphere Standard Switch and Distributed Switch), resilience to NIC and other component failure, VMware licensing and use of paid-for features, storage and adapter queues, storage and adapter latency, cluster configuration (using FQDNs, resources to satisfy HA, DRS settings and rules), datastore capacities and free space, distribution of VMs on datastores, datastore block size, iSCSI/FC HBA timeout settings for seamless controller failover, obtaining expected storage throughput, use of enhancing software packages/features/plug-ins (examples: VAAI plugins, HP Offline Bundle, Dell HIT), configuration of alerts and monitoring, vCenter and Host logs analysis, firmware and software versions, support on Hardware Compatibility List (HCL) and Software Compatibility List (SCL), manufacturers’ best practices, research for known issues, …

4. Are your Virtual Machines configured to best practice?

With access to your vSphere vCenter, we will do a thorough analysis of your VMware virtual machines against known best practices. This will include some, if not all, of the following (and is not limited to them):

Presence of old or large snapshots and VCB garbage, virtual machine hardware up to date, VMware Tools up to date, CD-ROMs and unnecessary devices disconnected/removed, CPU ready too high, over-allocation of vCPUs, VM Swap and Ballooning, guest disk size and free space, thin or thick VMDKs, guest disk alignment, backup, replication and DR strategy.
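For the "CPU ready too high" check, the vSphere performance charts report CPU ready as a summation in milliseconds per sample interval (20 seconds on the real-time chart), which has to be converted to a percentage before it can be judged. Below is a minimal Python sketch of that conversion. Dividing by the vCPU count to get a per-vCPU average, and the 5% warning threshold, are assumptions made for illustration and not figures from this post.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into a percentage of the sample interval.

    interval_s: chart sample interval in seconds (20 for the real-time chart).
    vcpus: divide by the vCPU count to get a per-vCPU average (assumption).
    """
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


def cpu_ready_too_high(ready_ms, interval_s=20, vcpus=1, threshold_pct=5.0):
    """Flag a VM whose per-vCPU ready time exceeds a hypothetical threshold."""
    return cpu_ready_percent(ready_ms, interval_s, vcpus) > threshold_pct


if __name__ == "__main__":
    # Hypothetical readings from the real-time chart: (ready ms, vCPUs)
    for ready_ms, vcpus in [(1000, 1), (4000, 2), (1600, 4)]:
        pct = cpu_ready_percent(ready_ms, vcpus=vcpus)
        print(f"{ready_ms} ms over 20 s on {vcpus} vCPU(s): {pct:.1f}% ready")
```

The same figure is shown directly as %RDY in ESXTOP, so this conversion is only needed when working from the chart values.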

Bonus Section 1 - Some Tools We Might Use

vCheck (VMware Analysis)
ESXTOP or RESXTOP (VMware Analysis)
Crystal Disk Mark (Storage Analysis)
VMware VMmark 2.x (Storage Analysis)
VMware I/O Analyzer (Storage Analysis)

Bonus Section 2 - Some Links You Might Want to See

VMware Best Practice Guide:

Gabesvirtualworld.com:

Storage:

Networking:

Final Comment

I will try and keep this updated with new stuff as and when. Thank you for reading!
