Thursday, 23 May 2013

Research Links & Info - 24th May 2013

*Credits to CG, ES, SH, JS, AN

This edition focuses on specific questions from a mostly RedHat and NFS-on-NetApp environment, with a little VMware!

## Contents ##

- NetApp Links
- NFS Best Practices - Linux Mount Options
- NFS Best Practices - More Linux Mount Options
- Some Random Questions

## NetApp Links ##

*Credit to CG

NetApp Thin-Provisioned LUNs on RHEL 6.2 Deployment Guide
*Rishikesh Boddu, Martin George - July 2012

Red Hat Enterprise Linux 6, KVM, and NetApp Storage: Best Practices Guide for Clustered Data ONTAP
*Joe Benedict - November 2012

## NFS Best Practices - Linux Mount Options ##

*Credit to JS, CG

mount -t nfs -o rw,bg,hard,intr,rsize=65536,wsize=65536,vers=3,proto=tcp,timeo=600 NFS_SERVER_IP_ADDRESS:/PATH /CLIENT_PATH
Note: The vers and proto will vary in syntax from OS to OS.

Why?
bg - so if you have a problem at boot you don't sit there forever
hard - to prevent loss of data if the system crashes
intr - to be able to ctrl-c out of an error (Oracle doesn’t like this, unlikely to see a problem though)
64K rsize/wsize - for max efficiency
timeo=600 - because it’s a reasonable value and probably doesn’t matter anyway because the main retry stuff is handled at the TCP layer

Exceptions?
noac/actimeo=0 - use if you must have multiple servers with a single consistent image (depending on the OS)
llock - if you have Solaris; this will probably help performance if you have a lot of high IO
ro - use if you want read-only
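
As a rough illustration of the list above, the following Python sketch compares an option string (for example, the fourth field of an /etc/fstab entry) against the recommended values from the mount command. The function names and the RECOMMENDED table are made up for this example; it is not a NetApp or RedHat tool, and it deliberately ignores the exceptions (noac, llock, ro), which depend on the workload.

```python
# Recommended options taken from the mount command above (illustrative only).
# A value of None means the option is a bare flag with no "=value" part.
RECOMMENDED = {
    "rw": None, "bg": None, "hard": None, "intr": None,
    "rsize": "65536", "wsize": "65536",
    "vers": "3", "proto": "tcp", "timeo": "600",
}


def parse_options(option_string):
    """Turn 'rw,hard,rsize=65536' into {'rw': None, 'hard': None, 'rsize': '65536'}."""
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else None
    return options


def differences(option_string, recommended=RECOMMENDED):
    """Return {option: (expected, actual)} for each recommended option
    that is missing from, or different in, the given option string."""
    actual = parse_options(option_string)
    diffs = {}
    for key, expected in recommended.items():
        if key not in actual:
            diffs[key] = (expected, "missing")
        elif actual[key] != expected:
            diffs[key] = (expected, actual[key])
    return diffs


if __name__ == "__main__":
    # Hypothetical option string with a smaller rsize and no timeo.
    print(differences("rw,bg,hard,intr,rsize=32768,wsize=65536,vers=3,proto=tcp"))
```

Feed it the options you pass to mount rather than what /proc/mounts reports, since the kernel may not echo every option back in the same form.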

TCP slot tables should be 128; 16 will hurt you badly. Also, there was a bug in many Linux distributions that prevented the setting from taking effect when placed in /etc/sysctl.conf. You may need to edit /etc/init.d/netfs to call /sbin/sysctl -p in the first line of the script so that sunrpc.tcp_slot_table_entries is set before NFS mounts any file systems (if NFS mounts the file systems before this parameter is set, the default value of 16 will be in force). UDP slot tables are irrelevant.
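
To confirm that the setting actually took effect after boot, a small Python sketch like the one below can read the live value. It assumes the sysctl sunrpc.tcp_slot_table_entries is exposed at /proc/sys/sunrpc/tcp_slot_table_entries, which is only present once the sunrpc module is loaded; the function names are invented for this example.

```python
from pathlib import Path

# procfs location of the sysctl sunrpc.tcp_slot_table_entries
SLOT_TABLE_PATH = Path("/proc/sys/sunrpc/tcp_slot_table_entries")
RECOMMENDED_SLOTS = 128  # value recommended in the text above


def slot_table_ok(raw_value, recommended=RECOMMENDED_SLOTS):
    """Return True when the value read from procfs is at least the recommended count."""
    return int(raw_value.strip()) >= recommended


def read_slot_table(path=SLOT_TABLE_PATH):
    """Return the current setting as text, or None if the file cannot be read."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_slot_table()
    if raw is None:
        print("sunrpc sysctl not present (is the sunrpc module loaded?)")
    elif slot_table_ok(raw):
        print("tcp_slot_table_entries =", raw.strip(), "(ok)")
    else:
        print("tcp_slot_table_entries =", raw.strip(), "(below 128)")
```

Run it after the NFS file systems are mounted; a value of 16 at that point suggests the mounts happened before the sysctl was applied.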

## NFS Best Practices - More Linux Mount Options ##

*Credit to AN, CG

enable “forcedirectio” - if the application maintains an internal cache (e.g. databases)
Why? Under certain circumstances, performance may be enhanced by purposely setting up the NFS client to not cache any data. When this option is used on a Solaris client, system memory is not used for file-system data and instead only the Oracle buffer cache is used. If data is not in the buffer cache, it is fetched from the storage system.
Note: Not all databases or all instances of the same database benefit from disabling the client cache. The performance impact is deployment specific.

enable “llock” - if the application performs file locking in a single-host environment
Why? Unfortunately, some NFS clients take a brute-force approach to maintaining coherency of locked data. Specifically, on some platforms, locking a file or data region results in all data associated with the file being invalidated from cache and all operations are then “over the wire,” resulting in higher I/O latencies. Depending on the application requirements, llock helps to take advantage of file-system caching to improve performance.

Disable WCC (Weak Cache Consistency) - if the workload contains significant write traffic in a single-host environment, test for the weak cache consistency issue.
Why? The NFSv3 protocol allows for weak cache consistency. Basically, the protocol has attributes associated with each file that contain timestamps and file size from the last access. The theory is that two instances can read-share the file with no problem. The moment that one instance writes to the file, other instances see the attributes change (for example, last modified time) and invalidate any cached copies of the file - that's how it was designed to work.
Note: NFSv4 no longer has the concept of WCC for file modifying operations.

set nfs:nfs3_bsize=65536
Why? This command controls the logical block size used by the NFSv3 client. Usually nfs:nfs3_max_transfer_size is tuned to have the same value as nfs:nfs3_bsize. Do not set this parameter too high because it might cause the system to hang while waiting for memory allocations to be granted.

set nfs:nfs3_max_threads=64
Why? This command controls the number of kernel threads that perform asynchronous I/O for the NFSv3 client. The right value depends on the network bandwidth (for a very low-bandwidth network, you might want to decrease this value so that the NFS client does not overload the network).

set nfs:nfs3_nra=? (value depends on the network bandwidth)
Why? This command controls the number of read-ahead operations that are queued by the NFSv3 client when sequential access to a file is discovered. You can increase or reduce the number of read-ahead requests that are outstanding for a specific file at any given time.
- For a very low-bandwidth network or on a low-memory client, you might want to decrease this value so that the NFS client does not overload the network or the system memory.
- If the network is very high bandwidth and the client and server have sufficient resources, you might want to increase this value.

/dev/tcp tcp_recv_hiwat
Why? Determines the maximum value of the TCP receive buffer. This is the amount of buffer space allocated for TCP receive data. The default value is 49152 in Solaris 10 (recommend setting it to 65535.)

/dev/tcp tcp_xmit_hiwat
Why? Determines the maximum value of the TCP transmit buffer. This is the amount of buffer space allocated for TCP transmit data. The default value is 49152 in Solaris 10 (recommend setting it to 65535.)

## Some Random Questions ##

*Credit to ES, SH, CG

Q1) When a SATA disk reaches 90% capacity, does it suffer a serious slow down?

No! The slowdown at 90% has more to do with WAFL than with the disk, and it also depends on where the data is located on the disk and how it is being accessed: sequential or random, read or write. The same applies to SAS as well as SATA. It’s layout rather than utilisation, although a full disk “may” mean a poorer layout.

Q2) With RedHat 5.8, what are the recommended volume options?

Depends! The obvious one is the security style (UNIX). If the volume is being used for NFS exports, there are some NFS tuning settings for performance (see above). Generally, language settings (allowing UTF-8 support with vol lang volname en_US.UTF-8) are particularly important, along with vol options volname convert_ucode on and vol options volname create_ucode on. More information on what the volume is being used for is needed to decide what to set.

Q3) With RedHat and NetApp, what’s the recommendation for flow control?

With all OSs: on the NetApp, the switch and the host, set flow control to none where possible. With some HBAs, CNAs or NICs, this is not something that can be set - in this case ensure that flow control is set to none on all parts of the infrastructure that support it.

Q4) With RedHat, there is a noticeable benefit from mounting the NFS volumes via different IPs, which gets around the serialization that happens when using just one IP for the NFS server (NetApp). Would VMware similarly benefit from NFS volumes being mounted on different IPs?

Maybe! There have been reported benefits from using such configurations. Whether it will be a benefit or not will depend on the environment.

Q5) How to detect pause frames?

ifstat -a

Q6) On a FAS32XX, c0a and c0b are both visible, but on all of them c0a is disabled. Is this normal behaviour?

Yes! Normally one cluster connection is passive and the other is active. If c0b fails then c0a should take over and become active. Double-check with the “cf monitor” command output.

Q7) Is it possible to transfer an aggregate with all the volumes and data inside it, to another controller by simply un-assigning the disks and then re-assigning the disks?

Yes! The disks will be seen as a “foreign” aggregate, which will be taken offline once the disks are assigned to the new controller. You will need to bring that aggregate online to make it and the volumes inside it active (it is a good idea to rename the aggregate too).

See: How to move an aggregate between software disk-owned HA pairs

For an aggregate with FlexVols:

Method 1:
FILER1> priv set diag
FILER1*> aggr offline aggrToMove -a
FILER1*> priv set admin
FILER1> disk assign XXX YYY ZZZ -o FILER2 -f
FILER1> aggr status -r aggrToMove
FILER2> aggr online aggrMoved

Method 2:
To un-own a disk (don’t worry about the errors regarding the failed aggregate on the source):
FILER1> disk assign XXX -s unowned -f
To assign a disk:
FILER2> disk assign XXX
Then destroy the aggregate on the source!
FILER1> aggr offline aggrOldLocation
FILER1> aggr destroy aggrOldLocation

Q8) Is there a head swap/upgrade process for say a FAS2240-4 to FAS3250?


Research Links & Info - 23rd May 2013

With stuff from/on - Bitpushr’s Blog, Microsoft Exchange, NetApp - Cloud Services, NetApp - HA Failover Configuration (7-Mode), NetApp - Performance Analysis, NetApp - Volume Migrations (in 7-Mode), Networking, Powershell, VMware View, VMware vSphere.

## Bitpushr’s Blog ##

First time I’d come across this blog (please more updates.) Some interesting articles:

A useful link to the Cisco Nexus 5010 12-Node Cluster Switch config:

Extract - converting from 8.1 7-Mode into C-Mode:
set-defaults
setenv bootarg.init.boot_clustered true
setenv bootarg.init.usebootp false
setenv bootarg.bsdportname e0a

## Microsoft Exchange ##

From technet.microsoft.com:

From blogs.technet.com:

From communities.netapp.com:

## NetApp - Cloud Services (such as offsite backup/DR) ##

A few links below to NetApp partners/customers:

## NetApp - HA Failover Configuration (7-Mode) ##


Example working config for filer A:
ifgrp create lacp Internal_VIF e0a e0b
vlan create Internal_VIF 20
ifgrp create lacp Vmware_VIF
ifconfig Internal_VIF 192.168.1.197 netmask 255.255.255.0 partner Internal_VIF mtusize 1500 trusted wins up nfo
ifconfig Internal_VIF-20 192.168.2.11 netmask 255.255.255.0 partner Internal_VIF-20 mtusize 1500 trusted wins up nfo
ifconfig Vmware_VIF 192.168.10.11 netmask 255.255.255.0 partner Vmware_VIF mtusize 1500 trusted wins up nfo

Example working config for filer B:
ifgrp create lacp Internal_VIF e0a e0b
vlan create Internal_VIF 20
ifgrp create lacp Vmware_VIF
ifconfig Internal_VIF 192.168.1.196 netmask 255.255.255.0 partner Internal_VIF mtusize 1500 trusted wins up nfo
ifconfig Internal_VIF-20 192.168.20.10 netmask 255.255.255.0 partner Internal_VIF-20 mtusize 1500 trusted wins up nfo
ifconfig Vmware_VIF 192.168.10.10 netmask 255.255.255.0 partner Vmware_VIF mtusize 1500 trusted wins up nfo
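
A quick sanity check on a pair of configs like these is to compare the subnets of interfaces that share a name on both filers. The Python sketch below does that for the ifconfig lines above, on the assumption that partnered interfaces are expected to sit in the same subnet; the function names are invented for this example. Run against the two examples, it flags Internal_VIF-20 (192.168.2.0/24 on filer A, 192.168.20.0/24 on filer B). Whether that is a typo in one of the examples or intentional cannot be told from the configs alone.

```python
import ipaddress


def interface_networks(config_text):
    """Map interface name -> IPv4 network for each
    'ifconfig NAME ADDRESS netmask MASK ...' line in the text."""
    networks = {}
    for line in config_text.splitlines():
        words = line.split()
        if len(words) >= 5 and words[0] == "ifconfig" and words[3] == "netmask":
            name, address, mask = words[1], words[2], words[4]
            networks[name] = ipaddress.ip_interface(f"{address}/{mask}").network
    return networks


def subnet_mismatches(config_a, config_b):
    """Return interfaces defined on both filers whose subnets differ."""
    nets_a = interface_networks(config_a)
    nets_b = interface_networks(config_b)
    return {
        name: (nets_a[name], nets_b[name])
        for name in nets_a
        if name in nets_b and nets_a[name] != nets_b[name]
    }


# The ifconfig lines from the two example configs above.
FILER_A = """\
ifconfig Internal_VIF 192.168.1.197 netmask 255.255.255.0 partner Internal_VIF mtusize 1500 trusted wins up nfo
ifconfig Internal_VIF-20 192.168.2.11 netmask 255.255.255.0 partner Internal_VIF-20 mtusize 1500 trusted wins up nfo
ifconfig Vmware_VIF 192.168.10.11 netmask 255.255.255.0 partner Vmware_VIF mtusize 1500 trusted wins up nfo
"""
FILER_B = """\
ifconfig Internal_VIF 192.168.1.196 netmask 255.255.255.0 partner Internal_VIF mtusize 1500 trusted wins up nfo
ifconfig Internal_VIF-20 192.168.20.10 netmask 255.255.255.0 partner Internal_VIF-20 mtusize 1500 trusted wins up nfo
ifconfig Vmware_VIF 192.168.10.10 netmask 255.255.255.0 partner Vmware_VIF mtusize 1500 trusted wins up nfo
"""

if __name__ == "__main__":
    for name, (net_a, net_b) in subnet_mismatches(FILER_A, FILER_B).items():
        print(f"{name}: filer A is on {net_a}, filer B is on {net_b}")
```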

options cf.takeover.on_network_interface_failure on
options cf.takeover.on_network_interface_failure.policy any_nic

options timed.enable on
options timed.proto ntp
options timed.servers pool.ntp.org 

And more via the link!

## NetApp - Performance Analysis ##

From communities.netapp.com
CMD> plink.exe FILERNAME
CMD> perfstat -f FILERNAME -t 3 -i 6 -l root -S pw:password1 > CASENUMBER.FILERNAME.PERFSTAT.OUT
-t 3: This defines the delay between each scan
-i 6: The number of iterations (scans) to run on the filer

From communities.netapp.com and kris.boeckx@pidpa:
Commands to troubleshoot performance issues:
## Start with:
priv set diag
# For detailed CPU statistics and how to identify the cause of high CPU
sysstat -M -i 5
# Shows you the different I/O ("Disk util" is very important, and also "CPty"; see Manual Pages - sysstat)
sysstat -x 5
# Shows you the read / write / latencies of LUNs
lun stats -i 5
# Shows you detailed info for every LUN (you will want to capture this in an output file)
stats show lun
# Same as for LUNs, but now for the volumes (you will want to capture this in an output file)
stats show volume
# Shows if any reallocation jobs are running (wafl scan status shows you even more info)
reallocate status
## For even more info!
# Will start the data collection (wait a few minutes)
statit -b
# Will stop the collection and will give you the result.
statit -e

From NetworkAdminKB:
stats show
reallocate measure -o /vol/fragmentedvolume/lun
reallocate status
reallocate start -f -p /vol/volume_name/lun
man reallocate

## NetApp - Volume Migrations (in 7-Mode) ##

Ways to move a volume to another aggregate (of course, in Clustered ONTAP this is very easy):
ndmpcopy - Copy the entire volume from one location to another. Create the new volume first.
vol copy - Similar to ndmpcopy, but the destination volume needs to be restricted.
snapmirror - This is nice as you can set up a schedule and keep the updates going until you are ready to do the final copy.
host based - Use something like robocopy or rsync to copy the data from one volume to another.
vol move (DOT 8.x) - Unless it is a snapmirror or snapvault target.

## Networking ##

From longwhiteclouds.com (Long White Virtual Clouds):

From bladesmadesimple.com (Blades Made Simple):

## Powershell ##


## VMware View ##

From blogs.vmware.com:

## VMware vSphere ##

From viktorious.nl:
esxtop -b -d 10 -n 360 > esxtopresults.csv
- or -
esxtop -b -d 10 -n 360 | gzip > esxtopresults.csv.gz
Then add as a new source in perfmon!
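
With -d 10 -n 360 the capture covers 10 x 360 = 3600 seconds, i.e. one hour. If perfmon is not to hand, the CSV can also be summarised with a few lines of Python. The sketch below is only an illustration: the function names are invented, and the sample header (host name and counter) is made-up data in the general shape of esxtop batch output, not taken from a real capture.

```python
import csv
import io


def sample_window_seconds(delay, iterations):
    """Total capture time for 'esxtop -b -d DELAY -n ITERATIONS'."""
    return delay * iterations


def column_averages(csv_file, name_contains):
    """Average every column whose header contains the given text.
    The selected columns are assumed to hold numeric values."""
    reader = csv.reader(csv_file)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name_contains in name]
    totals = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        if len(row) != len(header):
            continue  # skip truncated rows, e.g. from an interrupted capture
        for i in wanted:
            totals[i] += float(row[i])
        rows += 1
    if rows == 0:
        return {}
    return {header[i]: totals[i] / rows for i in wanted}


# Hypothetical two-sample capture; host name and counter are assumptions.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Physical Cpu(_Total)\% Util Time"',
    r'"05/23/2013 10:00:00","20.0"',
    r'"05/23/2013 10:00:10","40.0"',
]) + "\n"

if __name__ == "__main__":
    print(sample_window_seconds(10, 360), "seconds captured")
    for name, average in column_averages(io.StringIO(SAMPLE), "% Util Time").items():
        print(name, "average:", average)
```

For the gzipped variant, open the file with gzip.open(path, "rt", newline="") from the standard library and pass the resulting file object to column_averages.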

From viktorious.nl:
Consider if there is a very big read or very big write that is going on…

Saturday, 18 May 2013

How to Update the Disk Firmware in Clustered Data ONTAP 8.1.X

Note: When you perform a Data ONTAP software upgrade, disk firmware is included with the Data ONTAP upgrade package. It is sometimes preferable to apply disk firmware before a Data ONTAP upgrade, because otherwise you may have to wait (usually no more than 30 minutes) for the disk firmware to apply before the newly applied Data ONTAP image loads. Verify the recommended procedure with AutoSupport > Upgrade Advisor.

From http://support.netapp.com
Downloads > Firmware > Disk Drive & Firmware Matrix
Click on the link to download ‘all current Disk Firmware - updated 11-APR-2013’
Click the Download .zip button at the bottom of the page to download:
all.zip

Add the all.zip file to your web server.

Image: all.zip presented in HTTP File Server

From the CDOT CLI:
Change to the advanced privilege level:

set -priv adv

To download the firmware:

system firmware download -node nodename -package http://web_server/all.zip

And that’s it!

The disk(s) will be automatically updated if the background disk firmware update option is enabled (as by default):

storage disk option modify -node nodename -bkg-firmware-update on

To manually update disk firmware:

storage disk updatefirmware

To verify disk firmware versions:

storage disk show -fields firmware-revision

How to Update the Disk Qualification Package in Clustered Data ONTAP 8.1.X

From http://support.netapp.com
Downloads > Firmware > Disk Drive & Firmware Matrix
Click on the link to download ‘Disk Qualification Package’ (Updated 11-APR-2013)
Click the Download .zip button at the bottom of the page to download:
qual_devices.zip

Add the qual_devices.zip file to your web server.

Image: qual_devices.zip presented in HTTP File Server

Part 1: Check if the Disk Qualification Package is required

Note: The current version of qual_devices_v3 is noted by datecode 20130228 (check on the download page.)

To determine if it is required run:

node run -node nodename -command rdfile /etc/qual_devices_v3

If the datecode is older than the latest, or there is ‘no such file or directory’ then the latest Disk Qualification Package is required.
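
The decision above can be sketched in Python for anyone scripting the check across many nodes. This is only a sketch: it assumes the rdfile output has been captured as text and that the datecode appears in it as an 8-digit YYYYMMDD number; the exact layout of /etc/qual_devices_v3 is not shown in this post, so the sample lines in the usage section are made up.

```python
import re

# Datecode quoted in the note above; check the download page for the current one.
LATEST_DATECODE = "20130228"


def find_datecode(file_text):
    """Return the first 8-digit number starting with 20 (YYYYMMDD style), or None."""
    match = re.search(r"\b(20\d{6})\b", file_text)
    return match.group(1) if match else None


def package_required(file_text, latest=LATEST_DATECODE):
    """True when the file is missing, has no datecode, or has an older datecode."""
    if "no such file or directory" in file_text.lower():
        return True
    found = find_datecode(file_text)
    # Fixed-width YYYYMMDD strings sort correctly as plain text.
    return found is None or found < latest


if __name__ == "__main__":
    # Hypothetical captured outputs, for illustration only.
    print(package_required("No such file or directory"))
    print(package_required("# Datecode: 20121001"))
    print(package_required("# Datecode: 20130228"))
```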

Part 2: Installing the Disk Qualification Package

Select the advanced privilege level:

set -priv advanced

To download the disk qualification package:

storage firmware download -node nodename -package http://web_server/qual_devices.zip

And that’s it!
Wait 5 minutes or so for the system to process newly installed disk qualification packages.

TIP: You can use nodename of * (wildcard) to download the disk qualification package to all nodes!

How to Update the Disk Shelf Firmware in Clustered Data ONTAP 8.1.X

Note: When you perform a Data ONTAP software upgrade, disk shelf firmware is included with the Data ONTAP upgrade package. Verify the recommended procedure with AutoSupport > Upgrade Advisor.

From http://support.netapp.com
Downloads > Firmware > Disk Shelf Firmware
Here we’ll select ‘Download all current Disk Shelf & I/O Module Firmware’.
Click the Download .zip button at the bottom of the page to download:
all_shelf_fw.zip

Add the all_shelf_fw.zip file to your web server.

Image: all_shelf_fw.zip presented in HTTP File Server

From the CDOT CLI:
Change to the advanced privilege level:

set -priv adv

To download the firmware:

system firmware download -package http://web_server/all_shelf_fw.zip -node nodename

To manually update the disk shelf firmware without rebooting:

system node run -node nodename -command storage download shelf

To manually update the ACPP firmware without rebooting:

system node run -node nodename -command storage download acp

APPENDIX: Alternative Method

The alternative method uses an scp/sftp host. Very briefly:

set -priv adv
security login show -username diag
security login unlock -username diag
security login password -username diag
systemshell -node nodename

Login with username = diag, and password as set above.

scp user@: /mroot/etc
unzip /mroot/etc/all_shelf_fw.zip -d /mroot/etc
exit

Note: If you do a cd /mroot/etc/shelf_fw and ls from here, you can see all the current disk shelf firmware packages. This is a good way of checking if shelf firmwares are required!

Then the same as the HTTP method:

system node run -node nodename -command storage download shelf
system node run -node nodename -command storage download acp