Research Links & Info - 24th May 2013

*Credits to CG, ES, SH, JS, AN

Here with a focus on specific questions pertaining to a mostly RedHat and NFS on NetApp environment, with a little VMware!

## Contents ##

- NetApp Links
- NFS Best Practices - Linux Mount Options
- NFS Best Practices - More Linux Mount Options
- Some Random Questions

## NetApp Links ##

*Credit to CG

NetApp Thin-Provisioned LUNs on RHEL 6.2 Deployment Guide
*Rishikesh Boddu, Martin George - July 2012

Red Hat Enterprise Linux 6, KVM, and NetApp Storage: Best Practices Guide for Clustered Data ONTAP
*Joe Benedict - November 2012

## NFS Best Practices - Linux Mount Options ##

*Credit to JS, CG

mount -t nfs -o rw,bg,hard,intr,rsize=65536,wsize=65536,vers=3,proto=tcp,timeo=600 NFS_SERVER_IP_ADDRESS:/PATH /CLIENT_PATH
Note: The vers and proto will vary in syntax from OS to OS.

bg - so if you have a problem at boot you don't sit there forever
hard - so that I/O is retried until the server responds instead of failing, which prevents loss of data if the server crashes or becomes unreachable
intr - to be able to ctrl-c out of an error (Oracle doesn’t like this, unlikely to see a problem though)
64K rsize/wsize - for max efficiency
timeo=600 - because it’s a reasonable value and probably doesn’t matter anyway because the main retry stuff is handled at the TCP layer

noac/actimeo=0 - use if you must have multiple servers with a single consistent image (depending on the OS)
llock - if you have Solaris; this will probably help performance if you have a lot of high IO
ro - use if you want read-only
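
The options requested on the command line are not always the ones that end up in force (rsize/wsize in particular can be negotiated down), so it can be worth reading them back from /proc/mounts. A minimal sketch, assuming a Linux client; the helper names nfs_mount_opts and has_opt are hypothetical, not part of any NetApp or RedHat tooling:

```shell
#!/bin/sh
# Print the option string of an NFS mount point. Reads /proc/mounts-style
# lines (device mountpoint fstype options dump pass) from stdin.
nfs_mount_opts() {
    awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ { print $4 }'
}

# Succeed if option $2 (e.g. rsize=65536) appears in option string $1.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (the mount point is a placeholder):
#   opts=$(nfs_mount_opts /CLIENT_PATH < /proc/mounts)
#   has_opt "$opts" rsize=65536 || echo "rsize is not 65536"
```

Reading from stdin rather than opening /proc/mounts directly keeps the helper easy to try against a saved copy of the file.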

TCP slot tables should be 128; the default of 16 will hurt performance badly. Also, there was a bug in many Linux distributions that prevented the setting from taking effect when placed in /etc/sysctl.conf. You may need to edit /etc/init.d/netfs to call /sbin/sysctl -p in the first line of the script so that sunrpc.tcp_slot_table_entries is set before NFS mounts any file systems (if NFS mounts the file systems before this parameter is set, the default value of 16 will be in force). UDP slot tables are irrelevant.
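
Because the setting can silently fail to apply, it is worth checking the live value rather than trusting /etc/sysctl.conf. A minimal sketch, assuming the sunrpc module is loaded so that the value is exposed under /proc/sys; the function name check_slot_table is hypothetical, and the file argument exists only so the check can be tried on a test file:

```shell
#!/bin/sh
# Compare the live sunrpc TCP slot table value with the wanted minimum.
# $1 = wanted minimum (default 128), $2 = file to read (default: the /proc entry).
check_slot_table() {
    want=${1:-128}
    file=${2:-/proc/sys/sunrpc/tcp_slot_table_entries}
    if [ ! -r "$file" ]; then
        echo "cannot read $file (is the sunrpc module loaded?)"
        return 2
    fi
    have=$(cat "$file")
    if [ "$have" -lt "$want" ]; then
        echo "tcp_slot_table_entries is $have, expected at least $want"
        return 1
    fi
    echo "tcp_slot_table_entries is $have"
}

# Usage: run before and after mounting to confirm the value in force.
#   check_slot_table 128
```

The matching /etc/sysctl.conf entry would be a line of the form sunrpc.tcp_slot_table_entries = 128.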

## NFS Best Practices - More Linux Mount Options ##

*Credit to AN, CG

enable “forcedirectio” - if the application maintains an internal cache (e.g. databases)
Why? Under certain circumstances, performance may be enhanced by purposely setting up the NFS client to not cache any data. When this option is used on a Solaris client, system memory is not used for file-system data and instead only the Oracle buffer cache is used. If data is not in the buffer cache, it is fetched from the storage system.
Note: Not all databases or all instances of the same database benefit from disabling the client cache. The performance impact is deployment specific.

enable “llock” - if the application performs file locking in a single-host environment
Why? Unfortunately, some NFS clients take a brute-force approach to maintaining coherency of locked data. Specifically, on some platforms, locking a file or data region results in all data associated with the file being invalidated from cache and all operations are then “over the wire,” resulting in higher I/O latencies. Depending on the application requirements, llock helps to take advantage of file-system caching to improve performance.

Disable WCC (Weak Cache Consistency) - if the workload contains significant write traffic in a single-host environment, test for the weak cache consistency issue.
Why? The NFSv3 protocol allows for weak cache consistency. Basically, the protocol has attributes associated with each file that contain timestamps and file size from the last access. The theory is that two instances can share the file for reading with no problem. The moment that one instance writes to the file, other instances see the attributes change (for example, last modified time) and invalidate any cached copies of the file - that's how it was designed to work.
Note: NFSv4 no longer has the concept of WCC for file modifying operations.

set nfs:nfs3_bsize=65536
Why? This parameter controls the logical block size used by the NFSv3 client. Usually nfs:nfs3_max_transfer_size is tuned to have the same value as nfs:nfs3_bsize. Do not set this parameter too high because it might cause the system to hang while waiting for memory allocations to be granted.

set nfs:nfs3_max_threads=64
Why? This parameter controls the number of kernel threads that perform asynchronous I/O for the NFSv3 client. The right value depends on the network bandwidth: for a very low-bandwidth network, you might want to decrease it so that the NFS client does not overload the network.

set nfs:nfs3_nra=? (value depends on the network bandwidth)
Why? This parameter controls the number of read-ahead operations that are queued by the NFSv3 client when sequential access to a file is discovered. You can increase or reduce the number of read-ahead requests that are outstanding for a specific file at any given time.
- For a very low-bandwidth network or on a low-memory client, you might want to decrease this value so that the NFS client does not overload the network or the system memory.
- If the network is very high bandwidth and the client and server have sufficient resources, you might want to increase this value.
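
The set nfs:... lines above are in the form used by /etc/system on Solaris, where lines starting with * are comments and changes take effect at the next reboot. To see what a client is currently configured with, the file can be scanned for a given tunable. A minimal sketch; the function name etc_system_value is hypothetical, and it only handles the name=value form without spaces, as written in this post:

```shell
#!/bin/sh
# Print the last value assigned to a tunable in /etc/system-style input.
# Reads from stdin; $1 is the tunable name, e.g. nfs:nfs3_bsize.
etc_system_value() {
    awk -v key="$1" '
        /^[ \t]*\*/ { next }              # skip comment lines
        $1 == "set" {
            split($2, kv, "=")            # kv[1] = name, kv[2] = value
            if (kv[1] == key) val = kv[2]
        }
        END { if (val != "") print val }'
}

# Usage on a Solaris client:
#   etc_system_value nfs:nfs3_bsize < /etc/system
```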

/dev/tcp tcp_recv_hiwat
Why? Determines the maximum value of the TCP receive buffer. This is the amount of buffer space allocated for TCP receive data. The default value is 49152 in Solaris 10 (recommend setting it to 65535.)

/dev/tcp tcp_xmit_hiwat
Why? Determines the maximum value of the TCP transmit buffer. This is the amount of buffer space allocated for TCP transmit data. The default value is 49152 in Solaris 10 (recommend setting it to 65535.)
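
On Solaris 10 these two /dev/tcp parameters are typically read and changed with ndd. The sketch below only prints the commands so they can be reviewed before being run as root; the function name ndd_hiwat_cmds is hypothetical, and 65535 is simply the value recommended above:

```shell
#!/bin/sh
# Print (do not run) the ndd commands that set both TCP buffer
# high-water marks. $1 = buffer size in bytes (default 65535).
ndd_hiwat_cmds() {
    size=${1:-65535}
    for param in tcp_recv_hiwat tcp_xmit_hiwat; do
        echo "ndd -set /dev/tcp $param $size"
    done
}

# Check the current values first:
#   ndd -get /dev/tcp tcp_recv_hiwat
#   ndd -get /dev/tcp tcp_xmit_hiwat
# Then review and apply:
#   ndd_hiwat_cmds 65535
```

Values set with ndd do not survive a reboot, so they would also need to go into a startup script to be permanent.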

## Some Random Questions ##

*Credit to ES, SH, CG

Q1) When a SATA disk reaches 90% capacity, does it suffer a serious slow down?

No! The slowdown at 90% has more to do with WAFL than with the disk. It also depends on where the data is located on the disk and how it is being accessed (sequential or random, read or write). The same applies to SAS as well as SATA. It’s layout rather than utilisation, although a full disk “may” mean a poorer layout.

Q2) With RedHat 5.8, what are the recommended volume options?

Depends! The obvious one is the security style (UNIX). If the volume is being used for NFS exports, there are some NFS tuning settings for performance (see above). Language settings are generally particularly important: allow UTF-8 support with vol lang volname en_US.UTF-8, along with vol options volname convert_ucode on and vol options volname create_ucode on. More information on what the volume is being used for is needed to decide what else to set.

Q3) With RedHat and NetApp, what’s the recommendation for flow control?

With all OSes: on the NetApp, the switch and the host, set flow control to none where possible. With some HBAs, CNAs or NICs, this is not something that can be set - in this case ensure that flow control is set to none on all parts of the infrastructure that support it.
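
On the RedHat host side, the current pause (flow control) settings of a NIC can be read with ethtool -a, and on NICs that support it they can be changed with ethtool -A. A minimal sketch that checks the output of ethtool -a; the function name pause_is_off and the interface name eth0 are placeholders:

```shell
#!/bin/sh
# Succeed only if both RX and TX pause are reported as "off".
# Reads the output of `ethtool -a <iface>` from stdin.
pause_is_off() {
    awk '$1 == "RX:" || $1 == "TX:" { seen++; if ($2 != "off") bad = 1 }
         END { exit ((seen == 2 && !bad) ? 0 : 1) }'
}

# Usage (eth0 is a placeholder):
#   ethtool -a eth0 | pause_is_off || echo "flow control still enabled on eth0"
# To disable it where the NIC allows:
#   ethtool -A eth0 autoneg off rx off tx off
```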

Q4) With RedHat, a benefit has been noticed from mounting the NFS volumes via different IPs, which gets around the serialization that happens when using just one IP for the NFS server (NetApp). Would VMware similarly benefit from NFS volumes being mounted on different IPs?

Maybe! There have been reported benefits from using such configurations. Whether it will be a benefit or not will depend on the environment.

Q5) How to detect pause frames?

ifstat -a
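
The command above is presumably meant for the NetApp controller. On a Linux host, per-NIC counters are available from ethtool -S, although the counter names vary by driver. A minimal sketch, assuming the relevant counters contain "pause", "xon" or "xoff" in their names; the function name nonzero_pause_counters and the interface eth0 are placeholders:

```shell
#!/bin/sh
# Print pause-related counters with a non-zero value as name=value lines.
# Reads the output of `ethtool -S <iface>` from stdin.
nonzero_pause_counters() {
    awk -F: 'tolower($1) ~ /pause|xon|xoff/ && ($2 + 0) > 0 {
                 gsub(/^[ \t]+/, "", $1)      # strip leading indentation
                 print $1 "=" ($2 + 0)
             }'
}

# Usage (eth0 is a placeholder):
#   ethtool -S eth0 | nonzero_pause_counters
```

Running it twice a few minutes apart shows whether the counters are still increasing.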

Q6) On a FAS32XX, can see the c0a and c0b, but on all of them c0a is disabled, is this normal behaviour?

Yes! Normally one cluster connection is passive and the other is active. If c0b fails then c0a should take over and become active. Double-check with the output of a “cf monitor” command.

Q7) Is it possible to transfer an aggregate with all the volumes and data inside it, to another controller by simply un-assigning the disks and then re-assigning the disks?

Yes! The disks will be seen as a “foreign” aggregate, which will be taken offline once the disks are assigned to the new controller. You will need to bring that aggregate online to make it and the volumes inside it active (it is a good idea to rename the aggregate too).

See: How to move an aggregate between software disk-owned HA pairs

For an aggregate with FlexVols:

Method 1:
FILER1> priv set diag
FILER1*> aggr offline aggrToMove -a
FILER1*> priv set admin
FILER1> disk assign XXX YYY ZZZ -o FILER2 -f
FILER1> aggr status -r aggrToMove
FILER2> aggr online aggrMoved

Method 2:
To un-own a disk (don’t worry about the errors regarding the failed aggregate on the source):
FILER1> disk assign XXX -s unowned -f
To assign a disk:
FILER2> disk assign XXX
Then destroy the aggregate on the source!
FILER1> aggr offline aggrOldLocation
FILER1> aggr destroy aggrOldLocation

Q8) Is there a head swap/upgrade process for say a FAS2240-4 to FAS3250?