First, some good links:
Introduction
As DFS namespaces scale in size, managing DR efficiently becomes a matter of
interest. One idea, if you’ve got a large namespace, is to use active/active
DFS: two folder targets, one pointing to the primary site (with read-write
data) and one to the secondary site (with read-only data). Both targets are
enabled, but the DR path is effectively down, so clients should just use the
active primary one!
Image: Example of a DFS Namespace and a target with active/active (enabled/enabled) targets
In the above image, PRICLU1V1 is the Enabled(UP) path, and SECCLU1V1 is the
Enabled(DOWN) path. We can control the UP/DOWN state on a NetApp Clustered
ONTAP system by simply running the command:
::> net int modify -vserver VSERVERNAME -lif LIFNAME -status-admin up/down
But Does it Work?
Not quite as we’d like. The first time you connect to one of the folders in
the DFS namespace, there is a timeout of many seconds before the folder share
opens (if you’ve not been referred to the target that is up). This delay is
easy to demonstrate by creating and running a batch file like the one below:
ECHO %TIME%
net use * \\lab.priv\NASTEST\TEST1
ECHO %TIME%
net use * /delete /YES
ECHO %TIME%
net use * \\lab.priv\NASTEST\TEST1
ECHO %TIME%
net use * /delete /YES
REM Repeat the above as many times as required!
PAUSE
As an example in a test lab:
C:\Users\Administrator\Desktop>ECHO 13:03:05.23
13:03:05.23
C:\Users\Administrator\Desktop>net use * \\lab.priv\NASTEST\TEST2
Drive Z: is now connected to \\lab.priv\NASTEST\TEST2.
The command completed successfully.
C:\Users\Administrator\Desktop>ECHO 13:03:26.44
13:03:26.44
C:\Users\Administrator\Desktop>net use * /delete /YES
C:\Users\Administrator\Desktop>ECHO 13:03:26.46
13:03:26.46
C:\Users\Administrator\Desktop>net use * \\lab.priv\NASTEST\TEST2
Drive Z: is now connected to \\lab.priv\NASTEST\TEST2.
The command completed successfully.
C:\Users\Administrator\Desktop>ECHO 13:03:26.47
13:03:26.47
Notice that the first connection took over 20 seconds (13:03:05.23 to
13:03:26.44), while the next took 0.01 of a second!
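To read the delay off the log without doing the subtraction by hand, a small sketch like the following could be used. It assumes the stamps are in the HH:MM:SS.cc form that ECHO %TIME% printed above, with a dot as the decimal separator, and that both stamps fall on the same day; the helper name elapsed_seconds is made up for this illustration.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two %TIME% stamps (HH:MM:SS.cc) taken on the same day."""
    fmt = "%H:%M:%S.%f"  # %f accepts 1-6 fractional digits, so ".23" is 0.23 s
    delta = datetime.strptime(end.strip(), fmt) - datetime.strptime(start.strip(), fmt)
    return delta.total_seconds()


# Stamps copied from the test lab output above.
first_connect = elapsed_seconds("13:03:05.23", "13:03:26.44")
second_connect = elapsed_seconds("13:03:26.46", "13:03:26.47")

print(f"{first_connect:.2f}")   # 21.21
print(f"{second_connect:.2f}")  # 0.01
```

A run that crosses midnight would come out negative with this helper, so it is only suitable for short tests like the one here.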
Why?
“... there was still a link in the namespace to a server that was down, so the
long pause when opening DFS was because it was searching for that server and
failing.”
And remember, this applies to every DFS folder/link whose referral hasn’t
already been cached (and where you’re referred to the wrong target). Even
though there is no problem once you’re connected, this delay can impact login
times, and if the drive gets disconnected, the wait to reconnect contributes
to a poor end-user experience, which is unacceptable!
What can we do?
DFS Properties in DFS Management
What options are there in DFS Management around this?
Namespace Referrals Settings
Image: Namespace Referrals Settings
Image: Namespace Referrals Ordering Method
Folder Referrals Settings
Image: Folder Referrals Settings
Folder Target Referrals Settings
Image: Folder Target Referrals Settings
Note: The DFS server in the images and examples above is a Windows Server 2008 R2 DFS box!
How to Fix the Problem - Part 1?
After going to the effort of creating subnets and sites in Active Directory
Sites and Services, setting the site link costs to favour primary (most things
are in the default site), and configuring referral ordering, the behaviour was
more predictable. Still, there was a ~20 second timeout when connecting to a
link after failover (but not for every link). When primary was up, there was
never a delay. And when the DFS targets were cached (for 1800 seconds), and
even beyond that time, once connected (whether in failover or not), connections
were consistently quick.
Remember: “The DFSN client connectivity design isn’t for instant failover; it’s
for geographical high availability and closest targeting. If you need instant
failover, clustering is the way to go.”
Still, why the 20 second timeout? Can we not reduce or fix it?
Image: DFS Folder Targets (no longer using “Default-First-Site-Name”)
Image: Site Link Costs
Image: Folder Target Override Referral Ordering for Primary
Image: Folder Target Override Referral Ordering for Secondary
How to Fix the Problem - Part 2?
A very useful tool for troubleshooting DFS referrals is dfsutil. For this
topic, the relevant commands are the two below, which display and flush the
client’s referral cache respectively:
dfsutil /pktinfo
dfsutil /pktflush
One thing that popped up a few times whilst researching this is the
DFSDnsConfig setting: “DFSDnsConfig registry key must be added to each server
that will participate in the DFS namespace for all computers to understand
fully qualified names ... which includes the DFS Namespace Servers and the
Domain Controllers.”
So, following the instructions which are essentially:
1) Run the following on all DCs and DFS servers:
Dfsutil.exe server registry dfsdnsconfig set
2) Restart DFS on all the DCs and DFS servers:
net stop dfs
net start dfs
3) Recreate the namespace and create all folder targets with FQDNs:
Image: Folder Targets with FQDN path
... still had the 20 second timeouts! (Perhaps something was missed...)
To be continued...
PS Another idea - if active/active DFS is desired, but we want the secondary
referred first in a failover - update the site link cost to make the secondary
more attractive (and so referred first)?
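As a rough illustration of that idea, the toy model below sorts folder targets by the cost of reaching their site from the client, cheapest first. It is only a sketch of the reasoning, not how DFS itself builds referrals: the site names, the cost values, and the order_referrals helper are all hypothetical, and only the target names come from the example earlier in this post.

```python
from typing import Dict, List


def order_referrals(targets: Dict[str, str], site_cost: Dict[str, int]) -> List[str]:
    """Return target names sorted by the cost of their site, cheapest first.

    targets maps a target name to the site it lives in; site_cost maps a site
    to its cost as seen from the client's site. Both are made-up inputs.
    """
    return sorted(targets, key=lambda name: (site_cost[targets[name]], name))


# Hypothetical sites; the target names are those used earlier in the post.
targets = {"PRICLU1V1": "Primary", "SECCLU1V1": "Secondary"}

# Normal running: primary is the cheaper site, so it is listed first.
normal = order_referrals(targets, {"Primary": 10, "Secondary": 100})

# Failover: swap the costs so the secondary becomes the cheaper site.
failover = order_referrals(targets, {"Primary": 100, "Secondary": 10})

print(normal)    # ['PRICLU1V1', 'SECCLU1V1']
print(failover)  # ['SECCLU1V1', 'PRICLU1V1']
```

Whether a cost change alone would avoid the ~20 second timeout is untested here; clients holding a cached referral would presumably keep the old ordering until the cache expires or is flushed.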