One of the problems with synchronously mirrored SANs (Multi-Site SAN, SyncRep, MetroCluster, you name it) is that they introduce write latency, and this latency increases the further apart you place the nodes of the Sync-SAN.
Why does it increase latency?
Well, the whole point of a Sync-SAN is that the data on the SAN in Site B is an exact copy of the data on the SAN in Site A. To keep an exact copy, every write must be committed to both SANs, and acknowledgement of the successful write returned from both, before the server can submit any further writes.
A better explanation (from the Compellent Enterprise Manager 6.1 Administrator’s Guide):
“Synchronous replication makes sure that a write is successfully written to the remote system before returning a Successful Completion command to the server IO request. The Storage Center does not acknowledge completion of the write-back to the server until both the write IO to the local volume and the IO sent to the remote system are complete.”
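The quoted behaviour can be sketched in a few lines of Python; `write_local` and `replicate_remote` are hypothetical stand-ins for the SAN's internal operations, not a real API:

```python
def synchronous_write(write_local, replicate_remote, data):
    """Sketch of a synchronous replicated write: the server's IO is only
    acknowledged after BOTH the local and the remote write complete."""
    write_local(data)        # commit to the local volume
    replicate_remote(data)   # ship to the remote system and wait for its ack
    return "ack"             # only now return Successful Completion to the server

# The server cannot issue its next dependent write until this returns,
# so every write pays at least one round trip to the remote site.
```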
What is a best case minimum latency?
This universe is pretty much governed by the rule that nothing can go faster than the speed of light (or at least nothing can accelerate through the light-speed barrier – there’s no reason why things can’t already travel faster than light, like hypothetical tachyons, but those things never slow down.)
The speed of light is 299’792’458 m/s, or ~= 299’792 km/s.
Say we have two sites separated by a distance of 100km; this gives a round trip of 200km and a best case minimum latency of ~0.67ms (0.67 milliseconds or 0.00067 s.) Now 0.67 ms is a pretty tiny interval and perfectly acceptable in an age where even the most demanding applications might ask for 2ms maximum latency; but the 0.67 ms is a best case, and additional latency is added by the physical network infrastructure between the nodes of the Sync-SAN, and by the physical hardware of the SAN and server nodes themselves.
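The arithmetic above is easy to sketch in Python (speed of light in a vacuum only – this deliberately ignores fibre, switches and the SANs themselves):

```python
C_KM_PER_S = 299_792.458  # speed of light in a vacuum, km/s

def best_case_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip latency over distance_km, in milliseconds."""
    return 2 * distance_km / C_KM_PER_S * 1000

print(f"{best_case_rtt_ms(100):.2f} ms")  # ~0.67 ms for two sites 100 km apart
```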
What is the maximum distance with a 2 ms maximum latency in a perfect world?
We can use a best case scenario to consider how far we can stretch our Sync-SAN in a perfect world where no latency is introduced by storage, server or networking.
Max. Distance in km = (0.002 seconds * speed_of_light_in_km/s) / 2 ≈ 300km
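Plugging in the numbers (same perfect-world assumption as above – light-speed delay is the only latency):

```python
C_KM_PER_S = 299_792.458  # speed of light in a vacuum, km/s

def max_distance_km(max_latency_s: float) -> float:
    """Furthest separation whose light-speed round trip fits the budget."""
    return max_latency_s * C_KM_PER_S / 2

print(f"{max_distance_km(0.002):.1f} km")  # ≈ 299.8 km, i.e. roughly 300 km
```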
What if we had servers in a different site in between the SANs?
Theoretically, we could stretch the distance between the SANs to 600km and still keep the 2ms maximum latency by putting the servers in the middle: Site A with the servers in, Site B 300km away on one side with SAN Mirror A, and Site C 300km away on the opposite side with SAN Mirror B. Now, this would be a pretty bananas solution, having servers in one site and SANs in other sites, but theoretically, since the servers need to wait for successful packet delivery and returning acknowledgement from both SANs in a Sync-SAN, each 300km leg keeps its round trip at ~2ms, so this is a valid idea.
Image: Stretching a Sync-SAN to 600km and keeping 2 ms maximum latency for server writes
What about going beyond 600km and keeping 2ms maximum latency?
Here’s where this post goes even more bananas (more like stark raving bonkers – but hey it’s the holiday season so I can allow the indulgence) and delves into the realms of science-fiction.
Now, suppose we could create a wormhole in between the Sync-SANs, then there would be no real-world distance limitations. We could fire the write packets down the wormhole, have some kind of delay mechanism on the other end (because the packets would have travelled backwards in time), deliver the packet to the Sync-SAN replication node at the correct time, then fire the acknowledgement packet back down the wormhole, delay it (because it would again have gone back in time), and deliver it to the server with minimal to no latency. The packet would never have travelled faster than the speed of light through the wormhole, but – when measured between points outside the wormhole – it would indeed have exceeded the speed of light.