Editorial: Why Centralised Storage Refuses to Go Away

Chris Evans – Data Practice: Data Storage, Editorial, HCI, Storage

It is over 30 years since EMC introduced the first Integrated Cache Disk Array to the enterprise data centre market, signalling the transition to centralised or SAN storage.  Despite the best efforts of hyper-convergence, the demand for centralised storage remains strong.  Even as the public cloud gains significant traction, SAN storage refuses to go away.

Background

In 1990, EMC Corporation (acquired by Dell in 2015/2016) introduced the first ICDA or Integrated Cache Disk Array.  The system differed from the storage deployed up to that point, as it combined software with protected DRAM cache to abstract and optimise I/O to and from (relatively) cheap media.

Only three years earlier, the seminal work by Patterson, Gibson and Katz had standardised definitions for RAID, or Redundant Array of Inexpensive Disks (now Independent Disks).  Although the first Symmetrix systems from EMC only supported RAID-1 mirroring, the use of system memory as a cache allowed features such as snapshots and replication to be introduced in later models.  RAID was subsequently expanded to include the more capacity-efficient RAID-5 and, eventually, RAID-6 for greater protection against the failure of large drives.
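
To make the capacity trade-off concrete, the short Python sketch below compares approximate usable capacity for RAID-1, RAID-5 and RAID-6 groups.  The drive count and drive size are arbitrary illustrative values, not figures from any particular system.

    # Rough comparison of usable capacity for common RAID levels.
    # Drive counts and sizes below are arbitrary illustrative values.

    def usable_capacity_tb(drives: int, drive_tb: float, level: str) -> float:
        """Return approximate usable capacity in TB for a single RAID group."""
        if level == "RAID-1":      # mirroring: half of the raw capacity
            return drives * drive_tb / 2
        if level == "RAID-5":      # one drive's worth of parity per group
            return (drives - 1) * drive_tb
        if level == "RAID-6":      # two drives' worth of parity per group
            return (drives - 2) * drive_tb
        raise ValueError(f"unknown RAID level: {level}")

    if __name__ == "__main__":
        drives, drive_tb = 8, 16.0     # e.g. an 8 x 16TB RAID group
        raw = drives * drive_tb
        for level in ("RAID-1", "RAID-5", "RAID-6"):
            usable = usable_capacity_tb(drives, drive_tb, level)
            print(f"{level}: {usable:.0f}TB usable of {raw:.0f}TB raw "
                  f"({usable / raw:.0%} efficiency)")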

SAN

The early Symmetrix and similar systems from Hitachi, IBM and others were SCSI-connected.  Towards the late 1990s and early 2000s, Fibre Channel (and then iSCSI over Ethernet) introduced the concept of the Storage Area Network or SAN.  SANs provide five main features.

  • Centralisation.  Where physical servers were initially deployed with individual disks, increasing drive capacities resulted in significant wastage per server.  Centralising capacity reduced that wastage, as capacity was shared between connected servers.
  • Resiliency.  A single SAN storage “array” has multiple layers of redundancy, which, combined with fault-tolerant storage networking, provides high-availability storage.  A typical SAN storage system has multiple power supplies, multiple controllers, protected DRAM write cache and multiple network connections.  Modern systems offer six or seven “9’s” of availability (99.9999% or 99.99999%, respectively), with some vendors guaranteeing 100% data availability with replication.
  • Performance.  When SAN systems were deployed with HDD media, each drive or “spindle” offered only a few hundred random read IOPS at best.  Performance is limited by the rotational speed of the drive and the seek time needed to move the read head into position.  On average, rotational latency alone is half of one full revolution – around 4.2ms for a 7200RPM drive, or roughly 240 IOPS before seek time is considered (the arithmetic is sketched after this list).  With caching, sequential writes and prefetch, SAN systems could achieve much higher IOPS and throughput rates.
  • Reduced Maintenance.  When storage was deployed in hundreds or perhaps thousands of individual servers, each one could represent a failure domain, with daily data centre visits to replace drives.  SAN storage centralises that work, reducing double-disk failures (a second failure before the first is replaced) by implementing hot spare drives and proactive sparing.  Daily data centre replacements could be scheduled as weekly (or longer) activities, reducing the risk and impact of drive replacement.
  • Efficiency.  Although not an initial design feature, most SAN storage today offers thin provisioning, data deduplication and compression.  Together with features such as zero-block detection and TRIM/UNMAP, shared storage systems typically quote an “effective” capacity of around 3-5 times that of the raw storage deployed.
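
The figures quoted in the list above follow from simple arithmetic.  The Python sketch below shows the calculations for availability “9”s, the rotational latency of a 7200RPM drive and “effective” capacity after data reduction; all inputs are illustrative assumptions rather than measurements from any specific product.

    # Back-of-the-envelope numbers behind the list above: availability "nines",
    # HDD rotational latency/IOPS and "effective" capacity after data reduction.
    # All inputs are illustrative assumptions, not figures from a specific product.

    SECONDS_PER_YEAR = 365 * 24 * 3600

    def downtime_seconds_per_year(nines: int) -> float:
        """Seconds of downtime per year for a given number of 9s (e.g. 6 -> 99.9999%)."""
        availability = 1 - 10 ** (-nines)
        return SECONDS_PER_YEAR * (1 - availability)

    def rotational_latency_ms(rpm: int) -> float:
        """Average rotational latency: half of one full revolution, in milliseconds."""
        return (60_000 / rpm) / 2

    def effective_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
        """Effective capacity quoted after deduplication/compression (e.g. 4 for 4:1)."""
        return raw_tb * reduction_ratio

    if __name__ == "__main__":
        for nines in (6, 7):
            print(f"{nines} nines ~ {downtime_seconds_per_year(nines):.1f} seconds of downtime per year")

        latency = rotational_latency_ms(7200)          # ~4.2ms
        print(f"7200RPM rotational latency ~ {latency:.1f}ms "
              f"(~{1000 / latency:.0f} IOPS before seek time)")

        print(f"100TB raw at 4:1 reduction ~ {effective_capacity_tb(100, 4):.0f}TB effective")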

It’s clear that SAN storage offered significant operational benefits in the data centre.  With optical Fibre Channel practically eliminating the restriction of distance within a single machine room, SAN storage grew to support hundreds of systems from a single appliance.

Virtualisation

From the early 2000s, server virtualisation gained widespread adoption, pioneered by VMware, KVM/QEMU and Xen.  VMware was the biggest winner, dominating the market before being acquired by Broadcom in 2023.

We might assume that the abstraction of physical servers into software would have negated some of the benefits of SAN storage.  In all type-1 hypervisors, what was previously presented as a LUN or logical volume became a file.  However, SANs continued in popularity, as they provided centralised storage shared between multiple servers running virtualised workloads.  If, for example, a virtual machine needed to move between physical hypervisors, the transition could be achieved almost instantly for powered-off virtual machines (a metadata update) or by moving memory and metadata for powered-on systems.
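
As a purely illustrative sketch of why shared storage makes this cheap: when every hypervisor can already see the same datastore, relocating a powered-off virtual machine is just a change of ownership metadata rather than a copy of the disk image.  The class and function names below are invented for the example and do not belong to any hypervisor API.

    # Illustrative only: with a shared datastore, "moving" a powered-off VM
    # between hosts is an ownership/metadata change, not a copy of its disks.
    # All names here are invented for the sketch.

    from dataclasses import dataclass

    @dataclass
    class VirtualMachine:
        name: str
        disk_path: str      # path on the shared datastore, visible to all hosts
        owner_host: str     # the hypervisor that currently registers/runs the VM

    def cold_migrate(vm: VirtualMachine, target_host: str) -> None:
        """Re-register the VM on another host; the disk stays where it is."""
        vm.owner_host = target_host   # metadata update only - no data movement

    if __name__ == "__main__":
        vm = VirtualMachine("app01", "/shared-datastore/app01/app01.vmdk", "host-a")
        cold_migrate(vm, "host-b")
        print(f"{vm.name} now registered on {vm.owner_host}; disk unchanged at {vm.disk_path}")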

SAN storage continued to offer centralisation of data protection features such as synchronous replication and snapshots, offloaded from the host and virtual machine guests.

HCI

Around 2010, we started to see the emergence of hyper-converged computing solutions.  In this architecture, a cluster of type-1 hypervisors implements shared storage across each node or server in the cluster, with resiliency provided by data replicas between nodes.

The pioneers of the technology were Nutanix (with the Nutanix File System), SimpliVity (with a hardware-accelerated solution) and Scale Computing.  In the first two examples, storage was implemented within a virtual machine running on each node, whereas the Scale Computing design used hypervisor-based redundant storage called SCRIBE.  Some years later, VMware introduced Virtual SAN (vSAN) into the vSphere ecosystem, providing hyper-converged functionality.

Hyper-converged storage offers some advantages over centralised storage.  Generally, the physical expansion increments are smaller, as most storage systems (at the time HCI was introduced) had to be expanded by an entire RAID set.  There is also no requirement for dedicated storage networking, nor for the (often inaccurately perceived) overhead of storage administrators with SAN knowledge.

However, HCI also has drawbacks.  Initial solutions used simple mirroring to protect data, creating significant wastage.  HCI data recovery requires a great deal of cross-server network traffic that, in the case of early vSAN, could bring systems to a halt.  Early system designs also required symmetric configurations, meaning storage had to be deployed into every node in a cluster with identical drive configurations (making expansion costly).
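
To illustrate the wastage point, the sketch below compares usable capacity under the two- and three-way replication typical of early HCI designs with a parity (erasure-coded) layout.  The node count, per-node capacity and 4+2 layout are example assumptions, not figures from any vendor.

    # Illustrative comparison of usable capacity for HCI-style data protection.
    # Node counts, capacities and layouts are example assumptions, not vendor figures.

    def usable_with_replicas(raw_tb_per_node: float, nodes: int, replicas: int) -> float:
        """N-way replication stores each block 'replicas' times across the cluster."""
        return raw_tb_per_node * nodes / replicas

    def usable_with_parity(raw_tb_per_node: float, nodes: int, data: int, parity: int) -> float:
        """Parity/erasure coding, e.g. 4+2 survives two failures at ~67% efficiency."""
        return raw_tb_per_node * nodes * data / (data + parity)

    if __name__ == "__main__":
        nodes, raw_per_node = 8, 20.0     # e.g. eight nodes, each with 20TB of local flash
        print(f"2-way mirror: {usable_with_replicas(raw_per_node, nodes, 2):.0f}TB usable")
        print(f"3-way mirror: {usable_with_replicas(raw_per_node, nodes, 3):.0f}TB usable")
        print(f"4+2 parity  : {usable_with_parity(raw_per_node, nodes, 4, 2):.0f}TB usable")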

Although hyper-converged solutions have a place, they are still perceived as mid-range options, due to the inherent scaling limits that result in the deployment of many clusters to meet large enterprise requirements (and the subsequent fragmentation of free resources).

Public Cloud

The adoption of cloud computing has been prominent across the enterprise for the past ten to fifteen years.  The first offerings focused on storage (Amazon Web Services S3), followed by virtual computing instances and persistent storage (EC2 and EBS, respectively).

Because public cloud platforms abstract and obscure their underlying design, persistent storage, as far as the end user is concerned, has returned to a model that sits somewhere between SAN storage and the DAS (direct-attached storage) model that predates the SAN era.

From the user perspective, there are no obvious savings from data deduplication, thin provisioning and compression (although the platform vendor is certainly implementing those techniques).  Until recently, cloud block storage didn’t offer multi-attach and was dedicated to a single virtual instance.

Possibly the most significant difference between on-premises storage and the public cloud is the implementation of quality of service (QoS).  Most SAN storage has no QoS or applies only basic limits based on IOPS and throughput.  In the public cloud, IOPS and throughput are chargeable items – if you need more bandwidth, you must pay for it.  The inevitable consequence is over-provisioning, driven by the inability to apply unused performance from one volume or system to another.

The dual risks of under- and over-provisioning represent a serious challenge for enterprise administrators.  If a system is under-configured with resources, outages may be required to add more.  The temptation is to over-configure and deploy more resources than initially needed, to mitigate any downtime.  Unfortunately, in the public cloud, every GB of storage is chargeable, even if it isn’t used.  Solutions such as Zesty have been created to automate this optimisation process.
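
The trade-off can be expressed as a simple cost model.  The sketch below uses hypothetical per-GB and per-IOPS prices (not real cloud list prices) to show how paying for headroom that is never used adds up.

    # Simple model of the over-provisioning penalty for cloud block storage.
    # The prices below are hypothetical placeholders, not real cloud list prices.

    def monthly_cost(gb: int, iops: int, price_per_gb: float, price_per_iops: float) -> float:
        """Monthly charge for a volume billed on provisioned capacity and performance."""
        return gb * price_per_gb + iops * price_per_iops

    if __name__ == "__main__":
        price_per_gb, price_per_iops = 0.08, 0.005   # hypothetical $/GB-month and $/IOPS-month

        needed = monthly_cost(gb=2_000, iops=6_000,
                              price_per_gb=price_per_gb, price_per_iops=price_per_iops)
        provisioned = monthly_cost(gb=4_000, iops=16_000,
                                   price_per_gb=price_per_gb, price_per_iops=price_per_iops)

        print(f"Cost of what the workload actually needs: ${needed:,.2f}/month")
        print(f"Cost of the over-provisioned volume:      ${provisioned:,.2f}/month")
        print(f"Headroom paid for but never used:         ${provisioned - needed:,.2f}/month")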

SANs in the Cloud

Several vendors, including Volumez, Lightbits Labs and Silk, have developed solutions to resolve the issue of public cloud storage over-provisioning.  In the case of Volumez, for example, this SaaS-based platform aggregates the resources of storage-intensive virtual instances, typically using ephemeral NVMe SSDs, into a fault-tolerant and resilient data plane.

In one respect, these solutions could be seen as “virtual SANs” in the public cloud.  However, they’re delivering more than just media aggregation.  The most sophisticated solutions offer granular quality of service, predictable I/O performance (including latency) and minimal management overhead, all determined by policy.  Importantly, they also offer features not available in the public cloud, such as thin provisioning and data deduplication.
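
As an indication of what “determined by policy” might look like, the sketch below describes a volume request declaratively – capacity, performance, latency and resilience targets that the platform is then left to satisfy.  The structure is a generic illustration and is not the schema or API of Volumez, Lightbits Labs, Silk or any other product.

    # Generic illustration of a policy-driven volume request; this is not the
    # API or schema of any named product.

    from dataclasses import dataclass

    @dataclass
    class VolumePolicy:
        capacity_gib: int
        iops: int                 # target sustained IOPS
        latency_ms: float         # target 99th-percentile latency
        zones_to_survive: int     # how many zone failures the volume must tolerate
        thin_provisioned: bool = True

    def describe(policy: VolumePolicy) -> str:
        """Render the policy as the kind of service-level statement a platform might accept."""
        return (f"{policy.capacity_gib}GiB, {policy.iops} IOPS, "
                f"<= {policy.latency_ms}ms p99, survives {policy.zones_to_survive} zone failure(s)")

    if __name__ == "__main__":
        db_volume = VolumePolicy(capacity_gib=2048, iops=50_000, latency_ms=0.5, zones_to_survive=1)
        print("Requested policy:", describe(db_volume))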

Looking at how SANs evolved in the past, we can see where this technology is heading.  Future iterations will introduce snapshots (probably integrated with local object storage) and remote replication.  These features could provide portability between clouds and on-premises solutions, as well as addressing the problem of cross-region resiliency.

AI

In the mid-2020s, we are entering the AI era.  Generative AI has reinvigorated interest in what can be achieved with AI and large language models (LLMs).  Whatever form of AI we choose to implement, it requires vast amounts of data with high-performance, random-access characteristics.

Once again, centralised storage has the opportunity to shine, acting as the golden repository for training and inferencing data.  Modern SANs can provide storage at multi-petabyte capacities, with very high levels of resiliency and availability (typically the six or seven “9”s discussed earlier).

Ransomware

Finally, there’s one unexpected benefit from centralised storage, and that’s as the last line of defence against ransomware.  The open nature of modern IT systems means ransomware is a constant risk for all businesses.  Software bugs, poorly configured systems and social engineering exploits all represent potential exposures that could result in widespread encryption of system and application data.

Shared storage provides a defensive posture based on discovery and recovery.  Discovery techniques can use point-in-time data, such as snapshots, to scan for embedded malware.  Rates of data change can highlight the mass encryption of content and be the first indicator that an attack is in progress.  The recovery capability is delivered with immutable snapshots that can (in the best systems) roll back storage to a known good point in time, typically within seconds or minutes.
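
A minimal sketch of the rate-of-change heuristic follows: compare how much data changed between consecutive snapshots and flag a sudden spike that might indicate mass encryption.  The snapshot change figures and threshold are illustrative assumptions, not values from any product.

    # Minimal sketch of the rate-of-change heuristic for spotting mass encryption.
    # Snapshot change figures and the threshold are illustrative assumptions only.

    def looks_like_mass_encryption(daily_changes_gb: list[float], volume_gb: float,
                                   threshold: float = 0.30) -> bool:
        """Flag if the latest snapshot-to-snapshot change far exceeds the recent baseline."""
        if len(daily_changes_gb) < 2:
            return False
        baseline = sum(daily_changes_gb[:-1]) / (len(daily_changes_gb) - 1)
        latest = daily_changes_gb[-1]
        changed_fraction = latest / volume_gb
        return changed_fraction > threshold and latest > 5 * baseline

    if __name__ == "__main__":
        history = [40, 35, 50, 45, 4200]   # GB changed between consecutive daily snapshots
        print("Possible ransomware activity:",
              looks_like_mass_encryption(history, volume_gb=10_000))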

Ransomware protection alone could be a justification for using shared storage.

Looking to the Future

Despite the negative press that seeks to make SAN storage look expensive and difficult to manage, the reality is that modern SAN storage is very different from that deployed one or two decades ago. 

Almost all modern systems are all-flash solutions, with some hybrid options on the market.  Systems now scale into multi-petabyte capacities, with millions or tens of millions of IOPS of performance.  In-place non-disruptive upgrades make complex migrations a thing of the past, while the leading vendors have added SaaS management and monitoring portals to their roster.

Purchasing models have changed, with most vendors offering consumption-based pricing and storage-as-a-service.  Systems no longer need proactive performance management and are generally self-healing.  Finally, the addition of APIs provides new ways to manage provisioning and decommissioning that take the human out of the process.

Over the past three decades, centralised storage has grown in popularity and failed to be usurped by alternative solutions.  We believe that the future for SAN storage continues to look strong, even as businesses move to hybrid and multi-cloud computing models. 

To misquote Mark Twain, the rumours of the death of SAN storage are greatly exaggerated.  It may turn out that SAN storage remains one of the most resilient computing architectures of the IT era.



Copyright (c) 2007-2024 – Post #fe3a – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.