Editorial: Hyperscaler Storage – Build, Buy or Acquire?

Chris Evans

In a conference call discussing its latest financial results, Pure Storage CEO Charlie Giancarlo once again highlighted the company’s intention to sign a hyperscaler customer by the end of the year.  What could this mean for the company, the industry, and for customers deploying applications in the public cloud?

Background

Following the announcement of Q2 FY2025 financial results, Pure Storage CEO Charlie Giancarlo once again stated his expectation of signing a hyperscaler partner by the end of the year.  Specifically, his comment was:

Our discussions with hyperscalers to replace their core storage with Pure technology continue to progress positively. Our lead prospect has advanced from extensive evaluation of our core technology, to testing an integrated solution, and we have been engaged in detailed contractual negotiations for many months. We remain confident that we will secure our first hyperscaler design win by year-end.

Does this mean we could see Pure Storage arrays “natively” available in the public cloud, or is the implementation more nuanced than that?  Let’s look back at the history of how storage in the cloud has been deployed and see how this translates to the current news.

Build

In the early stages of the public cloud, platform providers undoubtedly created bespoke storage solutions.  In 2006, the idea of S3 (AWS Simple Storage Service) was relatively new to the market (although object storage already existed), so Amazon created the platform and the de-facto S3 API standard as a piece of core intellectual property (IP).  As this recent article (link) discusses, EBS, the Elastic Block Store, another core AWS storage feature, started out on hard disk drives and has had to evolve over time to deliver greater resiliency, higher throughput and improved availability.

A lot of engineering goes into storage for the cloud, both through hardware enhancements (like Nitro and Azure Boost) and in software.  The public cloud has a unique set of requirements, such as its self-service/on-demand nature, with a high churn rate for volumes.  It also needs to record and manage highly granular usage statistics for subsequent customer billing.
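To illustrate the billing requirement, here is a minimal sketch of per-customer usage metering for short-lived volumes.  All names and numbers are hypothetical; real cloud billing pipelines are far more elaborate, but the core idea of accumulating capacity-hours across provision/delete events is the same.

```python
from dataclasses import dataclass, field

@dataclass
class VolumeMeter:
    """Toy usage meter: accumulates GB-hours per customer as
    volumes are provisioned and deleted (names hypothetical)."""
    usage_gb_hours: dict = field(default_factory=dict)
    _open: dict = field(default_factory=dict)  # volume_id -> (customer, size_gb, start_hour)

    def provision(self, volume_id, customer, size_gb, hour):
        self._open[volume_id] = (customer, size_gb, hour)

    def delete(self, volume_id, hour):
        customer, size_gb, start = self._open.pop(volume_id)
        self.usage_gb_hours[customer] = (
            self.usage_gb_hours.get(customer, 0) + size_gb * (hour - start)
        )

meter = VolumeMeter()
meter.provision("vol-1", "acme", size_gb=100, hour=0)
meter.delete("vol-1", hour=6)          # 100 GB x 6 hours = 600 GB-hours
print(meter.usage_gb_hours["acme"])    # 600
```

Even this trivial model shows why churn matters: every short-lived volume generates metering events that must be captured accurately, at scale, for every customer.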

Many of the core features of storage on-premises are shared with the public cloud, including the obvious resiliency and performance capabilities.  However, there are also data optimisation requirements (not always passed on to the customer) and data protection through snapshots and other techniques.

Acquire

The equivalence of feature requirements in enterprise and cloud storage is why we’ve seen many acquisitions of storage start-ups by the public cloud service providers.  In 2019, we authored a post that discussed many of these IP and “acqui-hires” in the industry.  Notably, E8 Storage was acquired by Amazon, Elastifile by Google Cloud, StorSimple and Avere Systems by Microsoft, Ondat by Akamai and Cleversafe by IBM. 

We believe E8 Storage intellectual property helped develop io2 EBS storage, while Elastifile was used to improve Google Filestore.  Google also uses Intel DAOS for Parallelstore; IBM used Cleversafe to deliver object storage.

Buy

There is also a third option.  NetApp, for example, has been successful with integrating ONTAP directly into the public cloud, including AWS, Azure, and Google Cloud implementations.  This deployment of ONTAP is “cloud-native” in the sense that it can be managed and operated by cloud APIs and isn’t simply storage software running in a virtual instance.  To date, NetApp is the only storage company that has successfully implemented its product natively in the public cloud, although AWS does support alternative FSx file-system solutions including Windows Server, Lustre and OpenZFS.


SANs in the Cloud 2024 Edition – Pathfinder

This Architecting IT report reviews the emerging market for software-defined storage solutions deployed as virtual storage area networks in public clouds. It explains what features and functionality to expect, with a review of vendors in the market. Premium download for subscribers. (BRKWP0306-2024)


Mobility

Why does it matter if solutions are internally developed or OEM’d from an existing storage vendor?  The most obvious reason is data mobility.  Computing environments rarely operate in isolation but typically share data across applications and platforms.  The easiest way to move persistent data safely and successfully between disparate environments is at the storage layer.  In turn, the best way to achieve this is to use the native replication capability of the storage platform.

Using NetApp as an example, data can be replicated into the public cloud using SnapMirror, which is a highly efficient block-based replication tool for NetApp ONTAP storage volumes.  SnapMirror keeps track of updates through internal metadata, making it simple and efficient to move data around with the minimum of overhead.  Non-native solutions, in contrast, generally need to replicate entire volumes, which can be wasteful on bandwidth and expensive in network charges. 
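The efficiency argument can be sketched with a toy model of block-level incremental replication.  This is purely illustrative of the technique, not SnapMirror’s actual implementation: by comparing the current state against the last replicated snapshot, only changed blocks need to cross the wire, rather than the whole volume.

```python
def changed_blocks(prev_snapshot, curr_snapshot):
    """Return indices of blocks that differ between two snapshots.
    Illustrates snapshot-diff incremental replication: only changed
    blocks are transferred, not the full volume."""
    return [i for i, (a, b) in enumerate(zip(prev_snapshot, curr_snapshot))
            if a != b]

prev = ["A", "B", "C", "D"]    # last replicated snapshot (4 blocks)
curr = ["A", "X", "C", "Y"]    # current volume state
delta = changed_blocks(prev, curr)
print(delta)                   # [1, 3] -> 2 blocks sent instead of 4
```

A production system tracks changed blocks through metadata as writes occur (rather than comparing snapshots block by block), but the bandwidth saving is the same: transfer cost scales with the change rate, not the volume size.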

Pure Storage

How does all this relate to the news that Pure Storage is working with a major hyperscaler?  Looking at the wording of the announcement from the conference transcript, Giancarlo specifically suggests the target hyperscaler is looking to replace its “core storage” with a Pure Storage solution.  This could mean FlashBlade (file) or FlashArray (block).  We are also assuming that by “hyperscaler”, Giancarlo means a platform like AWS, Azure or Google Cloud, but this could mean a company such as Meta, where Pure Storage already has a relationship to deliver storage solutions.

Let’s assume the target hyperscaler is a cloud platform, which might be a tier 1 vendor like those we’ve mentioned but could also be a tier 2 provider such as Oracle Cloud, Akamai, Vultr, OVH or DigitalOcean.  We believe the target CSP is unlikely to be AWS but could be any of the others we’ve mentioned.  Microsoft Azure is a possible candidate because Cloud Block Store is already supported on the platform, particularly with the Azure VMware Solution.

Competition

What benefit could there be in deploying Pure Storage FlashBlade or FlashArray into a CSP?  Cloud Service Providers are focused on cost and operational efficiency.  This is why we’ve generally not seen traditional storage solutions used in the public cloud, but instead a focus on internally designed solutions.  However, the Purity operating system (which runs both FlashBlade and FlashArray) has a unique approach to flash management, abstracting the Flash Translation Layer (FTL) away from media controllers and into software.

FTL abstraction could provide CSPs with a massive opportunity to reduce costs as the volume of flash storage in the public cloud continues to grow.  We’re on the cusp of a new era of 120+TB capacity SSDs from the leading storage media vendors, but they come with reduced capabilities, such as poor IOPS numbers.  FTL abstraction mitigates this problem and enables much more efficient data resiliency capabilities (something KIOXIA is looking to solve with RAID offload technology).  For more information on the implications of FTL, see the following blogs and podcasts.
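To make the FTL abstraction concept concrete, here is a conceptual sketch of a host-managed logical-to-physical page map.  This is emphatically not Purity’s implementation, just an illustration of the principle: when the map lives in system software rather than in each drive’s controller, data placement and garbage collection can be coordinated across the whole fleet of media.

```python
class HostFTL:
    """Toy host-managed Flash Translation Layer: the logical-to-physical
    map lives in host software rather than in the drive controller.
    (Conceptual sketch only.)"""
    def __init__(self, num_pages):
        self.l2p = {}                  # logical page -> physical page
        self.free = list(range(num_pages))

    def write(self, logical_page, data):
        # NAND pages cannot be overwritten in place: allocate a fresh
        # physical page, remap, and mark the old page reclaimable.
        physical = self.free.pop(0)
        old = self.l2p.get(logical_page)
        if old is not None:
            self.free.append(old)      # reclaimable after block erase
        self.l2p[logical_page] = physical
        return physical

ftl = HostFTL(num_pages=4)
ftl.write(0, b"v1")                    # lands on physical page 0
print(ftl.write(0, b"v2"))             # remapped to physical page 1
```

Moving this map out of the SSD controller is what lets a system vendor buy simpler, cheaper flash modules and implement wear levelling, garbage collection and resiliency globally in software.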

For Pure Storage, the obvious benefit is either hardware sales or software licensing to a CSP.  But it also could enable data mobility in and out of the public cloud platform, providing justification for customers to also buy Pure Storage solutions on-premises.

Implementation

Of course, a deal with Pure Storage doesn’t mean shipping in thousands of FlashArray systems (although that could be an initial deployment strategy).  Instead, we believe it is more likely that a CSP will license the capabilities of Purity and work to build customised hardware that best serves requirements.  The critical intellectual property within FlashBlade and FlashArray is Purity, combined with custom Direct Flash Modules (DFMs).  The capacity requirements of a hyperscaler could justify the development of bespoke hardware, such as custom chassis or rack-scale storage deployments.

Alternatives

Naturally, we are guessing at what might be in the pipeline for Pure Storage.  What alternatives do the hyperscalers have if buying from existing storage vendors is not a desired route?

Although 120+TB drives are imminent, the technology is not without challenges.  As we’ve highlighted in the following graphs, I/O density for large-capacity SSDs (in this case Western Digital) declines rapidly as capacity scales.  This means that, over time, it won’t be possible simply to replace two SSDs with one drive of double the capacity and see the same level of performance.
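The arithmetic behind declining I/O density is worth spelling out.  Using hypothetical round numbers (large-capacity SSDs often keep roughly the same controller-limited IOPS as capacity grows), IOPS per terabyte falls as drives get bigger, and consolidating two drives into one larger drive halves the IOPS available for the same capacity:

```python
# Hypothetical figures to illustrate I/O density, not vendor specs.
drive_iops = 1_000_000                 # assumed per-drive random read IOPS

for capacity_tb in (30, 60, 120):
    print(capacity_tb, "TB ->", drive_iops // capacity_tb, "IOPS/TB")

# Replacing two 60 TB drives with one 120 TB drive keeps capacity
# constant but halves the IOPS serving that capacity:
two_drives = 2 * drive_iops            # 2,000,000 IOPS across 120 TB
one_drive = drive_iops                 # 1,000,000 IOPS across 120 TB
print(one_drive / two_drives)          # 0.5
```

This is why techniques that recover performance at the system level, such as FTL abstraction or RAID offload, become increasingly valuable as drive capacities scale.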

With economies of scale available, one option for CSPs is to build custom SSDs in the same way Pure Storage has done.  Amazon was probably the first company to take the custom hardware route, acquiring Annapurna Labs in 2015 and using the company’s IP to develop the Nitro offload system.  Amazon (or any CSP) could acquire a storage controller vendor such as Phison, Marvell, FADU or Silicon Motion and build an ecosystem similar to that developed with Purity (although it would take time).

Microsoft acquired Fungible in January 2023, while other companies such as Kalray, Pliops, GRAID, and ScaleFlux could be possible targets.  For more on these companies, see our “Intelligent Data Devices” report embedded here.


Intelligent Data Devices 2023 Edition – A Pathfinder Report

This Architecting IT report looks at the developing market of SmartNICs, DPUs and computational storage devices, as data centres disaggregate data management processes, security and networking. Subscriber download only. (BRKWP0303-2023)


The Architect’s View®

If Pure Storage signs a major hyperscaler and replaces its core storage solutions with either FlashBlade or FlashArray, then this will undoubtedly be a coup for the company and represent a significant revenue boost.  However, the public cloud service providers have choices, the economies of scale, and the deep pockets to acquire and develop custom storage solutions. 

Since the start of the public cloud market in the mid-2000s, all vendors have strived to improve services and reduce costs.  Economies of scale have seen a transition away from using enterprise solutions to custom builds and the development of bespoke solutions for computing (Arm instances), networking and storage (traffic and management offload). 

The next phase in the optimisation cycle will include storage media.  We believe that the Pure Storage model of system design is unique and provides a first phase of the transition away from commodity SSDs.  What comes next could be either the acquisition of Pure Storage (which we view as unlikely) or CSPs branching out into custom storage silicon in a strategy that parallels the development of Arm-based SoCs for virtual instances.

Only time will tell if we are right, with the first step along this journey expected by the end of the year, when we hope Pure Storage will have some positive news.


Copyright (c) 2007-2024 – Post #2e2d – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission. Pure Storage is a Tracked Vendor by Architecting IT in storage systems and software-defined storage.