Enterprise Computing: Why Thin Provisioning Is Not The Holy Grail for Utilisation

Thin Provisioning (Dynamic Provisioning, Virtual Provisioning, or whatever you prefer to call it) is being heavily touted as a method of reducing storage costs.  Whilst at the outset it seems to provide some significant storage savings, it isn’t the answer for all our storage ills.
 
What is it?
 
Thin Provisioning (TP) is a way of reducing storage allocations by virtualising the storage LUN.  Only the sectors of the LUN which have been written to are actually placed on physical disk.  This has the benefit of reducing wastage in instances where more storage is provisioned to a host than is actually needed.  Look at the following figure.  It shows five typical 10GB LUNs, allocated from an array.  In a “normal” storage configuration, those LUNs would be allocated to a host and configured with a file system.  Invariably, the file systems will never be run at 100% utilisation (just try it!) as this doesn’t work operationally and also because users typically order more storage than they actually require, for many reasons.  Typically, host volumes can be anywhere from 30-50% utilised, and in an environment where the entire LUN is reserved for the host, this results in 50-70% wastage.
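To put rough numbers on that, here is a small sketch (illustrative figures only, chosen to match the 30-50% utilisation range above) totalling the capacity wasted when whole LUNs are reserved for hosts:

```python
# Illustrative sketch: capacity wasted by "fat" provisioning.
# Figures are hypothetical, matching the 30-50% utilisation range above.

luns_gb = [10, 10, 10, 10, 10]                 # five 10GB LUNs, as in the figure
utilisation = [0.30, 0.45, 0.50, 0.35, 0.40]   # assumed host file system usage

allocated = sum(luns_gb)
used = sum(size * util for size, util in zip(luns_gb, utilisation))
wasted = allocated - used

print(f"Allocated: {allocated} GB, used: {used:.1f} GB, "
      f"wasted: {wasted:.1f} GB ({wasted / allocated:.0%})")
```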
 
Now, contrast this to a Thin Provisioned model.  Instead of dedicating the physical LUNs to a host, they now form a storage pool; only the data which has actually been written is stored on disk.  This has two benefits: either the storage pool can be sized smaller than the theoretical capacity of the now-virtual LUNs, or more LUNs can be created from the same size storage pool.  Either way, the physical storage can be used much more efficiently and with much less waste.
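Looked at from the pool side, the same arithmetic shows why thin pools can be oversubscribed. The sketch below is purely illustrative; the expected utilisation and safety margin are assumptions, not recommendations:

```python
# Hypothetical thin pool sizing: how far can the pool be oversubscribed?
virtual_luns_gb = 5 * 10          # five 10GB thin LUNs presented to hosts
expected_utilisation = 0.40       # assumed long-run average utilisation
safety_margin = 1.25              # headroom for growth spikes (assumption)

pool_gb = virtual_luns_gb * expected_utilisation * safety_margin
oversubscription = virtual_luns_gb / pool_gb
print(f"Pool size: {pool_gb:.0f} GB for {virtual_luns_gb} GB of virtual LUNs "
      f"({oversubscription:.1f}:1 oversubscription)")
```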
There are some obvious negatives to the TP model.  It is possible to over-provision LUNs and, as data is written to them, exhaust the shared storage pool.  This is Not A Good Thing and clearly requires additional management techniques to ensure this scenario doesn’t happen, plus sensible standards for layout and design to ensure a rogue host writing lots of data can’t impact other storage users.
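The kind of management check this implies might look something like the sketch below; the thresholds are arbitrary examples, and the usage figures would come from whatever management API the array actually exposes:

```python
# Sketch of a pool-exhaustion check. Thresholds are examples only; the used
# and capacity figures would be pulled from the array's management interface.

WARN_PCT = 70
CRITICAL_PCT = 85

def check_pool(pool_name: str, used_gb: float, capacity_gb: float) -> str:
    pct = 100.0 * used_gb / capacity_gb
    if pct >= CRITICAL_PCT:
        return f"{pool_name}: CRITICAL {pct:.0f}% used - stop new thin allocations"
    if pct >= WARN_PCT:
        return f"{pool_name}: WARNING {pct:.0f}% used - plan physical expansion"
    return f"{pool_name}: OK {pct:.0f}% used"

print(check_pool("pool01", used_gb=26.0, capacity_gb=30.0))
```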
 
The next problem with TP in this representation is the apparent concentration of the risk and performance of many virtual LUNs onto a smaller number of physical devices.  In my example, the five LUNs have been stored on only three physical LUNs.  This represents a potential performance bottleneck, and consequently vendors have catered for it in their implementations of TP.  Rather than large chunks of storage being provided from fixed volumes, TP is implemented using smaller blocks (or chunks) which are distributed across all disks in the pool.  The third image visualises this method of allocation.
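To picture how small chunks spread the load, here is a toy round-robin mapping of chunk numbers onto pool disks. Real arrays use their own placement policies, so treat this purely as an illustration of wide striping:

```python
# Toy illustration of wide striping: map each chunk of a virtual LUN onto the
# disks in the pool round-robin. Real arrays use their own placement policies.

CHUNK_MB = 42                        # HDS-style chunk size, for example
POOL_DISKS = ["disk0", "disk1", "disk2"]

def chunk_location(lun_offset_mb: int) -> tuple[int, str]:
    chunk_index = lun_offset_mb // CHUNK_MB
    return chunk_index, POOL_DISKS[chunk_index % len(POOL_DISKS)]

for offset in (0, 100, 500, 900):
    idx, disk = chunk_location(offset)
    print(f"LUN offset {offset} MB -> chunk {idx} on {disk}")
```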
 
Each vendor’s implementation of TP uses a different block size.  HDS use 42MB on the USP, EMC use 768KB on DMX, IBM allow a variable size from 32KB to 256KB on the SVC and 3Par use blocks of just 16KB.  The reasons for this are many and varied; for legacy hardware they are largely a reflection of the underlying hardware architecture.
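The practical consequence of those figures is allocation granularity. The sketch below works out how much physical capacity a single small host write pins for each of the chunk sizes quoted above:

```python
# How much physical space does one 4KB host write pin, per vendor chunk size?
# Chunk sizes as quoted above; 4KB is the NTFS default discussed below.

chunk_sizes_kb = {
    "HDS USP": 42 * 1024,      # 42MB
    "EMC DMX": 768,            # 768KB
    "IBM SVC (max)": 256,      # variable 32KB-256KB; using the maximum here
    "3Par": 16,                # 16KB
}

host_write_kb = 4
for platform, chunk_kb in chunk_sizes_kb.items():
    print(f"{platform}: a {host_write_kb}KB write allocates {chunk_kb}KB "
          f"({chunk_kb // host_write_kb} filesystem blocks' worth of space)")
```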
Unfortunately, the file systems that are created on thin provisioned LUNs typically don’t have a matching block size structure.  Windows NTFS, for example, defaults to a block (cluster) size of only 4KB, even for large disks, unless explicitly overridden by the user.  The mismatch between the TP block size and the file system block size causes a major problem as data is created, amended and deleted over time on these systems.  To understand why, we need to examine how file systems are created on disk.
 
The fourth graphic shows a snapshot from one of the logical drives in my desktop PC.  This volume hasn’t been defragmented for nearly six months and consequently many of the files are fragmented and not stored on disk in contiguous blocks.  Fragmentation is seen as a problem for physical disks because the head needs to move about frequently to retrieve fragmented files, and that adds a delay to the read and write times to and from the device.  In a SAN environment, fragmentation is less of an issue as the data is typically read and written through cache, negating most of the physical issues of moving disk heads.  However, fragmentation and thin provisioning don’t get along very well, and here’s why.
 
The Problem of Fragmentation and TP
 
When files are first created on disk, they will occupy contiguous sections of space.  If this data resides on TP LUNs, then a new chunk will be assigned to the virtual TP LUN as soon as a single filesystem block is written.  For a Windows system using 4KB blocks on USP storage, this means 42MB each time.  This isn’t a problem while the file continues to be extended; however, it is unlikely the file will end neatly on a 42MB boundary.  As more files are created and deleted, each 42MB chunk will become partially populated with 4KB filesystem blocks, leaving “holes” in the filesystem which represent unused storage.  Over time, a TP LUN will experience storage utilisation “creep” as new blocks are “touched” and therefore written onto physical disk.  Even if data is deleted from an entire 42MB chunk, it won’t be released by the array, as data is usually “logically deleted” by the operating system.  De-fragmenting a volume makes the utilisation creep issue worse; it writes to unused space in order to consolidate files.  Once written, these new areas of physical disk space are never reclaimed.
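To see the creep effect in action, here is a small, deliberately simplified simulation: filesystem blocks are written and logically deleted at random across a thin LUN, and the number of touched 42MB chunks only ever grows, even though the live data stays comparatively small. It is a sketch of the argument, not a model of any particular array:

```python
# Simplified simulation of utilisation creep on a thin LUN.
# 4KB filesystem blocks over 42MB TP chunks; deletes are logical only,
# so a chunk, once touched, is never returned to the pool.
import random

FS_BLOCK_KB = 4
CHUNK_KB = 42 * 1024
BLOCKS_PER_CHUNK = CHUNK_KB // FS_BLOCK_KB      # 10,752 blocks per chunk
LUN_BLOCKS = 200 * BLOCKS_PER_CHUNK             # a 200-chunk (~8.2GB) thin LUN

random.seed(1)
live = set()            # filesystem blocks currently holding data
touched_chunks = set()  # chunks the array has had to back with physical space

for _ in range(500_000):
    block = random.randrange(LUN_BLOCKS)
    if block in live and random.random() < 0.5:
        live.discard(block)                     # logical delete: array unaware
    else:
        live.add(block)
        touched_chunks.add(block // BLOCKS_PER_CHUNK)

print(f"Live data: {len(live) * FS_BLOCK_KB / 1024:.0f} MB, "
      f"physical space consumed: {len(touched_chunks) * CHUNK_KB / 1024:.0f} MB")
```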
 
So what’s the solution?
 
Fixing the TP Problem
  
Making TP useful requires a feature that is already available in USP arrays as Zero Page Reclaim and in 3Par arrays as Thin Built In.  When an entirely “empty” TP chunk is detected, it is automatically released by the system (in HDS’s case, at the touch of a button).  So, for example, as fat LUNs are migrated to thin LUNs, unused space can be released.
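Conceptually, the reclaim pass just looks for chunks that are entirely zero and hands them back to the pool. The fragment below is a minimal sketch of that idea, not HDS’s or 3Par’s actual implementation:

```python
# Conceptual sketch of zero page reclaim: release any chunk that is all zeros.
# This illustrates the idea only; it is not any vendor's implementation.

CHUNK_SIZE = 42 * 1024 * 1024    # 42MB chunks, as on the USP

def reclaim_zero_chunks(chunks: dict[int, bytes]) -> list[int]:
    """Return the chunk indices that can be handed back to the pool."""
    zero_chunk = bytes(CHUNK_SIZE)
    reclaimed = [idx for idx, data in chunks.items() if data == zero_chunk]
    for idx in reclaimed:
        del chunks[idx]          # chunk no longer backed by physical space
    return reclaimed

# Example: chunk 0 is all zeros and is reclaimed; chunk 1 still holds data.
allocated = {0: bytes(CHUNK_SIZE),
             1: b"\x00" * 100 + b"data" + bytes(CHUNK_SIZE - 104)}
print(reclaim_zero_chunks(allocated))   # -> [0]
```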
However, this feature doesn’t help with traditional file systems that don’t overwrite deleted data with binary zeros.  I’d suggest two possibilities to cure this problem (a rough sketch of the underlying technique follows the list):
  • Secure Defrag.  As defragmentation products re-allocate blocks, they should write binary zeros to the released space.  Although this is time-consuming, it would ensure deleted space could be reclaimed by the array.
  • Freespace Consolidation.  File system free space is usually tracked by maintaining a chain of freespace blocks.  Some defragmentation tools can consolidate this chain.  It would be an easy fix to simply write binary zeros over each block as it is consolidated.
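A crude way to approximate either approach today is to fill a volume’s free space with a file of binary zeros and then delete it, so that a subsequent zero page reclaim pass can release the underlying chunks. The sketch below shows that technique in its simplest form; use it with care, as the volume will briefly run close to 100% full:

```python
# Crude free-space zeroing: write zeros into free space, then delete the file,
# so a later zero page reclaim pass can release the underlying chunks.
# Use with caution - the volume will briefly run close to 100% full.
import os

def zero_free_space(mount_point: str, write_mb: int = 64) -> None:
    path = os.path.join(mount_point, "zerofill.tmp")
    buf = bytes(write_mb * 1024 * 1024)
    try:
        with open(path, "wb") as f:
            while True:
                f.write(buf)           # keep writing until the volume is full
    except OSError:
        pass                           # disk full: free space is now zeroed
    finally:
        if os.path.exists(path):
            os.remove(path)            # give the blocks back to the filesystem

# zero_free_space("/mnt/thin_volume")   # example invocation (hypothetical path)
```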

One alternative solution from Symantec is to use their Volume Manager software, which is now “Thin Aware”.  I’m slightly sceptical of this as a solution, as it requires software or patches to be deployed on the host operating system just to make storage operate efficiently.  It takes me back to Iceberg and IXFP….

Summary

So, in summary, Thin Provisioning can be a Good Thing; however, over time it will lose its shine.  We need fixes that allow deleted blocks of data to be consolidated and returned to the storage array for re-use.  Then TP will deliver on what it promises.

Footnote

Incidentally, I’m surprised HDS haven’t made more noise about Zero Page Reclaim.  It’s a TP feature that, to my knowledge, EMC haven’t got on DMX or V-Max.

 

About Chris M Evans

Chris M Evans has worked in the technology industry since 1987, starting as a systems programmer on the IBM mainframe platform, while retaining an interest in storage. After working abroad, he co-founded an Internet-based music distribution company during the .com era, returning to consultancy in the new millennium. In 2009 Chris co-founded Langton Blue Ltd (www.langtonblue.com), a boutique consultancy firm focused on delivering business benefit through efficient technology deployments. Chris writes a popular blog at http://blog.architecting.it, attends many conferences and invitation-only events and can be found providing regular industry contributions through Twitter (@chrismevans) and other social media outlets.
  • Carmen

    One interesting note: you mention that NTFS defaults to 4k blocks. Operationally, I usually recommend that customers format with 16k blocks explicitly, because that is the best size when using the Volume Shadow Copy Service (VSS) alongside defragging. If you use 4k blocks and defrag the volume, VSS tracks those changes and overextends its quota, resulting in snapshots being pared away too soon. I just thought that was interesting because of the 3Par choice of TP block sizes. I suspect they see better TP performance and utilization over the long term with Windows file servers than the others, in a well-managed environment.

  • bill bloom

    Although I am a solutions consultant for HDS, I am not privy to the algorithmic implementation of HDS products and rely on user manuals just as everyone else does. So with that in mind, my limited knowledge brings me to the conclusion that the wasted space from the 42MB chunk size and the Windows I/O size of 4KB doesn’t really have the same waste impact that fat LUNs have produced. While I agree that it will get worse over time, I do not think it will ever get to the same wastage that DP brings to the table. I like your lively discussion and found it a very interesting read. It is unfortunate that the drawings you included could not be brought up to a size where they could be read, though.

    You are also right in that HDS has not focused on the Zero Data Reclaim feature. This is huge and I do speak to it in certain circles. With management customers, I tend not to discuss this in detail because they usually glaze over (especially in the Federal Government space)!!!

    Thanks for taking the time to write about this topic; it is appreciated!

    Bill Bloom Sends–

  • http://blogs.netapp.com/extensible_netapp kostadis

    Hi!

    the problems you describe are real.

    Personally I think that if you’re not doing 4K block thin provisioning, it’s not thin provisioning but pudgy provisioning.

    http://blogs.netapp.com/extensible_netapp/2009/05/real-thin-provisioning-vs-pudgy-provisioning.html

    As for fragmentation, that’s only a problem if your array is not optimized to handle a disk layout different from your client’s view of the layout.

    http://blogs.netapp.com/extensible_netapp/2009/04/understanding-wafl-performance-the-f-word.html

    For systems like WAFL and other Better Than Real Fibre Channel Arrays, the fragmentation issue is a solved problem.

    http://blogs.netapp.com/extensible_netapp/2009/05/the-importance-of-relaxing-constraints.html

    The real problem, for Real Fibre Channel Arrays aka Traditional Legacy Arrays, is that they are simply not architected to deal with thin provisioning and so try to come up with approximations to the real solution.

    kostados

  • bill bloom

    My apologies…

    I didn’t realize the diagram images were thumbs pointing to the originals. Duh!

    Bill Bloom Sends–

  • Pingback: Welcome to vSphere-land! » Thin Provisioning Links

  • John Harker

    A friend (not Bill) pointed out your blog entry. It’s a good writeup. I too work for HDS. Thanks for the mention of Zero Page Reclaim, as to why we don’t make more of it… you know we are often known as a stealth marketing company… :-)

    With respect to filesystems, a point of interest you didn’t mention is the difference in filesystem behaviour when reallocating storage over time. Some existing filesystems are ‘thin friendly’ and already favour reuse of discarded space; some march forward, always using new space. The first is thin friendly, the second not. Examples of thin friendly filesystems include VxFS, NTFS and ZFS. An unfriendly one – Solaris UFS.

    In moving to use of thin provisioning, our Zero Page Reclaim feature is a good first step with potentially large rewards through moving existing fat volumes to thin and reclaiming unused space. Particularly if that space has never been written to – a common circumstance in many environments. Significant deferrals of storage purchases can be had (found money).

    And in the near future, with regard to Fixing the Problem, the path is, as you suggest, towards a better ongoing operational reclamation process. For example, Symantec is not only ‘thin aware’, they also have an initiative to provide an interface in the VERITAS Storage Foundation suite which can be used to enable communication between the file system and a thin-provisioning-capable array. This will provide a mechanism to allow the thin-aware file system to notify the array which pages on a thinly provisioned LUN or volume do not have valid data (this enables them to more readily implement things like the ideas you suggest). The array can then safely disassociate the page from a thinly provisioned LUN/volume and make it available for re-use.

    Independently, the INCITS T10 committee on SCSI storage interfaces has also been including similar ideas in proposals likely to be adopted over time by file systems, databases, and storage vendors.

    Something you did not mention were two other significant benefits of thin provisioning – both storage management related. The first is that it provides a much more agile way of provisioning storage, in exactly the same way VMware gives you agility in creating new Virtual Machines. A new volume can be created from the virtualization pool without having to add storage. The act of provisioning storage to an application and adding physical storage to the box have been decoupled: actual physical storage is added asynchronously without affecting applications. The second benefit is the automatic optimization that can happen if the thin provisioning implementation is smart and automatically load balances among all the spindles in the pool.

  • Dale Clutterbuck

    I have recently been toying with HDP and I have found TP very iffy. None of the file systems I have tried were particularly “thin friendly”. TP seems to be best used for things like Oracle, MS SQL and VMware. Really anything that creates big files that rarely get shifted around.

    Where I have found DP useful is its other features: load smoothing, simpler provisioning, performance. I am still testing but this seems to really suit our VMware environment.

    -Dale

  • soikki

    Take note that you can also “win” back capacity on VMware:

    After you have migrated virtual hosts to VMware, do a zero page reclaim and there is a good chance that you will gain back some capacity which has been formatted by the VMware virtual host (Windows)…

    Zero page reclaim with DP really is good…

  • Rob

    Mileage definitely varies depending on the file system. Many file systems write metadata throughout the LUN, causing high allocation of pages even when actual data is minimal.

  • SRJ

    For the sake of conversation, I just wanted to note a few things about IBM’s XIV system, which operates on 1MB chunks…

    Similar to HDS’ Zero Page Reclaim feature, it reclaims emptied space, but via an automated background scrubbing mechanism. Also, I believe it is unique (I could be wrong) in that the XIV does not allocate any capacity in the first place if the filesystem writes a long string of zeroes. HDS, in contrast, would write the zeroes and consume the space until you initiate the Zero Page Reclaim operation. XIV would never have written those zeroes to begin with. This is perhaps a small but interesting difference…

    The XIV also solves two of your other potential objections to thin provisioning – performance and protection from rogue hosts writing lots of data.

    Of course, every volume on an XIV system is striped across every disk in the entire system, thereby eliminating any potential performance impact caused by thin provisioning. XIV also has the concept of Storage Pools, which are nothing more than logical constructs. You can run thin provisioned volumes in one pool and fat volumes in another pool if you like, and you can ensure that volumes in one pool are never affected by rogue volumes in another pool. To take it a step further, XIV also has a logical way to deal with the problem of depleted capacity in a pool. Before taking the drastic step of locking all volumes in the pool from any further write activity, it can methodically (based on priority and creation time) start deleting snapshots in order to free up additional space in the pool. Not sure how the other systems mentioned would handle this…?

  • CW

    It seems that the T10/T13 support for the TRIM and Punch commands is intended to address this very issue. When doing a delete, the filesystem can send a list of blocks down to the storage device rather than sending SCSI writes with a bunch of zeros. My understanding is that MS Windows 7 has NTFS support for the TRIM command. This still doesn’t solve the mismatch of filesystem and storage subsystem block sizes.

  • John

    Interesting article, but what is even more interesting is the poll at the top right of the screen. EMC is regarded as being one of the worst when it comes to storage management; I have no clue how it got 37% and first place. The only reason I can see is that the poll wasn’t very fair. The vendors with excellent management such as Sun or EqualLogic (now Dell) weren’t even on the list. Those that have used Sun and Dell storage management will know that it is far superior to that of EMC. I would like to see the poll redone with EMC, Dell EqualLogic, Sun, and NetApp.

    I know this doesn’t relate to the article but I felt like adding my 10 cents anyway :)

  • Pingback: Enterprise Computing: So EMC, Where’s Your Thin Persistence? « The Storage Architect

  • Andrew

    Unfortunately the Thin provisioning suppliers can only fix so much of the problem. For a complete solution you need a host based environment which is designed to work with Thin storage and currently the vast majority of host VM/FS products are not designed to work with thin storage.

    Take 0 page reclamation: its effectiveness is limited to the point at which data is migrated into thin. Once the migration is complete, 0 page reclamation has very little use, because filesystems typically do not write blocks full of nulls; that is not, for example, what happens when a file is deleted.

    In addition any filesystem fragmentation or striping of metadata across volumes will reduce utilization unless the TP system uses very small block sizes.

    With most current filesystems, the point of maximum effectiveness from a TP perspective is when the data has just been migrated into TP, assuming 0 page reclamation or the use of Symantec SmartMove. After that point it is inevitable that blocks which do not contain nulls, but which are not being used by the filesystem, will build up; the proportion of these to the used blocks depends on how efficient the FS is at optimizing its layout and on the workload.

    Inevitably you need a mechanism which can recover blocks that do not contain nulls but which are not used by the filesystem. This is not 0 page reclamation and needs to be implemented at the host level with support from the TP system.

  • Rich

    Recent experience with IBM XIV thin provisioning is that it does not work as SRJ explains. The 1MB chunk extent is misleading.

    If you write 1KB of data to an empty XIV volume, 17GB of hard capacity is automatically reserved for that volume, even if it’s thinly provisioned. You can provision by number of blocks on the XIV, which limits what the host can actually see, but storage is allocated to the volume in 17GB chunks.

    The biggest downside I’ve noticed with XIV thin provisioning is you can’t even allocate more logical volume capacity than a fully-loaded XIV can physically handle!

  • http://www.brookend.com Chris Evans

    Rich

    I think the XIV array needs a good deep dive to answer the questions you pose, and many others I have too, around RAID reliability and rebuild. Watch this space; I’m trying to sort something out.

    Chris

  • Pingback: Granularity of Thin Provisioning Approaches – Stephen Foskett, Pack Rat

  • Chris Evans

    Carmen,

    Thanks for the pointer on VSS. I’ll have to do a post on that at some stage. And you’re right: with a more granular block size, 3Par will be better positioned to reclaim space more efficiently.

    Chris

  • Chris Evans

    Bill, so is DP a free licensed offering? I don’t believe it is. How much does it add to the per-TB cost of storage? Clearly there’s a cutoff point where the additional cost of DP negates its value. However, my point in referencing the block size was to highlight the greater impact on recovery that a larger array chunk size would cause. I do agree with you, though; there’s clear merit in deploying DP over fat volumes.

    Chris

  • Chris Evans

    Rob

    Agreed – I’ve been thinking how a tool could be written to scan a host and indicate how much data would be recovered with TP and Zero Page Reclaim.

    Chris

  • Chris Evans

    Soikki

    Good point, thanks. I think storage on VMware (especially the TP aspect) is definitely an idea for another post.

    Chris

  • Chris Evans

    Dale

    Good feedback, thanks for the comment.

    Chris

  • Chris Evans

    John

    Thanks for all the additional comments. As I wrote the post, I think I was up to 1000+ words; unfortunately some text has to get chopped, so I did abbreviate in places. I am planning to write more of a White Paper on TP; I’ll be sure to add your thoughts.

    Regards
    Chris
