Primary Dedupe: The Next Big Thing in Storage
By Tony Asaro on Jun 29, 2011 | In Data Management
I have been pounding the drum on dedupe in primary storage for a very long time, and I am surprised that the market hasn’t acted more quickly. Its benefits are even easier to quantify than those of snapshots and thin provisioning, and yet its adoption has been slow.
The reasons for implementing primary dedupe are as clear as day. Data growth is ridiculous and never ending. The math is simple to grasp:
• Primary storage is growing at a CAGR of 60%.
• If you have 10 TB of data today that means you will have 16 TB next year at this time.
• In five years this will turn into more than 104 TB.
• If you have 100 TB of data today, you will have more than 1.04 PB in five years.
• And since most storage systems run at about 40% capacity utilization, you are talking about 250 TB of raw capacity to store 100 TB of data and 2.5 PB of capacity to store 1 PB of actual data.
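The growth figures above are plain compound growth: data after n years is today's data times (1 + 0.60)^n, and raw capacity is data divided by utilization. Here is a minimal Python sketch of that arithmetic. It takes the 60% CAGR and 40% utilization as given assumptions, and the function names are hypothetical.

```python
# Sketch of the growth arithmetic. The 60% CAGR and 40% utilization
# figures are the article's working assumptions, not measured values.

GROWTH_RATE = 0.60   # assumed annual growth rate of primary data (CAGR)
UTILIZATION = 0.40   # assumed fraction of raw capacity actually holding data


def projected_data(current_tb: float, years: int, rate: float = GROWTH_RATE) -> float:
    """Data volume after `years` years of compound growth at `rate`."""
    return current_tb * (1 + rate) ** years


def raw_capacity_needed(data_tb: float, utilization: float = UTILIZATION) -> float:
    """Raw capacity required to hold `data_tb` at the given utilization."""
    return data_tb / utilization


if __name__ == "__main__":
    for start in (10, 100):
        one_year = projected_data(start, 1)
        five_years = projected_data(start, 5)
        print(f"{start} TB today -> {one_year:.1f} TB in 1 year, "
              f"{five_years:.1f} TB in 5 years")
    print(f"Raw capacity for 100 TB at 40% utilization: "
          f"{raw_capacity_needed(100):.0f} TB")
```

Running it reproduces the list: 16 TB after one year, just under 105 TB after five, and 250 TB of raw capacity for 100 TB of data.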
Let us do the dedupe math:
• If you get just a 4-to-1 ratio, then 10 TB of data is reduced to 2.5 TB.
• At a 60% CAGR, that becomes 4 TB in one year and about 26 TB in five years. Compare that to 104 TB in five years without dedupe!
• If you get a 10-to-1 ratio, you will have only about 10 TB in five years versus 104 TB! That is an order-of-magnitude difference in the amount of data actually stored. And those dedupe ratios are achievable in virtualized environments.
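The dedupe math is the same compound growth applied to the reduced footprint. Here is a minimal sketch, assuming (as the figures above implicitly do) that the dedupe ratio stays constant as the data set grows. The function names are hypothetical.

```python
# Sketch of the dedupe arithmetic: apply a fixed dedupe ratio, then grow.
# Assumes the dedupe ratio stays constant while data grows at a 60% CAGR.

GROWTH_RATE = 0.60  # assumed annual growth rate (CAGR)


def projected_data(current_tb: float, years: int, rate: float = GROWTH_RATE) -> float:
    """Data volume after `years` years of compound growth at `rate`."""
    return current_tb * (1 + rate) ** years


def stored_after_dedupe(logical_tb: float, ratio: float) -> float:
    """Physical footprint of `logical_tb` of data at a `ratio`-to-1 dedupe ratio."""
    return logical_tb / ratio


if __name__ == "__main__":
    logical_today = 10.0
    baseline = projected_data(logical_today, 5)
    print(f"No dedupe: {baseline:.1f} TB stored in 5 years")
    for ratio in (4, 10):
        stored_today = stored_after_dedupe(logical_today, ratio)
        print(f"{ratio}:1 -> {stored_today:.1f} TB today, "
              f"{projected_data(stored_today, 1):.1f} TB in 1 year, "
              f"{projected_data(stored_today, 5):.1f} TB in 5 years")
```

With these inputs it prints about 26.2 TB at 4:1 and about 10.5 TB at 10:1 after five years, against about 104.9 TB without dedupe. Because both footprints grow at the same rate, the gap between them is exactly the dedupe ratio.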
I know it sounds too good to be true, but even with a modest dedupe ratio the economics are simple to quantify and justify.
The strange thing is that we really don’t have wide adoption of primary dedupe. It is a no-brainer technology that very few storage vendors have actually implemented. NetApp has a distinct advantage over other storage vendors and is actually winning business because of their dedupe technology. To be candid, NetApp dedupe does have a number of limitations and yet none of their major competitors have stepped up to answer the call.
There are signs that other storage vendors are stepping up. Dell acquired Ocarina and IBM bought Storwize. Additionally, Permabit is a vendor that has primary dedupe technology, and there are a number of vendors they are working with. I predict they will be acquired shortly, and that will leave every other storage vendor out in the cold. However, none of these technologies have made their way into the market yet. A startup called Nimble Storage is growing like crazy, and while they don’t actually have dedupe, they do have in-line data compression, and even with that alone they have a measurable cost-per-GB advantage over their competition. Data compression is good. Dedupe is better. And data compression combined with dedupe is the best.
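A toy model can illustrate why compression plus dedupe beats either one alone. The sketch below is a hypothetical illustration, not how NetApp, Permabit, Nimble, or any other vendor implements data reduction. It splits data into fixed 4 KB chunks, keeps one copy of each chunk by SHA-256 digest (dedupe), and optionally compresses each stored chunk with zlib. The chunk size and the sample data set (ten "VM images" sharing a common base image) are assumptions chosen for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products differ


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def dedupe(chunks: list[bytes]) -> list[bytes]:
    """Keep one copy of each chunk, identified by its SHA-256 digest."""
    unique: dict[bytes, bytes] = {}
    for chunk in chunks:
        unique.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return list(unique.values())


def stored_bytes(data: bytes, use_dedupe: bool, use_compression: bool) -> int:
    """Bytes that would be written under the chosen data-reduction options."""
    chunks = split_chunks(data)
    if use_dedupe:
        chunks = dedupe(chunks)
    if use_compression:
        # Each chunk is compressed independently rather than as one stream.
        return sum(len(zlib.compress(c)) for c in chunks)
    return sum(len(c) for c in chunks)


def sample_vm_images(count: int = 10) -> bytes:
    """Hypothetical data set: `count` VM images sharing a common base image."""
    text = b"".join(b"setting_%d = enabled\n" % i for i in range(5000))
    base = text[:16 * CHUNK_SIZE]  # 16 chunk-aligned blocks shared by every VM
    images = []
    for vm in range(count):
        # One chunk per VM that is unique but still compressible.
        unique = hashlib.sha256(b"vm-%d" % vm).digest() * (CHUNK_SIZE // 32)
        images.append(base + unique)
    return b"".join(images)


if __name__ == "__main__":
    data = sample_vm_images()
    for dd, cmp in ((False, False), (False, True), (True, False), (True, True)):
        print(f"dedupe={dd!s:5} compression={cmp!s:5} -> "
              f"{stored_bytes(data, dd, cmp):,} bytes")
```

On this data, dedupe alone shrinks 170 stored chunks to 26, compression alone shrinks every chunk, and doing both compresses only the 26 unique chunks, so the combined result is the smallest. Compressing chunk by chunk, rather than the whole stream, keeps the compression-only case roughly comparable to how a block-storage system would store data.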
I could be cynical and conclude that storage vendors don’t want to implement primary dedupe because it would cost them money. But I doubt that is the case, because it is inevitable and it is already costing them money: they are losing business over it every day. I think the reason is that primary dedupe is really hard to implement. Therefore the vendor that does it best will have a clear advantage over all of the others.
NetApp gained leadership for many years in great part because of their snapshot technology. 3PAR was acquired for an unprecedented price in great part because of their thin provisioning technology. The jury is still out on which storage vendor will be the primary dedupe leader but whoever it is will inevitably experience great success. And it will change the industry for the better.
2 comments
I'm looking at primary compression being available before dedupe, but that's just my opinion.
Trey, agreed that both performance and scalability are barriers. I also agree that Data Domain works well as a backup target, that primary I/O is very different, and that this is why Data Domain will never find its way into primary storage. I also agree that data compression is easier to implement in primary storage than dedupe is. It appears I agree with everything you said! However, I do not think performance and scalability are insurmountable issues, especially if you talk to the Permabit guys: they say they have an architecture that conquers both, so it will be interesting to see one of their OEMs bring their dedupe to market.
Additionally, most data goes dormant within a very short window after its creation. And processors and memory keep getting faster and faster.
I also believe data compression is valuable but even more so when you combine it with dedupe.
I am convinced that it is inevitable and that it will become pervasive. It is a matter of time, and I believe we are close.
