Author: Mike Karp, Senior Analyst, Enterprise Management Associates
“Storage density” is the term that describes how much storage capacity can be packed into a specified amount of space. The actual measurement varies a bit depending on your focus. Disk drive vendors measure bits per square inch, while tape vendors think in terms of bits per linear inch. Data center managers measure things at a more macro level: for them storage density is a measure of terabytes (and increasingly, petabytes) per square foot of floor space.
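For example (the figures here are purely illustrative, not drawn from any real facility), a single rack’s contribution to that macro-level number is simply capacity divided by footprint:

```python
# Purely illustrative figures for one storage rack.
rack_capacity_tb = 240       # assumed usable capacity in the rack
rack_footprint_sq_ft = 7.5   # assumed footprint, including service clearance
print(f"{rack_capacity_tb / rack_footprint_sq_ft:.0f} TB per square foot")
```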
At least that’s the way things were measured up until recently.
A Zero-Sum Game
The price of getting power into the data center has risen substantially over the last two quarters, and data center power and cooling is now a painful line item in most storage budgets. If your worry is “how am I going to afford an increase in power consumption?”, you may be one of the lucky ones, because for some of your colleagues power availability has become what economists call a zero-sum game.
A zero-sum game is a situation in which there is only a finite amount of “goodness” available to all the players. In the case of data center power, this applies to sites where there is no more power to be had than what already exists: if you want to plug in anything new, you have to unplug something that is already in service. Willingness, or even ability, to pay for additional power is not the issue; in some locations there is simply no more power available at any price.
This is what many data center managers confront today. At such sites, vendors know that if they have any hope of replacing existing storage assets they’ll have to do so without adding to the existing power load. The mantra in this case has become “more storage, less power.”
What Can Be Done?
When power consumption is high on a list of worries, it’s time to consider a newer metric for storage density: gigabytes per watt, a measurement that indicates how much storage you get from each watt of electricity you buy. Fortunately, there are several ways to squeeze more gigabytes of storage from each watt of power consumed in the data center. Here are some of the choices.
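As a back-of-the-envelope illustration of the metric (the capacities and wattages below are invented for the example, not drawn from any product), consider:

```python
def gb_per_watt(usable_capacity_gb: float, power_draw_watts: float) -> float:
    """Storage density expressed as usable gigabytes per watt consumed."""
    return usable_capacity_gb / power_draw_watts

# Hypothetical arrays; all figures are assumptions for illustration only.
arrays = {
    "legacy array": (50_000, 4_000),   # 50 TB usable on a 4 kW draw (assumed)
    "denser array": (100_000, 3_000),  # 100 TB usable on a 3 kW draw (assumed)
}

for name, (capacity_gb, watts) in arrays.items():
    print(f"{name}: {gb_per_watt(capacity_gb, watts):.1f} GB/W")
```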
The best path to storage efficiency is probably the use of data deduplication. Every vendor now offers this capability (sometimes also referred to as “single instancing”), which eliminates redundant data by replacing duplicate bytes or blocks with pointers that take up far less space. Deduping can reduce the overall need for storage capacity by anywhere from 10-30%, depending on the application and the kind of data being stored. Alas, some data does not dedupe well (it’s hard to identify redundant bytes in a graphics file, for instance). Deduping is an easy way to save on storage costs and on the power needed to drive arrays, and it doesn’t necessarily require committing to new architectures.
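To make the mechanism concrete, here is a minimal sketch of hash-based, fixed-block deduplication. The block size and hash choice are assumptions for illustration and do not describe any particular vendor’s implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size; real products vary

def dedupe(data: bytes):
    """Store each unique block once; represent the stream as a list of pointers."""
    store = {}     # hash -> block contents (the single stored instance)
    pointers = []  # ordered references that reconstruct the original stream
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep only the first copy seen
        pointers.append(digest)
    return store, pointers

def rehydrate(store, pointers) -> bytes:
    """Reassemble the original stream from the pointers and unique blocks."""
    return b"".join(store[d] for d in pointers)

# A stream with heavy repetition dedupes well; random data would not.
stream = b"A" * 16 * BLOCK_SIZE + b"B" * 4 * BLOCK_SIZE
store, ptrs = dedupe(stream)
assert rehydrate(store, ptrs) == stream
print(f"raw blocks: {len(ptrs)}, unique blocks stored: {len(store)}")
```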
At some point however, new storage will have to be purchased.
MAID (Massive Array of Idle Disks) systems spin down their disks (in every case I know of, these are SATA) until the data on them is needed, at which point they spin up again for as long as they are in use. This is highly efficient in terms of power use, but it is clearly not a suitable technology when high performance is needed. MAID is finding a home as a storage medium for “persistent” data, information that needs to be available for reference but for which speedy access is not necessary. Think online archives.
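As a toy model of the idea (the idle timeout and per-state wattages below are assumptions chosen only to illustrate the spin-down policy, not figures from any shipping product):

```python
IDLE_TIMEOUT_S = 60.0   # assumed: spin a drive down after a minute without I/O
ACTIVE_WATTS = 8.0      # assumed per-drive draw while spinning
IDLE_WATTS = 0.8        # assumed per-drive draw while spun down

class MaidDrive:
    """Toy model of one drive in a massive array of idle disks."""
    def __init__(self) -> None:
        self.spinning = False
        self.last_access = 0.0

    def read(self, now: float) -> None:
        if not self.spinning:
            self.spinning = True  # the spin-up delay (many seconds) is omitted here
        self.last_access = now

    def tick(self, now: float) -> float:
        """Spin down if idle too long; return the current power draw in watts."""
        if self.spinning and now - self.last_access > IDLE_TIMEOUT_S:
            self.spinning = False
        return ACTIVE_WATTS if self.spinning else IDLE_WATTS

drive = MaidDrive()
drive.read(now=0.0)
print(drive.tick(now=30.0))   # still spinning: 8.0 W
print(drive.tick(now=600.0))  # long idle, spun down: 0.8 W
```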
When it comes to high-end storage, solid state devices (SSDs) have become increasingly interesting of late because their prices are beginning to rival those of traditional top-tier disk arrays. Because they have no spinning media they consume less power than more traditional approaches, and because data moves to and from memory with no mechanical motion involved, I/O latency is measured in microseconds rather than milliseconds. Performance across the array is thus extremely fast. While their price has dropped significantly of late, they are still more expensive than HDD-based arrays, and so are generally suitable only for specialized situations. “Specialized” does not necessarily mean high-performance computing such as engineering applications, however; rather, it indicates a need for judicious use of a relatively expensive asset. Plain old commercial databases can get lots of value from SSDs when they are used intelligently, which is to say when sites put metadata rather than data on them. SSDs represent what may come to be thought of as a new storage tier, “tier 0”, and it is in recognition of this that several vendors are beginning to ship hybrid arrays: boxes with a mix of spinning disk and SSDs inside.
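The microseconds-versus-milliseconds point is easy to quantify roughly; the latency figures below are broad assumptions rather than measurements of any particular device:

```python
hdd_random_read_s = 0.005    # ~5 ms of seek plus rotational latency (assumed)
ssd_random_read_s = 0.0001   # ~100 microseconds (assumed)

# A single device can service roughly the reciprocal of its latency per second.
print(f"HDD: ~{1 / hdd_random_read_s:,.0f} random reads per second")
print(f"SSD: ~{1 / ssd_random_read_s:,.0f} random reads per second")
```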
A new generation of small form factor (SFF) disks will arrive toward the end of this year – samples are already out there – from all the major vendors. These 2.5-inch Serial Attached SCSI (SAS) HDDs will offer high performance (at both 10K and 15K spindle speeds) and much lower power consumption. Thus, while more of them will be needed to reach a targeted capacity than would be the case with the larger SATA drives, they take up less space and need less power; their higher physical density helps satisfy the need for power density. When I gave the SAS keynote for the SCSI Trade Association last May, interest in these drives was hot. They are aimed squarely at tier 1 storage.
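As rough arithmetic, more spindles can still add up to fewer watts. Every per-drive figure below is an assumption invented for the example, loosely standing in for the 3.5-inch 15K drives that SFF SAS units are positioned to displace in tier 1:

```python
import math

TARGET_TB = 30  # assumed tier 1 capacity target

# Assumed per-drive specs, invented for illustration only.
drives = {
    "3.5-inch 15K HDD": {"capacity_tb": 0.45, "watts": 15.0},
    "2.5-inch SFF SAS": {"capacity_tb": 0.30, "watts": 7.0},
}

for name, spec in drives.items():
    count = math.ceil(TARGET_TB / spec["capacity_tb"])
    print(f"{name}: {count} drives, {count * spec['watts']:.0f} W for {TARGET_TB} TB")
```

Under these assumed specs, the SFF configuration uses more drives but draws fewer watts and occupies less floor space for the same capacity target.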
New Metrics for New Efficiencies
So, there are several ways to address the storage/power problem. But which one to choose? That, of course, depends upon the use case, and it raises one added concern: while storage as a function of power appears to be a simple measurement of efficiency, experienced readers will realize that this particular definition of “efficiency” is not half as simple as it appears to be. Different tiers of storage are best suited to different things, which is why we have separate tiers in the first place. That being the case, it makes no more sense to measure the efficiency characteristics of tier 1 storage against archiving devices than it does to compare a Ferrari with the family station wagon.
Users need a more wide-reaching metric than storage as a function of power, one that also accounts for throughput. Only with that sort of tool can IT managers make the most informed decisions about which storage is appropriate to buy, given the power constraints they face.
Constructing this new sort of metric should be relatively easy to do, as all the parameters are very well understood. The big challenge will be that most humans are not particularly well-equipped to deal with three-dimensional conceptual models. We tend to be most comfortable with graphs that have an x- and y-axis only.
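As a rough sketch of how such a metric might be put together (every array and figure below is invented purely for illustration, not drawn from any benchmark), the three dimensions can be collapsed into two per-watt ratios:

```python
# Sketch of a three-factor efficiency view: capacity, power, and throughput.
# Every figure below is an invented illustration, not a benchmark result.
tiers = {
    # name: (usable capacity in GB, power draw in watts, throughput in MB/s)
    "tier 0 (SSD)":     (5_000,    400, 3_000),
    "tier 1 (SFF SAS)": (30_000, 1_500, 1_200),
    "archive (MAID)":   (200_000,  900,   200),
}

for name, (gb, watts, mbps) in tiers.items():
    # Expressing capacity and throughput each per watt reduces the three
    # dimensions to two ratios that fit on an ordinary x/y chart.
    print(f"{name:18s} {gb / watts:8.1f} GB/W {mbps / watts:8.2f} MB/s per W")
```

Expressing capacity and throughput each per watt turns the three-dimensional problem into a pair of numbers that can be plotted on a familiar two-axis chart.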
But the discomfort with three-dimensional models should be treated as a speed bump, not a roadblock. Providing simple yet comprehensive tools of this kind would be a great way for the vendor community to deliver even more service to its customers.
Biography
Mike Karp, Senior Analyst, Enterprise Management Associates, leads the firm’s storage practice. He has spent most of his 25-year career with vendors that include Prime Computer, Symbios Logic, and Bellcore. Much of his writing and research focuses on efficient data center operations, digital archiving, and reducing the complexity of enterprise systems. His writings on storage are widely available on the web. Starting in May, he will be the storage columnist for Systems Administration News and will author the storage blog on the internetevolution.com website.