TCO for Unstructured Data Exposes Limitations of Hierarchical File Systems
Says Caringo
This is a Press Release edited by StorageNewsletter.com on March 29, 2011 at 2:57 pm

The explosive growth of unstructured digital assets has stressed traditional hierarchical file systems beyond the breaking point. According to Forrester Research, unstructured content accounts for more than 80 percent of data created today and is predicted to grow by more than 800 percent over the next five years. Traditional file systems were never designed to operate at the scale of today's storage infrastructures and are not suited for data stores of hundreds of terabytes or multiple petabytes.
The limitations of conventional file systems become apparent when they are exposed to a storage TCO comparison with object storage for unstructured data, according to Caringo, Inc., provider of scalable object storage software for secure, self-managing, compliant and high performance storage clouds.
In the CAPEX category, object storage systems eliminate the complexity and computational overhead imposed by hierarchical file systems, allowing storage to run on simpler, less-costly hardware. In the case of Caringo, CAStor object storage is deployed on commodity x86 servers, a computing platform available from dozens of competing vendors with multicore processors and capacities to meet any size storage environment. Companies are freed from the vendor lock-in of expensive, proprietary NAS servers and can scale out their storage environment on an as-needed basis to take advantage of the ever-improving storage cost-capacity curve.
As different brands and models of x86 hardware can be mixed within a cluster, the CAPEX model changes dramatically. There is no need to purchase more hardware than is required at any point in time just to cover eventual capacity needs with identical nodes. Likewise, wholesale migrations and 'forklift upgrades' become a thing of the past. Instead, a continuous 'flow of hardware' in and out of the datacenter becomes the norm, with new machines added to the cluster on a regular basis as older ones are decommissioned, all while the cluster remains 100% operational with full performance and unrestricted data availability.
Additionally, CAStor object storage enables higher hardware utilization rates than conventional file systems with no performance degradation. CAStor has been proven to run at 97 percent capacity and more than a billion objects with no drop-off in performance, unlike traditional file systems where storage performance slows down as the system fills up. In fact, one storage vendor warns customers against using its system if utilization gets above 50 percent, requiring that users purchase additional proprietary hardware and capacity to keep the system running at that low threshold.
Being able to operate at 97 percent utilization enables CAStor customers to delay buying additional storage hardware.
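The effect of those utilization ceilings on hardware spend can be seen with a back-of-the-envelope calculation. The 97 percent and 50 percent figures come from the text above; everything else in this sketch (the 1 PB data size, the helper function) is illustrative, not a Caringo tool:

```python
# Back-of-the-envelope comparison: raw capacity needed to hold the same
# usable data at a 97% utilization ceiling (the CAStor figure cited
# above) vs. a system capped at 50% utilization.

def raw_capacity_needed(usable_tb: float, max_utilization: float) -> float:
    """Raw storage (TB) required to hold `usable_tb` of data without
    exceeding the given utilization ceiling."""
    return usable_tb / max_utilization

usable = 1000.0  # 1 PB of data, expressed in TB (illustrative)

at_97 = raw_capacity_needed(usable, 0.97)  # ~1031 TB of raw capacity
at_50 = raw_capacity_needed(usable, 0.50)  # 2000 TB of raw capacity

print(f"Raw TB at 97% ceiling: {at_97:.0f}")
print(f"Raw TB at 50% ceiling: {at_50:.0f}")
print(f"Extra hardware at 50% ceiling: {at_50 / at_97:.2f}x")
```

Under these assumptions, the 50-percent-capped system needs nearly twice the raw hardware to hold the same data, which is the purchase-deferral advantage the paragraph above describes.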
Object storage cost savings on the OPEX side are similarly impressive. One commonly used metric in calculating storage TCO is gigabytes managed per person, which gives a good indication of storage management efficiency by tracking the amount of storage managed per full-time equivalent (FTE). A petabyte-size storage cluster built on a hierarchical file system typically requires three to four full-time administrators to provision storage where it is needed, ensure the system is operating efficiently, manually balance performance, locate and replace failed hardware, and add more hardware when the system starts to slow down.
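The gigabytes-per-FTE metric is simple arithmetic. The petabyte cluster size and the three-to-four-administrator figure come from the text; the 0.25 FTE for object storage is an assumed stand-in for "a fraction of a full-time administrator":

```python
# Sketch of the "gigabytes managed per FTE" metric discussed above.
# Cluster size and FTE counts follow the text; the helper itself is
# illustrative, not a Caringo tool.

def gb_per_fte(total_gb: float, admin_ftes: float) -> float:
    """Storage management efficiency: gigabytes managed per
    full-time-equivalent administrator."""
    return total_gb / admin_ftes

petabyte_gb = 1_000_000  # 1 PB expressed in GB

# Hierarchical file system: three to four full-time administrators.
fs_efficiency = gb_per_fte(petabyte_gb, 3.5)

# Object storage: a fraction of one administrator (0.25 FTE assumed).
obj_efficiency = gb_per_fte(petabyte_gb, 0.25)

print(f"File system:    {fs_efficiency:,.0f} GB/FTE")
print(f"Object storage: {obj_efficiency:,.0f} GB/FTE")
```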
A similar-size storage cluster with CAStor object storage can be managed with only a fraction of one full-time administrator's time, as the system does almost all of the work. A CAStor cluster is self-managing, self-balancing and self-healing. In the case of node hardware failure or unavailability, replica data is used, ensuring that there is no impact on applications or interruption in data access. Meanwhile, the system creates another replica on a different node. Should the original node come back online, any superfluous replicas are automatically detected and trimmed. CAStor supports linear scalability, and new nodes can be added seamlessly to a cluster for zero-impact capacity and performance upgrades with no application downtime and no need to migrate files.
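The self-healing behavior described above can be modeled in a few lines: replicas on failed nodes are re-created on surviving nodes, and surplus copies are trimmed when a node returns. This is a hypothetical sketch of the general technique, not CAStor's actual implementation, and all node and object names are invented:

```python
# Minimal self-healing model: hold each object on exactly REPLICAS
# live nodes, re-creating lost replicas and trimming surplus ones.

REPLICAS = 2  # desired replica count per object (assumed)

def heal(placements: dict, live_nodes: set) -> dict:
    """Return new object->nodes placements with each object on exactly
    REPLICAS live nodes where possible."""
    healed = {}
    for obj, nodes in placements.items():
        holders = set(nodes) & live_nodes           # drop dead replicas
        candidates = sorted(live_nodes - holders)   # nodes able to take a copy
        while len(holders) < REPLICAS and candidates:
            holders.add(candidates.pop(0))          # re-create a lost replica
        while len(holders) > REPLICAS:
            holders.remove(sorted(holders)[-1])     # trim a surplus replica
        healed[obj] = holders
    return healed

placements = {"obj1": {"nodeA", "nodeB"}}

# nodeB fails: its replica is re-created on another live node.
after_failure = heal(placements, {"nodeA", "nodeC", "nodeD"})

# nodeB returns still holding its old copy: the surplus replica is trimmed.
after_return = heal({"obj1": after_failure["obj1"] | {"nodeB"}},
                    {"nodeA", "nodeB", "nodeC", "nodeD"})
print(after_failure, after_return)
```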
CAStor provides additional OPEX savings with its Darkive adaptive power conservation technology: administrators set a few parameters, and the system then adaptively spins down disks and throttles processors. Administrators have complete control over Darkive and can specify the period of inactivity before disks are spun down and processors are throttled back, reducing CPU utilization to meet specific power reduction targets. Individual CAStor nodes can be designated as an archival tier, with disks and processors spinning up only when data needs to be recalled from the archive, providing a mix of power cost savings and fast access to data.
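An idle-threshold spin-down policy of the kind described for Darkive might look like the following. The parameter names, defaults and logic here are assumptions for illustration only, not Caringo's actual API:

```python
# Hedged sketch of an adaptive power policy: disks idle longer than a
# configured threshold become spin-down candidates. Parameter names
# and values are hypothetical.

import time
from dataclasses import dataclass

@dataclass
class PowerPolicy:
    idle_spin_down_secs: int = 900   # spin a disk down after 15 min idle (assumed)
    cpu_throttle_pct: int = 50       # target CPU ceiling when idle (assumed)

def disks_to_spin_down(last_access: dict, policy: PowerPolicy, now: float) -> list:
    """Return the disks whose idle time exceeds the policy threshold."""
    return [disk for disk, ts in last_access.items()
            if now - ts > policy.idle_spin_down_secs]

now = time.time()
last_access = {
    "disk0": now - 30,    # recently active: stays spun up
    "disk1": now - 3600,  # idle for an hour: spin-down candidate
}
print(disks_to_spin_down(last_access, PowerPolicy(), now))
```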
Archival storage and regulatory compliance are another area where Caringo can produce cost savings in both capital and operational expenses. In a standard file system environment, archival and compliance data is relegated to a separate infrastructure with extra storage servers, additional disks and, in many cases, tape systems.
Such systems must be managed separately, with strictly controlled access for compliance data, which must be stored in an unalterable format and, in the case of some medical records, retained for decades, creating a costly administrative nightmare. Object storage with CAStor eliminates many of these costs with built-in regulatory compliance. There is no need to purchase additional disk storage or a tape system to create a separate archiving infrastructure. Administrators simply set policies for retention and protection, and CAStor applies WORM storage and file authentication to ensure that objects are immutably maintained throughout the content life cycle.
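Policy-driven WORM retention of the kind described above can be sketched as follows: writes to a retained object are rejected outright, and deletion is permitted only once the retention period has elapsed. This is a hypothetical model, with invented class and method names, not CAStor's API:

```python
# Illustrative WORM retention model: content is immutable for the life
# of the object, and deletion is blocked until retention expires.

from datetime import datetime, timedelta
from typing import Optional

class RetentionError(Exception):
    pass

class WormObject:
    def __init__(self, data: bytes, retain_for: timedelta,
                 stored_at: Optional[datetime] = None):
        self._data = data
        self._stored_at = stored_at or datetime.utcnow()
        self._retain_until = self._stored_at + retain_for

    def write(self, data: bytes) -> None:
        # WORM: write once, read many -- content can never be altered.
        raise RetentionError("WORM object: content is immutable")

    def delete(self, now: datetime) -> None:
        if now < self._retain_until:
            raise RetentionError(f"retained until {self._retain_until}")

# A record retained for roughly ten years (3650 days, illustrative).
record = WormObject(b"scan-001", timedelta(days=3650),
                    stored_at=datetime(2011, 3, 29))

try:
    record.delete(datetime(2015, 1, 1))  # still under retention
except RetentionError as e:
    print("delete refused:", e)

record.delete(datetime(2022, 1, 1))  # retention elapsed: allowed
```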