To Tier or Not to Tier? One Level or Multiple Levels of Storage?
Battle fed by technology innovations but also influenced by market reality
By Philippe Nicolas | December 28, 2021 at 2:01 pm

For decades, tiering has divided the industry, with advocates of the approach on one side and detractors who claim it can be avoided on the other.
This debate takes place in a business climate where plenty of data management companies have existed for decades and where key innovations invite users to simplify their storage infrastructure. It is implicitly associated with unstructured data, essentially file data.
Clearly the scope of the project matters here: do users add resources such as file servers or NAS to an existing environment, or do they replace and refresh installed environments with brand new models? The quick answer is both, which implies that multiple levels already exist implicitly, as various generations, models, brands, capacities and technologies are mixed and end up coexisting.
Let's look back at the past and remember the HSM (Hierarchical Storage Management) model, where different storage levels were coupled with a data movement policy that selected candidate files to be migrated to one or more secondary storage levels. These candidates were inactive contents that increased the TCO of primary storage. With that technique, secondary storage consisted of other disk-based subsystems, optical libraries or tape libraries. From an application standpoint, the secondary levels were all masked and only the primary level was visible, i.e. the primary file system, or we should say the production file system. Applications could see all files, or at least their metadata, and trigger access to them. For non-migrated files everything was direct and access was immediate; for migrated files, an intelligent software layer masked the moved content, displayed its metadata and provided a mechanism to point to the secondary storage where the migrated data lived. But applications never accessed data on secondary storage directly: when an application read a migrated file, the access request was blocked while a data copy from secondary to primary storage was initiated in the background, and the pending operation was released once the content was back on primary. Different techniques were developed along these lines, and we saw stubs and other elements appear.
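As a purely illustrative sketch of that idea, and not any vendor's actual implementation, the policy side of HSM boils down to a scan that moves files untouched for some period to a secondary store, leaves a small stub behind, and copies the data back on access. The paths, threshold and stub convention below are hypothetical.

```python
import os
import shutil
import time

PRIMARY = "/primary"          # hypothetical production file system
SECONDARY = "/secondary"      # hypothetical secondary storage (disk, tape front-end...)
INACTIVITY_DAYS = 90          # files not accessed for 90 days become candidates
STUB_SUFFIX = ".hsm_stub"     # marker left on primary pointing to migrated data

def migrate_inactive_files():
    """Scan primary storage and migrate inactive files, leaving a stub behind."""
    cutoff = time.time() - INACTIVITY_DAYS * 86400
    for root, _, files in os.walk(PRIMARY):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(STUB_SUFFIX):
                continue                             # already migrated
            if os.stat(path).st_atime < cutoff:
                target = os.path.join(SECONDARY, os.path.relpath(path, PRIMARY))
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(path, target)           # copy data to the secondary level
                os.remove(path)                      # free primary capacity
                with open(path + STUB_SUFFIX, "w") as stub:
                    stub.write(target)               # stub records where the data now lives

def read_file(path):
    """Transparent access: recall the file from secondary storage if needed."""
    stub_path = path + STUB_SUFFIX
    if not os.path.exists(path) and os.path.exists(stub_path):
        with open(stub_path) as stub:
            target = stub.read().strip()
        shutil.copy2(target, path)                   # recall: copy back to primary
        os.remove(stub_path)
    with open(path, "rb") as f:                      # access now proceeds on primary
        return f.read()
```

The recall step is exactly the copy-back behavior described above: the read is satisfied only once the content is back on the primary level.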
HSM was popular in high-capacity environments such as mainframes, large Unix systems, NetWare and, to a lesser extent, Windows server OS flavors. We can list Amass and FileServ from E-Systems, later owned by ADIC, now Quantum; Cray Data Migration Facility aka DMF, now an HPE product; Epoch, acquired by EMC in 1993, now Dell; Lachman/Legent, then CA, now Broadcom; Avail Systems with NetSpace, OEM'd by Alphatronix and Artecon; OpenVision AXXiON NetBackup HSM Extension, then Veritas (Symantec and again Veritas); Cheyenne Software with NETstor; Palindrome with Storage Manager; Software Moguls with SM-Arch; OTG, then Legato; UniTree; LSC, then Sun, now Oracle; HPSS; IBM ADSTAR DSM; FileTek, now HPE via the SGI acquisition; or Programmed Logic, then CrosStor, acquired by EMC in 2000, now Dell.
File system integration introduced technical challenges and development efforts that significantly limited adoption, so the industry addressed this by creating the Data Management Application Programming Interface (DMAPI) to simplify and harmonize such integrations. But HSM hit a wall, and new models emerged with a more global ILM philosophy: tiering, routing, object storage and cloud storage, during a period that saw flash arrive alongside new high-capacity HDDs and tapes and fast networks, while prices dropped across all technologies and product offerings multiplied with plenty of new vendors. The market continues to expand with many vendors, as the business opportunity is huge and the volume of data is ever growing.
In fact, Information Lifecycle Management, with its DLM (D for Data) flavor in the early 2000s, crystallized this discipline by associating the cost of storage with the value of the data. In other words, hot data must reside on fast storage with high performance characteristics in terms of throughput, IO/s or latency. Application SLAs put pressure on that storage entity. Vendors tried to articulate TCO optimization by coupling fast, large HDDs with slower, cost-effective HDDs, but before flash the gap was not wide enough to be compelling; only specific conditions, data volumes and environments suited such approaches.
This notion of data residing on multiple storage levels is also close to archiving, especially with a more recent iteration known as active archive, which collapses access time as data becomes accessible to applications without special actions from users or management tools.
Tiering is a modern term, obviously associated with the temperature of data from its generation to its disposal. Of course, HSM used a similar threshold model based on attributes such as last access time. Between these two extreme levels, several others could be defined and aligned with different storage entities. Tiering was also the opportunity to introduce Network File Management (NFM) and Network File Virtualization (NFV), which unify and aggregate file servers/NAS with an in-band or out-of-band design. The value is immediately clear: having a logic layer between consumers and producers makes file data much easier to move and manipulate. Users gained the ability to migrate and replace file servers, but also to add new ones, keep multiple copies on diverse servers, and create optimization schemes with hot data on the fastest servers and vice versa, all fully seamlessly. The copy-back behavior of HSM, which everyone hated, was replaced by a routing mechanism that points the request to where the data lives, avoiding data movements (see the sketch below). The result is also obvious: access time is reduced and the user experience largely improved. It was the era of the File Area Network (FAN), with players like Attune, coming from Z-force, whose assets were later absorbed by F5 Networks in 2009; Acopia Networks, also acquired by F5, in 2007; AutoVirt, which collapsed; NeoPath, acquired by Cisco in 2007; Rainfinity, acquired by EMC in 2005; or NuView, acquired by Brocade in 2006 for StorageX. Brocade even tried to develop a new iteration named File Management Engine. StorageX is promoted, developed and supported today by Data Dynamics.
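To illustrate the routing idea in the simplest possible terms, and not how any FAN product was actually built, a virtualization layer can keep a global namespace that maps each logical path to the file server currently holding the data; moving a file only updates the map, and subsequent requests are redirected instead of triggering a copy back. Server names and paths below are hypothetical.

```python
# Minimal global-namespace routing sketch: logical paths map to backend servers.
namespace = {
    "/projects/report.docx": ("nas-fast-01", "/vol1/projects/report.docx"),
    "/archive/2019/log.tar": ("nas-slow-02", "/vol7/archive/2019/log.tar"),
}

def resolve(logical_path):
    """Route a client request to the server where the data actually lives."""
    server, physical_path = namespace[logical_path]
    # In a real in-band or out-of-band product this would be an NFS/SMB referral
    # or a proxied I/O; here we simply return the target location.
    return server, physical_path

def migrate(logical_path, new_server, new_physical_path):
    """Move data between tiers: only the namespace entry changes for clients."""
    # (The actual data copy between servers would happen here, in the background.)
    namespace[logical_path] = (new_server, new_physical_path)

# A cold file is rehomed to a slower tier; clients keep using the same path.
migrate("/projects/report.docx", "nas-slow-02", "/vol7/cold/report.docx")
print(resolve("/projects/report.docx"))   # ('nas-slow-02', '/vol7/cold/report.docx')
```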
We must also list structured data players: OuterBay, acquired by HPE; Princeton Softech, acquired by IBM; TierData; Applimation, acquired by Informatica; Indusa, acquired by Synoptek; SAND, acquired by Harris Computer; RainStor, acquired by Teradata; Solix, still independent; and of course the database vendors' native offerings. The idea here is to reduce the active database size by offloading records to secondary databases while keeping the ability to query the active set, the archive set, or both. The active or hot database, with its size thus reduced, delivers fast query performance and results.
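As a generic illustration of that offload model, and not any of these vendors' products, the sketch below uses sqlite3 with invented table names: old records are moved from the active table to an archive table, and a view unions both so that queries can still target the active set, the archive, or the combined data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # hypothetical orders database
conn.executescript("""
    CREATE TABLE orders_active  (id INTEGER PRIMARY KEY, created DATE, amount REAL);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created DATE, amount REAL);
    INSERT INTO orders_active VALUES (1, '2015-03-10', 120.0), (2, '2021-11-02', 75.5);
    -- Combined view: applications that need full history query this one.
    CREATE VIEW orders_all AS
        SELECT * FROM orders_active UNION ALL SELECT * FROM orders_archive;
""")

def archive_older_than(cutoff_date):
    """Offload old records to the archive table to keep the active set small."""
    conn.execute("INSERT INTO orders_archive SELECT * FROM orders_active WHERE created < ?",
                 (cutoff_date,))
    conn.execute("DELETE FROM orders_active WHERE created < ?", (cutoff_date,))
    conn.commit()

archive_older_than("2020-01-01")
print(conn.execute("SELECT COUNT(*) FROM orders_active").fetchone())   # (1,)
print(conn.execute("SELECT COUNT(*) FROM orders_all").fetchone())      # (2,)
```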
Three key technologies and services appeared and confirmed the need for them: flash, object and cloud storage. All of them illustrate that tiering is ubiquitous in our world. Users can tier data to on-premises object storage or to cloud storage, especially if they mix SSDs and HDDs across different generations of file servers and NAS. This more recent data management model, grouping tiering, migration, analytics, classification, indexing and search, replication/DR, tape/optical and cloud support, WORM capabilities and more, is covered by many vendors such as Aparavi, Atempo, Congruity360, Data Dynamics, Datadobi, FujiFilm, Grau Data, Hammerspace, HPE DMF, HPSS, IBM, Komprise, Moonwalk, Nodeum, Point Software and Systems, QStar, Quantum, Spectra Logic, StrongBox, Tiger Technology, Versity… among others, and of course with offerings from the big gorillas as well.
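As a minimal example of that kind of tiering to object or cloud storage, a sketch using boto3 with a hypothetical bucket and file names (not any vendor's engine): a cold file is pushed to an S3 bucket under a colder storage class and removed locally once the upload succeeds.

```python
import os
import boto3

s3 = boto3.client("s3")                      # credentials/region come from the environment
BUCKET = "my-cold-tier"                      # hypothetical bucket name

def tier_to_object_storage(local_path, key):
    """Push a cold file to S3 (or S3-compatible storage) and free local capacity."""
    with open(local_path, "rb") as f:
        s3.put_object(
            Bucket=BUCKET,
            Key=key,
            Body=f,
            StorageClass="STANDARD_IA",      # colder, cheaper class for infrequent access
        )
    os.remove(local_path)                    # data now lives only on the object tier

tier_to_object_storage("/data/old/scan-2018.tif", "old/scan-2018.tif")
```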
So today, we have several storage entities available to deploy projects aligned with application goals: flash/SSD, HDD, tape, and S3 with on-premises object storage or cloud storage. Keep in mind that some of these are media technologies and others are access methods.
The market has also invited us to consider some radically new approaches. In fact, to avoid multiple tiers, a solution must beat the cost of another tier if it is an on-premises product, and even cloud storage pricing once the extra cost of data management is factored in. The example we have in mind is an all-flash scalable NAS solution with advanced data reduction and erasure coding that beats the cost of HDDs. With it, users can store all data on a single tier, a flash one, at the best price. It reduces complexity and also addresses the energy bill, though it doesn't mean such a model beats cloud storage. When you wish to store 10PB online, fully accessible, for 10 years, ask yourself: do you really need this, or would you accept a bit more latency for a drastic cost reduction? We don't make any recommendation, as each project is different, but keep in mind the three dimensions that impact a storage solution: the data reduction ratio, the erasure coding hardware overhead ratio and the energy factor.
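To make those three dimensions concrete, here is a back-of-the-envelope calculation with purely hypothetical prices, ratios and power figures; only the shape of the formula matters: effective $/TB = (raw media $/TB × erasure coding overhead) ÷ data reduction ratio, plus energy over the service life.

```python
# Back-of-the-envelope effective cost per usable TB (all numbers are hypothetical).
def effective_cost_per_tb(raw_cost_per_tb, ec_overhead, reduction_ratio,
                          watts_per_tb, years, kwh_price):
    """Media cost adjusted for erasure coding and data reduction, plus energy."""
    media = raw_cost_per_tb * ec_overhead / reduction_ratio
    energy = watts_per_tb / 1000 * 24 * 365 * years * kwh_price
    return media + energy

# Hypothetical all-flash tier: pricier media, strong data reduction, low power.
flash = effective_cost_per_tb(raw_cost_per_tb=150, ec_overhead=1.25,
                              reduction_ratio=3.0, watts_per_tb=1.0,
                              years=10, kwh_price=0.15)

# Hypothetical HDD tier: cheaper media, little data reduction, higher power.
hdd = effective_cost_per_tb(raw_cost_per_tb=25, ec_overhead=1.33,
                            reduction_ratio=1.2, watts_per_tb=5.0,
                            years=10, kwh_price=0.15)

print(f"flash: ${flash:.0f}/TB over 10 years, hdd: ${hdd:.0f}/TB over 10 years")
```

With different assumptions the comparison flips either way, which is exactly why each project has to run its own numbers across the three dimensions.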