Top 5 Points to Avoid Storage Failure and Critical Application Downtime
By Jéronimo Munoz, VP operations, SAN Sentinel
This is a press release edited by StorageNewsletter.com on January 15, 2015 at 3:03 pm. Jéronimo Munoz, VP operations, SAN Sentinel SA, brings over two decades of experience in scaling storage businesses to market leadership. Prior to SAN Sentinel, he worked across the storage market, re-launching Auspex and starting up Southern Europe operations for Exanet as regional VP, prior to the M&A with Dell in 2009. He was also regional sales manager at BlueArc for France, Spain and Portugal until the final M&A by HDS. In 1986, he joined Sun Microsystems, where he contributed to establishing Unix. Prior to that, he held senior management positions at Apple, covering functions including finance and sales, and contributed to launching Apple in Europe in 1982. He started his career as credit manager at Data General in 1975. Born in Spain in 1957, he is fluent in French, English and Spanish.
Over the last 12 months, 64% of companies have experienced production downtime averaging 25 hours.
Consequences for businesses were often severe despite backup and DR protocols. Analysts say that 36% of these interruptions had a direct financial impact and 34% prevented the release of new applications to market.
In an increasingly competitive market, and with the rise of new financial models brought by the cloud, companies have to speed up software delivery. Applications must run 24/7, and unplanned downtime can cause up to 100% profit loss, or worse.
According to Vanson Bourne, over the last 13 months storage downtime has cost companies more than $1.7 trillion (50% of Germany's GDP).
Human error is the main cause of these issues, accounting for an estimated 30%. This figure is high due to the increasing complexity of infrastructures mixing more and more different technologies. A further 10% of issues are directly linked to multiple participants (multi-vendor infrastructure). Centralizing and sharing information in a single control point has become mandatory.
However, the main fear of operators and managers is running out of space in a pool or array; the consequences are as dire as a power outage.
Anticipation and speed are key to keeping applications highly available, and to maintain that availability you need to:
- Reduce human error by giving everyone on the technical team working on your storage infrastructure the exact same information, through a common, easily accessible database (DCIM).
- Understand your infrastructure regardless of vendor by frequently analyzing its evolution. Weekly or monthly audits are crucial to risk reduction. Take your products' end-of-support/end-of-life dates into account in your risk management.
- Manage the complexity of your infrastructure by automating information collection and using simple analysis and optimization tools to reduce Root Cause Analysis time. Avoid homemade scripts and reports based on outdated and inaccurate data.
- Anticipate before running out of storage by frequently monitoring the over-allocation created by technologies such as thin provisioning, which assume users won't consume all the storage provided, or at least not all at the same time. When using thin provisioning, the key metric to watch is the growth/fill rate of your pools. There are simple solutions to help predict when you will run out of space per pool/array/datacenter.
- Detect applications at risk in your infrastructure based on their criticality level by regularly monitoring the evolution of their components by business line or geographical location. Set up targeted preventive measures with regard to the severity of the potential disaster and guarantee its resolution with a precise action plan.
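The fill-rate prediction mentioned above can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's method: the function name and data layout are assumptions, and real SRM tools use more sophisticated models than a straight-line projection.

```python
# Minimal sketch: estimate when a thin-provisioned pool runs out of
# space, by projecting its recent fill rate linearly into the future.
from datetime import date

def days_until_full(samples, capacity_gb):
    """samples: list of (date, used_gb) capacity readings, oldest first.
    Returns the estimated number of days until the pool is full,
    or None if usage is flat, shrinking, or under-sampled."""
    if len(samples) < 2:
        return None  # need at least two readings to compute a rate
    (d0, u0), (d1, u1) = samples[0], samples[-1]
    elapsed_days = (d1 - d0).days
    if elapsed_days <= 0:
        return None
    growth_per_day = (u1 - u0) / elapsed_days  # GB/day fill rate
    if growth_per_day <= 0:
        return None  # pool is not growing; no out-of-space date
    return (capacity_gb - u1) / growth_per_day

# Example: a 100 TB pool that grew 2 TB over ten days (~200 GB/day)
history = [(date(2015, 1, 1), 80000), (date(2015, 1, 11), 82000)]
print(days_until_full(history, 100000))  # → 90.0
```

Running the same calculation per pool, per array, and per datacenter, and alerting when the projected date falls inside a procurement lead time, is the essence of the "anticipate before running out" advice above.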
As Simon Robinson of The 451 Group says: “The challenge that organizations are having now is not just the data growth, but managing that environment, which may consist of seven or eight storage silos with huge complexity.”
Setting up control and optimization tools for the storage infrastructure (SRM) is essential to reduce the risks and costs linked to production downtime. Unfortunately, most such tools use an obsolete model that does not meet modern requirements and are not agile enough to make companies more dynamic and efficient.
In 2015, setting up such a service should be done without limitations (SaaS), on the spot (right now, as opposed to ‘in a year’s time’), instantly available (no installation required), and using today’s modern technologies (cloud/web services).