State of Object Storage 2018 Report – Chaos Sumo
Emergence of AWS S3 data lake
This is a Press Release edited by StorageNewsletter.com on April 24, 2018 at 2:39 pm
Chaos Sumo, Inc., a cloud-based log data retention and analytics service for object storage, released the findings of The State of Object Storage 2018 Report: The Emergence of the AWS S3 Data Lake.
As object storage such as AWS S3 continues to gain widespread enterprise momentum, with over 70% of companies reporting that they use it today, it offers untapped opportunities for promising new use cases such as historical log analytics and application and media hosting.
More than one third of respondents in a recent survey, conducted by Chaos Sumo in December-January 2018, are also looking to object storage to streamline and enable data lake usage for historical trend analysis and machine learning. The study also found that the top barriers preventing S3 innovation are the lack of tools that enable data access and visibility, and the costs of moving data around to analyze the growing volumes of disparate object storage data with accuracy and at scale.
“The current inability of businesses to perform consistent, longitudinal and easy trend and predictive analysis in object storage, including log analytics, is resulting in critical business information being thrown away or archived in an inaccessible manner,” says Thomas Hazel, founder and CTO, Chaos Sumo. “This hidden culprit, the increasing costs of storing data for real- or near-time analysis, is the core impediment to doing more with the growing amount of data stored in object storage such as AWS S3, and Chaos Sumo is here to tackle this head on.”
Major findings from the report, based on over 120 responses from data science, analytics, engineering and DevOps/IT professionals across a variety of organizations, include:
Object storage has gone mainstream – AWS S3 is here to stay
• 72% of respondents report using AWS S3 or another form of cloud-based object storage today, with 40% anticipating that their investment in object storage will grow by more than 50% in the next year.
As growth of AWS S3 object storage explodes, its intended use cases are shifting toward analytics
• While 83% of respondents use the service as a cheap alternative to traditional on-premises storage solutions for backup, storing, and archiving data, object storage is increasingly being used for application hosting (38%), media hosting (34%), and business analytics (32%).
The biggest challenges with object storage are visibility into the stored data, the ability to analyze the data right in S3, and the costs of moving the data
• Despite having a data lake, only 36% of respondents can easily access the data, and a mere 7% claim it is easy to analyze that data today.
• As object storage data expands, concerns about greater storage, compute, and network costs grow, with 37% of respondents worried about the increasing costs. Specifically, for Elastic/ELK users, the prohibitive storage costs associated with EBs of data in S3 are compounded by the additional effort and resources needed for its scaffolding, which renders most of this rich data inaccessible.
A myriad of analytics tools that only do part of the job is a major culprit for analytics challenges
• 42% are using home-grown solutions to solve visibility and analytics issues within object storage/S3, while others cite using tools such as Amazon Redshift (51%), Amazon Athena (23%), and Elasticsearch, Logstash, Kibana (ELK) (7%).
• These tools are not only inadequate for the jobs that need to be done, they also take a lot of time to set up and manage – 52% of respondents say it took them more than three months to build their current analytics architecture.
Data lakes are slowly gaining momentum within the enterprise
• 28% of respondents report having data lakes today, with another 18% planning to implement one in the next 12-18 months.
Founded in 2017 and headquartered in Somerville, MA, Chaos Sumo is a cloud-native data analytics service, enabling automated and cost-bending data scaling and long-term log and event data retention on AWS S3. It extends the Elastic Stack (ELK) by automating the discovery, normalization and indexing of all log data types and sources. The service enables historical trend and machine learning analytics at a fraction of the cost of alternative solutions and provides the ability to perform both relational and text-based analysis through a single integrated Kibana interface. Log data can be organized, managed, indexed and analyzed directly via REST-based S3 and Elasticsearch APIs, delivering value in minutes and enabling DevOps teams, data engineers and data analysts to be more productive.