Exclusive Interview With Amir Aharoni, CEO and Co-Founder, Elastifile
In flash-native cloud-scale storage platform for unstructured data
By Philippe Nicolas | January 9, 2018 at 2:19 pm
He is:
- CEO and co-founder of Elastifile Ltd.
He was:
- Co-founder, president, and CEO of Mobixell Networks (acquired by Flash Networks)
- President and COO at Optibase Ltd.
- VP Engineering and COO of VDOnet (acquired by Citrix)
- VLSI engineer at Intel
StorageNewsletter.com: Could you summarize the genesis of Elastifile? What about the company’s founders and their track records?
Amir Aharoni: Elastifile was founded in late 2013 to deliver a true hybrid-cloud unified storage and data management solution for dynamic enterprise workflows, such as cloud-bursting for analytics, high performance computing, and the most demanding traditional applications (containerized or virtualized). We help our customers move beyond the severe compromises of rigid, proprietary storage silos (both on-premises and on-cloud) with a new enterprise-class architecture for managing both active data (with a software-defined scale-out distributed Elastifile Cloud File System, or ECFS) and inactive data (with our CloudConnect object tiering and replication services). By delivering a consistent and elastic global namespace across sites and clouds, we solve the data bottleneck for hybrid IT agility. Our founders and leadership team, based in both Santa Clara, CA and Herzliya, Israel, bring the diverse mix of storage, enterprise applications, virtualization, and flash expertise required to deliver on this promise. Shahar Frank, our CTO, brings deep storage and file system expertise from XtremIO, Red Hat’s Qumranet virtualization, and Exanet (now Dell FluidFS). Roni Luxenburg, our R&D leader, came from Red Hat, where he led their early virtualization engineering from Qumranet, as well as Pentacom (acquired by Cisco). And I am a serial entrepreneur, having led several start-ups through successful growth and exits, including Mobixell Networks and Optibase.
So far, Elastifile has raised $74 million in 3 rounds in 3 years and you have 65 employees. How do you explain the presence of Huawei, Cisco, Lenovo or Dell EMC Ventures in your capital? And at another level Samsung, Western Digital and Micron?
Indeed, we are growing nicely with our customers, channel partners, and employees. Our investors have given us more than just financial support; they’ve collaborated with us to exploit significant synergies that enhance our joint product development and go-to-market strategies. In addition to the strategic investors you mention, we have tier one VC support from Lightspeed, Battery, and CE Ventures. The strategic investors represent two different parts of the market that are seeking to build the hybrid cloud bridge:
- Enterprise cloud vendors, including incumbent OEMs from North America and China. Elastifile complements their current portfolios with a solution that can run natively on-cloud or on-premises, with a uniquely elastic, ‘bring-your-own-hardware’ solution for integrated file and object storage. One example of this synergy is our recently announced Dell OEM solutions appliance.
- Flash storage vendors. Elastifile is pushing flash ubiquity even further since our active data tier is built entirely around flash. We optimize natively in both our distributed metadata and data services to deliver uniquely consistent scale-out performance with consistent latency around 1ms, no matter how big the storage cluster gets, on-cloud or on-premises. And we have innovated some nice technical synergies with these investors.
It seems that Elastifile has made a small realignment of its original mission. I remember you wished to build a scalable distributed file system dedicated to on-premises environments. What were the events and market trends that triggered this evolution of your mission to finally deliver today a flash-native cloud-scale data storage platform for unstructured data?
We had always envisioned our solution for truly dynamic hybrid cloud workflows. This means 1) delivering cloud-like efficiency and consumption costing to on-premises IT, 2) enabling no-compromise enterprise lift and shift for on-cloud users, and 3) bridging all of it with a holistic self-service model across active and inactive data for the ultimate hybrid cloud agility and optimized TCO.
But this is no simple step for enterprise customers. With 30 years of mature data and availability services, and ever-increasing performance requirements, moving workloads into the cloud is like building the Golden Gate Bridge. Some vendors and customers start on the traditional on-premises side, while others have started on-cloud. Both are now building towards each other to connect the bridge so that enterprise workflows like analytics and EDA can move back and forth seamlessly, depending on their requirements for infrastructure, data governance, and self-service. Indeed, the just-announced Microsoft Azure acquisition of Avere shows the importance of building this bridge to the enterprise. For Elastifile, we designed for elastic data scale and access across both sides of the bridge, but we knew we had to prove ourselves first as a worthy primary storage offering. That’s why we started with the on-premises active data file system. Now, over the past year, we have delivered our native on-cloud services and integrated inactive data object tiering. Most importantly, this perfectly tracks what the enterprise customers want for their own journeys to hybrid IT and hybrid cloud.
In today’s IT landscape, what are the top 5 key requirements? Why should end-users select Elastifile, given that the company is pretty recent and competes with several established file storage companies? Give me the compelling events.
The next fundamental pivot for enterprise IT is hybrid IT. Most analysts define this as the critical evolution for business process agility, with ‘everything as a service’, with infrastructure as a simple commodity elastic enabler that is spun up dynamically by users (not IT), regardless of what site or cloud it actually sits in. One fundamental requirement for these workloads (90% of which are still native file-based) is that they should not have to be refactored. We hear so much about ‘cloud-ready apps’, but this is quite presumptuous by the vendors. What customers really want is ‘app-ready clouds’, and data is the biggest challenge. This is hybrid cloud and multi-cloud, which is why hybrid IT data must be provided by a cross-cloud data fabric that can:
- Allow user self-service to elastically spin up (and down) their workloads in a simple standardized way regardless of what site or cloud service provider the infrastructure is on (think self-service cloud bursting for genomics researchers or EDA designers).
- Meet enterprise service level requirements for performance, data services, and availability/security, even for the most demanding applications, regardless of what site, cloud, or hardware it’s running on
- Be hardware (and cloud) agnostic, enabling seamless flexibility for customer choice across underlying server hardware and multi-source cloud providers
- Integrate data management across active and inactive data tiers so users can dynamically (and automatically) pull from cost-effective active archive for analytics, test/dev, DevOps, and other ‘secondary storage’ workloads
- Deliver consumption-based pricing and cost-effective cloud migrations/bursting: despite the hype, both proprietary on-premises appliances (traditional arrays and HCI appliances) and the cloud services from AWS, Azure, Google, etc. are still prohibitively expensive for the dynamic, elastic hybrid cloud workflows that customers want. Elastifile fixes that, including minimization or elimination of the dreaded ‘cloud egress fees’.
Dozens of end-users and IT buyers at both enterprises and service providers have already purchased Elastifile for this next step in their hybrid cloud services journey. Technically, we are very different because we provide the most holistic approach, with a unique combination of cloud-scale distributed metadata, a true global namespace, true POSIX compliance for native app-ready clouds, linear scale-out for adding IO/s and bandwidth without compromising latency, true bring-your-own-hardware, and integrated active/inactive data tiering across file and object, on-premises and on-cloud. For most customers, the compelling event is when they realize they need to leverage the cloud as a part of their core business process workflows. A couple of examples:
- A large global semiconductor company that is using Elastifile to burst its EDA workflows to AWS around its big tape-out simulation peak loads. They can finally do this without refactoring their applications, and it is saving them huge infrastructure costs as they move to consumption-only cloud resources for their peak demand times
- A drug developer that is migrating its core molecular modeling and genomics analysis workloads to Google Cloud platform to have the most competitive and agile drug development time-to-market
I know that you don’t use erasure coding (EC) and prefer replication plus a data reduction technique. What is the rationale behind this choice, as many other players deliver N+2/+3 protection? One of the values of EC is also the reduction of hardware overhead; do you believe your replication + data reduction is better than EC?
Erasure coding is an enabling technology to reduce cost, but the question is where it makes sense for enterprise workflows when it comes to hybrid and public cloud deployments. We see enterprise customers want a simpler two-tier data and storage model for these self-service hybrid cloud workflows: active and inactive data. Indeed, this follows the powerful, agile model already innovated for DevOps and most analytics workloads. Each of the two tiers has unique characteristics and requires the right media, cost, and technology for that tier. Elastifile offers a seamless mechanism to dynamically transfer data between the tiers and the different underlying hardware and data formats of each tier. The inactive tier is focused on very low-cost storage with minimal requirements on consistency of performance, and is therefore a natural fit for object storage technologies (on-cloud or on-premises); this is where erasure coding should apply. In contrast, the active data tier requires high and consistent performance as well as dynamic deployment flexibility to deliver reliable SLAs to the applications using the data services. These requirements make replication (with data reduction from space-efficient snapshot shipping, deduplication, and compression) a much better and more flexible fit for the active data tier, providing highly reliable and consistent performance, under both normal operations and during recovery from unforeseen events and failures. And this is essential for on-cloud deployments like cloud-bursting, where EC is not a good fit due to failure handling and the redundancy already built into native block services like EBS.
By integrating this powerful two-tier model across both file and object services, our approach removes the performance, scale, and HW-specific issues caused by using EC for active data, while providing hybrid cloud users optimal TCO and self-service through dynamic ILM for active-inactive data object tiering and tier-optimized data reduction. To say it another way, customers want to apply the right tools to the right use cases, and erasure coding is best for the inactive (object) data tier.
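The trade-off described above can be made concrete with a back-of-the-envelope capacity calculation. The sketch below compares the raw-capacity multiplier of n-way replication combined with data reduction against N+2 erasure coding; the 2:1 reduction ratio and the 8+2 EC layout are illustrative assumptions, not Elastifile figures.

```python
# Raw bytes stored per logical byte, for the two protection schemes
# discussed above. All ratios here are hypothetical, for illustration.

def replication_multiplier(copies: int, reduction_ratio: float) -> float:
    """n-way replication combined with data reduction
    (deduplication/compression at the given ratio)."""
    return copies / reduction_ratio

def ec_multiplier(data_shards: int, parity_shards: int) -> float:
    """Erasure coding with N data shards and M parity shards."""
    return (data_shards + parity_shards) / data_shards

# 3-way replication with an assumed 2:1 data reduction ratio
rep = replication_multiplier(3, 2.0)   # 1.5x raw per logical byte

# 8+2 erasure coding (N+2 protection), no data reduction
ec = ec_multiplier(8, 2)               # 1.25x raw per logical byte

print(f"replication+reduction: {rep:.2f}x, EC 8+2: {ec:.2f}x")
```

With these assumed ratios, replication plus reduction lands close to EC on raw overhead, which is the argument for trading a little capacity for the consistent performance of the active tier.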
Today Elastifile is deployed as several independent clusters glued together by your asynchronous N-way replication. Developing a global file system is the Holy Grail, especially if it’s connected to the cloud with multiple cluster instances. What is your vision and plan about that? Is it on your roadmap?
This is the heart of our cross-cloud data fabric strategy. End-users should be able to spin up and spin down infrastructure in a dynamic self-service way as the workflow requirements change. They should be able to check-in and check-out their active and inactive data between the ‘cheap and deep’ object tier and the high performance flash clusters on-demand. They should be able to trust workloads to automatically migrate data back and forth across these tiers and the different on-cloud/on-premises sites that host them as cross-cloud ILM. They should be able to do all of this without worrying about which sites and clouds provide the underlying infrastructure, with no-compromise enterprise class SLAs for consistent performance and data and availability services. That is what we’ve built, and our roadmap will further extend it with the cross-cloud global namespace, cross-cloud ILM, and other game-changing capabilities (stay tuned).
How do you solve data consistency challenges? You considered Paxos and Raft but preferred to develop your own consensus algorithm, named Bizur. Could you elaborate on that?
These are indeed two critical challenges, especially for enabling elastic scale-out workflows cross-cloud. Elastifile uses several techniques to ensure data consistency, most importantly our ‘redirect on write’ architecture. Data is never written in place, and each data blob has a unique, never-repeating ID (UUID) so that we can verify that we are really reading the data we mean to. In addition, we have several layers of checksums (data, metadata, network). In this way we can keep true, strict consistency across a cloud-scale distributed infrastructure.
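The redirect-on-write idea can be sketched in a few lines. This is a hypothetical toy model, not Elastifile’s implementation: every write creates a new blob under a fresh UUID with a checksum, old data is never overwritten, and a reader verifies it got exactly the blob it asked for.

```python
# Toy sketch of 'redirect on write' with UUID-tagged, checksummed blobs.
# Hypothetical illustration only; not Elastifile's actual code.
import hashlib
import uuid

store = {}  # blob_id -> (payload, checksum)

def write_blob(payload: bytes) -> str:
    """Allocate a new blob; existing blobs are never modified in place."""
    blob_id = str(uuid.uuid4())           # unique, never-repeating ID
    checksum = hashlib.sha256(payload).hexdigest()
    store[blob_id] = (payload, checksum)  # written to a new location
    return blob_id

def read_blob(blob_id: str) -> bytes:
    """Return the payload only if its checksum still matches."""
    payload, checksum = store[blob_id]
    if hashlib.sha256(payload).hexdigest() != checksum:
        raise IOError(f"checksum mismatch for blob {blob_id}")
    return payload

bid = write_blob(b"hello")
assert read_blob(bid) == b"hello"
```

Because a blob ID is never reused, a stale or misdirected read can be detected immediately, which is the property the checksum layers build on.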
Another key aspect is our consensus algorithm. Paxos and Raft both have the same consistency model, where the entire system is treated as a single consistency domain. When you have a data set that is composed of many keys and values, the entire key/value set needs to be consistent across nodes. While this is very strong, it also imposes a performance penalty at scale. For cloud-scale, we created a new algorithm where keys are treated as separate entities. For example, assume we have keys A, B, and C; sometimes we need to reach consensus on A, but not on B and C. So Bizur is our patented multi-key algorithm optimized for that use case, as if you could run a large number of Raft/Paxos state machines in parallel. Together with our 3-tier distributed metadata model, we solve these challenges efficiently in the real world for both on-cloud and on-premises deployments at massive scale.
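The per-key idea described above can be illustrated with a small simulation. This is not the Bizur algorithm itself, just a hedged sketch of the principle: each key is its own consistency domain with its own majority quorum over a hypothetical set of replicas, so agreeing on key A never has to wait on keys B or C.

```python
# Toy illustration of multi-key consensus: one independent majority
# quorum per key, rather than one quorum for the whole key/value set.
# Hypothetical sketch only; not the Bizur implementation.
from collections import Counter

class KeyRegister:
    """One independent consistency domain (one key) over N replicas."""
    def __init__(self, replicas: int = 3):
        self.replicas = [None] * replicas

    def write(self, value, reachable):
        # The write succeeds only if a majority of replicas acknowledge it.
        if len(reachable) * 2 <= len(self.replicas):
            return False
        for i in reachable:
            self.replicas[i] = value
        return True

    def read(self):
        # Return the value held by a majority of replicas, if any.
        votes = Counter(v for v in self.replicas if v is not None)
        if not votes:
            return None
        value, count = votes.most_common(1)[0]
        return value if count * 2 > len(self.replicas) else None

registers = {k: KeyRegister() for k in "ABC"}  # separate domain per key

# Key A reaches consensus even though key B's quorum is unavailable.
assert registers["A"].write("v1", reachable=[0, 1, 2])
assert not registers["B"].write("v1", reachable=[0])  # no majority
assert registers["A"].read() == "v1"
```

Running many such independent registers in parallel is the intuition behind the "large number of Raft/Paxos state machines" comparison in the answer above.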
What are the use cases you target? What are the segments where you see the fastest growth?
We are seeing broad market demand for this cross-cloud data fabric wherever enterprise customers want to cloud-burst peak demand, lift and shift migrate their workloads to the cloud, or create true hybrid cloud workflows where one step of their value chain best sits in the cloud (for elastic capacity or specialized capabilities like GPUs) while the rest sits on-premises.
We see this especially in horizontal use cases such as:
- Stateful containers
- Consolidated/multi-source analytics
- Mixed workload consolidation
And vertical use cases such as:
- Life sciences
- EDA
- Financial tech
- Media and entertainment
- Oil and gas
Who are the most seen competitors for you? Is it more classic file storage giants or newcomers like you?
Our competitive profile actually splits depending on the customer’s initial deployment objectives (remember the Golden Gate Bridge I mentioned earlier). For customers looking to deploy Elastifile initially on-premises (while future-proofing for on-cloud), the primary competition is indeed the incumbents like NetApp or the proprietary HPC appliances using Spectrum Scale (formerly GPFS) or the open source options. For on-cloud, the main competitors are the native file services like EFS or caching appliances like Avere. Wherever the customer starts, and whoever we compete with, Elastifile delivers a uniquely holistic approach to enable the next-gen enterprise hybrid cloud with uncompromised enterprise features and performance across file and object tiers.
As Elastifile is closely integrated with cloud service providers – AWS, Azure and GCP – and on-prem object storage thanks to your CloudConnect capability, could you tell us the deployment proportion between AWS, Azure and full on-prem?
Again, this primarily reflects where enterprise customers are on their hybrid IT journey. 50% of our customers are enterprises deploying us initially on-premises, while 10% are deploying us primarily on-cloud. The remaining 40% are actually cloud service providers themselves, who are using Elastifile as the foundation for next-generation MSP and other services, linking their customers’ on-premises workflows with the hosted site and also any back-end infrastructure on the public clouds. But 100% of our customers are looking to leverage the full hybrid cloud capabilities over time.
What is your pricing philosophy? Is it based on file system node instance, capacity, cloud extension…?
This is one where we can be incredibly disruptive, and we are having fun with it. Instead of the old massive upfront Capex cost for storage, we have built our pricing to mirror the cloud’s consumption pricing. We completely align price to value and consumption. Our primary pricing meter for active data workloads is storage capacity, although in the cloud this is reflected as per-node pricing (with set capacity per node). We offer three consumption/subscription tiers: 1) Elastifile Enterprise for active data (ECFS) deployments on-premises (only), 2) Elastifile Enterprise CloudConnect for active and inactive data (ECFS and CloudConnect object tiering), and 3) Elastifile Enterprise Cross-Cloud for the total deployability of all capabilities on-cloud and on-premises.
For on-premises deployments, this means an initial subscription based on a small 50TB capacity commitment, after which all cost is based on quarterly ‘true-ups’ for actual consumption. For on-cloud deployments, this can be applied as BYOL, or customers can purchase us from the native Marketplace and Launcher options (coming soon). Unlike most other options, we do not require any virtual appliance or service to be running all the time, so our cost and the underlying cloud infrastructure cost are only incurred when the ECFS cluster is spun up and data is checked out of the object tier. And because everything has rich data reduction, both the infrastructure and egress costs are significantly reduced.
You have 40 customers. We understand you sell your software through the channel but you recently extended it with an OEM agreement. Could you give us more details about this deal?
Indeed, we are building a channel to serve our enterprise customers wherever they are on their hybrid IT journey. For on-premises purchases, this is driven primarily by VADs and resellers, and now the OEMs as well (several of whom are strategic investors). The recently announced Dell OEM Solutions appliance is a win-win: customers can order it directly from Dell and Elastifile has a scalable route to market globally. Of course, we are working to offer customers similar OEM appliances with others. For on-cloud purchases, there are different channels evolving right now. Of course, we will be available as a native service on the marketplaces for AWS, GCP, and Azure, and as a default part of the services offered by tier 2 service providers. But there are also new cloud services brokers that offer Elastifile, like Agosto, Stratozone, and Innova.
How do you see the market, especially with cloud pressure and newcomers in file storage?
The hybrid cloud pivot is real, and it is now. The Microsoft acquisition of Avere shows this clearly. There is of course tremendous noise as the storage incumbents hype their old offerings with the new cloud vocabulary, and as other start-ups tell their stories. Many cloud service providers, big and small, are embracing us as a key enabler to building their bridge to on-premises workflows. Our customers tell us we are unique in our cross-cloud approach, our enterprise-class data services, and our integration of active and inactive data (file and object) tiers. By focusing on the use cases that most benefit from this cross-cloud data fabric, we help our customers and accelerate the hybrid IT market shift.
More globally, how do you see the future of the company and the product? What are the next steps?
We’ve worked incredibly hard over the past 4 years, and we’ve still got a lot of work to do. But the roadmap and the use cases are now clear. We will complete the full capabilities of the cross-cloud data fabric, including more advanced services for dynamic ILM, copy data management, security, multi-tenancy, and additional optimizations for each of the major cloud providers. We will expand our use cases and go deeper in enabling end-user self-service management for each use case. And we will find and exploit more and more ways for our customers to lower their TCO on-premises, on-cloud, and moving dynamically across clouds.
Read also:
Elastifile Delivers New Hybrid Cloud Data Fabric Appliances
Scale-out all-flash data fabric bundled with servers
2017.11.02 | Press Release
Start-Up Profile: Elastifile
In software-defined data infrastructure for on-premises, in-cloud and hybrid cloud deployments
by Jean Jacques Maleval | 2017.04.12 | News