EMC Unveils Hadoop Starter Kit 2.0
Reduces time and cost to deploy on Isilon NAS.
This is a Press Release edited by StorageNewsletter.com on November 5, 2013 at 2:38 pm. This news was published on a blog by Nick Kirsch, CTO, Isilon storage division, EMC.
At Hadoop World in NYC, EMC announced the Hadoop Starter Kit (HSK) 2.0.
This free, open kit makes deployments of any Hadoop distribution more efficient by reducing the time and cost of deploying on EMC Isilon scale-out NAS.
HSK 2.0 combines the power of vSphere Big Data Extensions 1.0 with Isilon NAS to achieve a big storage and analytics solution. Benefits of this solution include rapid provisioning, high availability (HA), multi-tenancy and portability. This means that users can run any Hadoop distribution throughout the big data application lifecycle with zero data migration, including open-source Apache Hadoop, Pivotal HD, Cloudera and Hortonworks.
As the avalanche of big data continues to grow, organizations are struggling to store, manage, protect and analyze both structured and unstructured data. Hadoop on Isilon also ensures end-to-end data protection through SnapshotIQ, SyncIQ and NDMP Backup – something that is missing or difficult on a commodity Hadoop platform.
With Isilon’s native HDFS integration and the ability to perform in-place analytics (that’s right – never migrate data again), users can now bring Hadoop to big data rather than vice versa. The benefits of a single file system with seamless multi-protocol support (NFS, CIFS/SMB, HDFS, etc.) cannot be overstated: it avoids the CapEx of purchasing a separate Hadoop infrastructure, and it delivers results faster because petabytes of data never need to be migrated.
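To give a concrete sense of what native HDFS integration means, a Hadoop compute cluster can be pointed at Isilon by setting the default file system in core-site.xml to the Isilon cluster's HDFS endpoint. This is a minimal sketch; the hostname below is a hypothetical SmartConnect zone name, not one from the kit, and on Hadoop 1.x distributions the equivalent key is fs.default.name:

```xml
<!-- core-site.xml: point the Hadoop compute tier at Isilon's HDFS service.
     "isilon.example.com" is a placeholder SmartConnect zone name;
     8020 is the default HDFS RPC port. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://isilon.example.com:8020</value>
  </property>
</configuration>
```

Because the same OneFS file system also serves NFS and SMB, data written over those protocols is immediately visible to Hadoop jobs through this HDFS endpoint – that is the in-place analytics described above.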
Big Data aficionados are always excited about new tools and possibilities – so let’s dig into the details. HSK enables rapid provisioning by guiding you through the automation process. From the creation of virtual Hadoop nodes to starting the Hadoop services on the cluster, much of the deployment can be automated and requires little Hadoop expertise on the user’s part. This automation allows virtual Hadoop clusters to be rapidly deployed and configured as needed. Beyond deploying quickly, certain mission-critical uses of Hadoop also demand HA. HA protection is provided through the virtualization platform to protect the single points of failure in the Hadoop system, such as the NameNode for HDFS and the JobTracker for MapReduce.
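For a sense of what drives that automation, the Serengeti engine behind vSphere Big Data Extensions consumes a JSON cluster specification describing node groups, their roles and sizing. The abbreviated sketch below is a hypothetical spec, not one shipped with HSK – group names, counts and resource figures are illustrative, and field names follow the Serengeti spec-file format as best understood:

```json
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": ["hadoop_namenode", "hadoop_jobtracker"],
      "instanceNum": 1,
      "haFlag": "on",
      "cpuNum": 4,
      "memCapacityMB": 8192
    },
    {
      "name": "worker",
      "roles": ["hadoop_tasktracker"],
      "instanceNum": 8,
      "cpuNum": 2,
      "memCapacityMB": 4096
    }
  ]
}
```

Two details echo the points above: the master group's HA flag asks the virtualization platform to protect the NameNode and JobTracker single points of failure, and the workers carry only compute roles because HDFS storage lives on Isilon rather than on DataNodes.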
This tool kit represents both sides of big data – storage and analytics – while harnessing the power of Hadoop.