MapR Technologies Release Including Apache Hadoop 2.0 With YARN

At the O’Reilly Strata Conference: Making Data Work, MapR Technologies, Inc. announced its distribution including Hadoop 2.2 with YARN (Apache Hadoop NextGen MapReduce).

YARN delivers resource management and, within a MapR cluster, combines flexible resource management with the reliability and real-time capability of MapR’s next-generation data platform.

The resource management and scheduling capabilities allow Hadoop applications to share a cluster’s compute resources, increasing the efficiency and utilization of the cluster. By combining YARN with its read/write (R/W) POSIX data platform, MapR enables YARN-based applications not only to run on a Hadoop cluster and share compute resources, but also to read, write and update data in the underlying distributed file system and database tables. As a result, organizations can develop and deploy a broader set of big data Hadoop applications.

“YARN opens up Hadoop for processing patterns beyond just MapReduce,” said Evan Quinn, research director, Enterprise Management Associates. “MapR’s Hadoop distribution extends YARN even further by adding a full, open standard NFS interface in addition to HDFS, enabling non-MapReduce applications to optimally take advantage of a cluster’s storage.”

The company is also announcing that it enables organizations to run the Hadoop MapReduce 1.x and YARN schedulers on the same nodes in the cluster simultaneously, providing a path for MapReduce 1.x users to upgrade to the YARN scheduler. This demonstrates the company’s commitment to backward compatibility and customer success. In addition, running both schedulers side by side makes it possible to run third-party services that are not YARN-compatible on the same cluster.

“comScore runs more than 20,000 jobs each day on its production MapR cluster,” said Michael Brown, CTO, comScore. “We are excited that MapR is delivering Hadoop 2.0 and that MapR is providing a seamless upgrade path by supporting MapReduce 1.x and YARN on the same cluster.”

“As YARN expands Hadoop use cases in the enterprise, the need for enterprise-grade dependability, interoperability and performance increases exponentially,” said Tomer Shiran, VP, product management, MapR. “The combination of YARN and the MapR Data Platform delivers the only distribution for Hadoop in which both YARN and non-YARN distributed big data applications share the compute and storage resources of large-scale clusters.”

YARN-based applications on MapR inherit the high availability (HA), data protection, disaster recovery (DR), security, and performance of the distribution. Moreover, they operate closer to real time because the MapR file system enables streaming writes, giving YARN-based applications access to the latest operational data.

With this release, the company continues to provide support for open source projects of any Hadoop-powered distribution. The distribution includes more than a dozen open source projects, including Apache projects Hive, Pig, Solr, Oozie, Flume, Sqoop, HBase, and ZooKeeper, as well as Apache-licensed open source projects such as Multitool, Hue, Impala, and Cascading. In addition, the company is a participant and contributor in the Apache Hadoop community and evaluates and adds new projects to its distribution, with many expected in 2014.

The company supports multiple versions of key Hadoop-related projects in every distribution release. It also provides monthly updates to these projects, ensuring that customers have access to the latest community innovations along with enterprise stability.

The distribution including Apache Hadoop YARN will be available in March.