Syncsort Contributes Enhancements to Apache Hadoop
Strengthening big data integration and ETL capabilities
This is a Press Release edited by StorageNewsletter.com on March 7, 2013 at 2:48 pm
Syncsort Incorporated announced a contribution to Apache Hadoop as part of its ongoing commitment to the open source community: a new feature that strengthens Apache Hadoop’s big data integration and ETL capabilities.
Sort in MapReduce
Sort is fundamental to the MapReduce framework: the data is sorted between the Map and Reduce phases. Syncsort’s contribution allows the native Hadoop sort to be replaced by an alternative sort implementation on both the Map and Reduce sides, i.e. it makes the sort phase pluggable.
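The idea of a pluggable sort phase can be illustrated with a small sketch. This is not Hadoop code and does not use the Hadoop API; the names `run_job`, `default_sorter` and `descending_sorter` are hypothetical and exist only for illustration. It assumes a toy word-count job in which the step between map and reduce is passed in as a parameter, so that a different sort can be swapped in without touching the map or reduce logic.

```python
from typing import Callable, List, Tuple

KV = Tuple[str, int]
# A "sorter" is any callable that returns the pairs ordered by key.
Sorter = Callable[[List[KV]], List[KV]]


def default_sorter(pairs: List[KV]) -> List[KV]:
    """Stand-in for a framework's built-in sort (ascending by key)."""
    return sorted(pairs, key=lambda kv: kv[0])


def descending_sorter(pairs: List[KV]) -> List[KV]:
    """Stand-in for an alternative, externally supplied sort."""
    return sorted(pairs, key=lambda kv: kv[0], reverse=True)


def run_job(lines: List[str], sorter: Sorter = default_sorter) -> List[KV]:
    # Map phase: emit (word, 1) for every word in the input.
    mapped = [(word, 1) for line in lines for word in line.split()]

    # Sort phase: the pluggable step between map and reduce.
    ordered = sorter(mapped)

    # Reduce phase: sum the values of each run of adjacent equal keys.
    # This relies on the sorter having placed equal keys next to each other.
    reduced: List[KV] = []
    for key, value in ordered:
        if reduced and reduced[-1][0] == key:
            reduced[-1] = (key, reduced[-1][1] + value)
        else:
            reduced.append((key, value))
    return reduced


if __name__ == "__main__":
    data = ["b a", "a c"]
    print(run_job(data))                      # built-in sort
    print(run_job(data, descending_sorter))   # alternative sort plugged in
```

In this sketch the reduce step only requires that equal keys arrive adjacent to one another, which is why any implementation that satisfies that contract can replace the default.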
The new feature is now committed to Apache Hadoop 2.0.3-alpha and has received broad-based support from Hadoop organizations. It allows external sort implementations to run within the Hadoop MapReduce framework, helping organizations to accelerate development, build complex ETL flows and MapReduce jobs without coding, and seamlessly optimize Hadoop. The patch also simplifies use cases that are currently challenging in MapReduce so they can be implemented faster and more efficiently.
"Hadoop is a rapidly evolving ecosystem that is emerging as the OS for big data," said Josh Rogers, SVP, data integration business, Syncsort. "Our focus is to help build out Hadoop’s data integration & ETL capabilities, removing barriers that undermine its potential and helping organizations ramp-up their big data initiatives."
Syncsort has worked with the Apache Hadoop community on enhancements and fixes and will continue to collaborate on future projects. The additional flexibility provided by the new feature will help the emerging ecosystem as well as current Hadoop users tackle a broader set of use cases for big data analytics. In addition, Syncsort will leverage the feature by delivering a pluggable version of its high-performance sort solution, DMExpress, this spring; it is currently in beta test with select customers.
At the O’Reilly Strata conference in Santa Clara, CA, Syncsort highlighted how the feature helps Hadoop MapReduce users. Syncsort also demonstrated how DMExpress can help organizations have the most current, accurate data available for business analysis, while reducing the cost and complexity of processing increasingly large amounts of data.