Oracle CloudWorld: MySQL HeatWave Lakehouse with 17X Faster Query Performance Vs. Snowflake and 6X Faster than Redshift on 400TB Workload

Loads 400TB of data from object storage 8x faster than Redshift and 2.7x faster than Snowflake, scales to 512 nodes, and processes hundreds of terabytes of data in object store in multiple file formats, including Aurora and Redshift backups.

Oracle Corp. announced MySQL HeatWave Lakehouse, enabling customers to process and query hundreds of terabytes of data in object store in a variety of file formats, such as CSV and Parquet, as well as Aurora and Redshift backups.

MySQL HeatWave Lakehouse is an addition to the MySQL HeatWave portfolio, the cloud service that combines transaction processing, analytics, ML, and ML-based automation within a single MySQL database.

Powered by the massively parallel scale-out MySQL HeatWave architecture, it delivers better performance than competitive cloud database services for running queries and loading data, as demonstrated by industry standard benchmarks. In addition, in a single query, customers can query transactional data in the MySQL database and combine it with data in the object store using standard MySQL syntax. The company also announced MySQL Autopilot capabilities that improve performance and make MySQL HeatWave Lakehouse easy to use. MySQL HeatWave Lakehouse is available in beta for customers to try and is slated for general availability in 1H23.

Customers migrating from AWS, Google, and on-premises have been using MySQL HeatWave for a set of use cases including marketing analytics, particularly real-time analysis of advertising campaign performance and customer data analytics to build effective campaigns. Customers migrating from AWS include leaders in the automotive, telecommunications, retail, high-tech, and healthcare industries.

“MySQL HeatWave is the result of years of research and advanced development, which we are turning into breakthrough innovations to address a bigger set of challenges for all MySQL customers. In fact, MySQL HeatWave Lakehouse is our third major MySQL HeatWave announcement this year,” said Edward Screven, chief corporate architect. “There is a huge growth in data stored outside of databases, and with MySQL HeatWave Lakehouse, customers can leverage all the benefits of HeatWave on data residing in object store. MySQL HeatWave now provides one integrated service on multiple clouds for transaction processing, analytics across data warehouses and data lakes, and machine learning without ETL. This combination helps deliver massive improvements in performance, automation, and cost – further distancing MySQL HeatWave from other cloud database services.”

“We are excited to continue our collaboration with Oracle, evolving it into supporting their new MySQL HeatWave Lakehouse offering, which is optimized to run on AMD EPYC-powered Oracle cloud instances and leverage the latest innovations in our processors,” said Mark Papermaster, CTO and EVP, AMD (Advanced Micro Devices, Inc.). “The collective work of the AMD and Oracle engineering teams has helped create an impressive MySQL solution that can support great scalability and performance for transaction processing, analytics, machine learning, and machine learning-based automation within a single MySQL database.”

The company is also publishing lakehouse benchmarks and introducing several innovative capabilities for MySQL HeatWave Lakehouse and MySQL Autopilot.

Benchmarks:
Faster than Snowflake and Amazon Redshift in both query performance and data loading 
As demonstrated by a fully transparent, publicly available 400 TB TPC-H (*) benchmark, the query performance of MySQL HeatWave Lakehouse is: 

  • 17x faster than Snowflake  
  • 6x faster than Amazon Redshift  

Loading data from object store into MySQL HeatWave Lakehouse is also significantly faster. For a 400TB TPC-H (*) workload, load performance of MySQL HeatWave Lakehouse is:

  • 8x faster than Amazon Redshift  
  • 2.7x faster than Snowflake 

All of these fully transparent benchmark scripts are available on GitHub for customers to replicate.
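
As a quick sanity check on how the figures above relate to one another, the sketch below derives the implied Redshift-versus-Snowflake ratios from the published multipliers. It assumes each multiplier is measured against the same MySQL HeatWave Lakehouse result on the same workload, which the announcement does not state explicitly. The dictionary names and the `implied_ratio` helper are illustrative and are not part of the published benchmark scripts.

```python
# Published speedups of MySQL HeatWave Lakehouse over each competitor
# on the 400TB TPC-H-derived workload (figures taken from the article).
QUERY_SPEEDUP = {"Snowflake": 17.0, "Amazon Redshift": 6.0}
LOAD_SPEEDUP = {"Amazon Redshift": 8.0, "Snowflake": 2.7}


def implied_ratio(speedups, slower, faster):
    """How many times faster `faster` is than `slower`.

    Assumption: both multipliers share the same HeatWave baseline,
    so dividing them cancels that baseline out.
    """
    return speedups[slower] / speedups[faster]


query_ratio = implied_ratio(QUERY_SPEEDUP, "Snowflake", "Amazon Redshift")
load_ratio = implied_ratio(LOAD_SPEEDUP, "Amazon Redshift", "Snowflake")

print(f"Queries: Redshift is roughly {query_ratio:.1f}x faster than Snowflake")
print(f"Loading: Snowflake is roughly {load_ratio:.1f}x faster than Redshift")
```

Under that assumption, the query and load rankings of the two competitors are reversed: Redshift is ahead of Snowflake on queries, while Snowflake is ahead of Redshift on loading.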

“MySQL HeatWave Lakehouse sets the competition on fire by blazing the trail to the previously uncharted territory of 400 TB cloud database benchmarks at breakneck speeds,” said Ron Westfall, senior analyst and research director, Futurum Research. “MySQL HeatWave Lakehouse is a quantum leap for HeatWave in terms of processing capacity and computing power: from 32TB and 64 nodes to 400TB and 512 nodes with performance and price performance that handily beat Amazon Redshift and Snowflake. Meanwhile, the cloud database competitors have yet to respond to the in-database convergence and the multi-cloud presence of MySQL HeatWave. How will they cope with the 400TB MySQL HeatWave Lakehouse?”

Innovative capabilities for MySQL HeatWave Lakehouse

  • Larger data size, standard MySQL syntax: Customers can query up to 400TB of data with MySQL HeatWave Lakehouse, and the HeatWave cluster scales to 512 nodes. Customers use standard MySQL syntax for querying the data. 
  • Identical performance and compression: MySQL HeatWave offers the same query performance for data stored inside the MySQL database or in the object store – as demonstrated by both 10TB and 30TB TPC-H benchmarks. Furthermore, the amount of compression achieved and the amount of data that can be processed per node are the same in both cases. 
  • Support for multiple file formats: With MySQL HeatWave Lakehouse, customers can load and process data stored in a variety of file formats, such as CSV and Parquet, as well as Aurora and Redshift backups from AWS. This enables customers to leverage the benefits of MySQL HeatWave even when their data is not stored inside a MySQL database. The query performance is the same regardless of the file format in which the data is stored. 
  • Ability to query data in MySQL and combine it with data in object store: With MySQL HeatWave Lakehouse, customers can query their OLTP data stored inside MySQL database and combine it with data stored in the object store. Any change made to the OLTP data is updated in real time and reflected in the query result.  

MySQL Autopilot capabilities for MySQL HeatWave Lakehouse
MySQL Autopilot provides machine learning-based automation for MySQL HeatWave. Existing MySQL Autopilot capabilities such as auto provisioning and auto query plan improvement have been enhanced for MySQL HeatWave Lakehouse, which further reduces database administration overhead and improves performance. In addition, a number of new MySQL Autopilot capabilities are now available for MySQL HeatWave Lakehouse.

  • Auto schema inference: Autopilot automatically infers the mapping of the file data to datatypes in the database. As a result, customers don’t need to manually specify the mapping for each new file to be queried by MySQL HeatWave Lakehouse – thereby saving time and effort. 
  • Adaptive data sampling: Autopilot intelligently samples portions of files in object storage, collecting accurate statistics with minimal data access. MySQL HeatWave uses these statistics to generate and improve query plans, determine the optimal schema mapping, and for other purposes. 
  • Auto load: Autopilot analyzes the data to predict the load time into MySQL HeatWave, determines the mapping of the datatypes, and automatically generates the loading scripts. Users don’t have to manually specify the mapping of files to database schemas and tables. 
  • Adaptive data flow: MySQL HeatWave Lakehouse dynamically adapts to the performance of the underlying object store. As a result, MySQL HeatWave can get the maximum available performance from the underlying cloud infrastructure which improves overall performance, price performance, and availability. 
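
To make the idea of inferring a schema from a sample of file data concrete, here is a minimal sketch. It is not how Autopilot works internally, which the announcement does not describe. It reads only the first few rows of a CSV file, a much simpler stand-in for adaptive sampling, and maps each column to the narrowest of three types. The function names, the sample size, and the three-type ladder are all assumptions made for illustration.

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Return the narrowest of BIGINT, DOUBLE, or VARCHAR(n) that fits
    every non-empty value in the sampled column (illustrative ladder)."""
    def fits(cast):
        for v in values:
            if v == "":
                continue  # treat empty fields as NULL; they fit any type
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if fits(int):
        return "BIGINT"
    if fits(float):
        return "DOUBLE"
    longest = max((len(v) for v in values), default=1)
    return f"VARCHAR({longest})"


def infer_schema(text, sample_rows=100):
    """Infer column types from the header plus the first `sample_rows` rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    sample = list(islice(reader, sample_rows))
    # Transpose sampled rows into columns; with no rows, every column is empty.
    columns = list(zip(*sample)) if sample else [()] * len(header)
    return {name: infer_type(col) for name, col in zip(header, columns)}


data = "id,price,city\n1,9.5,Austin\n2,12,Redwood City\n"
print(infer_schema(data))
# {'id': 'BIGINT', 'price': 'DOUBLE', 'city': 'VARCHAR(12)'}
```

A sample this small can mislead: a column whose first 100 rows are integers may contain text further down, which is one reason a production system would sample across the whole file and not only its head.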

Additional enhancements to MySQL HeatWave
The company announced a number of other enhancements to MySQL HeatWave, ranging from ML to the VS Code plug-in. The in-database ML capabilities of MySQL HeatWave have been further enriched to include support for forecasting models. New ML explanation techniques, optimized for MySQL HeatWave, have been added. Data scientists can now influence various stages of the automated HeatWave ML training pipeline, including the choice of algorithm, feature selection, scoring metric, and the explanation technique. HeatWave ML has also been enhanced to allow customers to import ML models into HeatWave.

A new multi-engine Hypergraph query optimizer further improves the performance of complex queries and eliminates the need to specify the join order. Zone maps have been added, which accelerate a broader set of queries with MySQL HeatWave. And the VS Code plug-in for MySQL has been enhanced to support MySQL HeatWave capabilities.
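
For readers unfamiliar with the term, a zone map is a general database technique: the engine records the minimum and maximum value of a column for each block of rows, then skips any block whose range cannot satisfy a filter. The sketch below illustrates that generic idea only. The announcement gives no detail on HeatWave's implementation, and the function names, block size, and sample data here are made up for illustration.

```python
def build_zone_map(values, block_size):
    """Record (start, end, min, max) for each fixed-size block of a column."""
    zones = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zones.append((start, start + len(block), min(block), max(block)))
    return zones


def scan_range(values, zones, low, high):
    """Return values in [low, high] and the number of blocks actually scanned."""
    hits, scanned = [], 0
    for start, end, zmin, zmax in zones:
        if zmax < low or zmin > high:
            continue  # no value in this block can match, so skip it entirely
        scanned += 1
        hits.extend(v for v in values[start:end] if low <= v <= high)
    return hits, scanned


# Hypothetical order-date column stored as YYYYMMDD integers.
order_dates = [20220101, 20220105, 20220110,
               20220201, 20220215, 20220220,
               20220301, 20220310, 20220320]

zones = build_zone_map(order_dates, block_size=3)
hits, scanned = scan_range(order_dates, zones, 20220201, 20220228)
print(hits, scanned)
# [20220201, 20220215, 20220220] 1
```

Only one of the three blocks is read for the February filter. The benefit depends on the data being at least loosely ordered on the filtered column; on randomly ordered data most blocks span the full value range and few can be skipped.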

Ready for distributed cloud
MySQL HeatWave is available in multiple clouds including OCI, AWS, and now Microsoft Azure. It’s available on-premises as part of OCI Dedicated Region for organizations that prefer not to move their database workloads to the public cloud. Customers can also replicate data from their on-premises MySQL OLTP applications to MySQL HeatWave to obtain near real-time analytics. MySQL HeatWave is always on the latest version of the MySQL database.

(*) Benchmark queries are derived from the TPC benchmarks, but results are not comparable to published TPC benchmark results since they do not comply with the TPC specifications.

Resources 
More about MySQL HeatWave
Video: MySQL HeatWave explainer
