
Don’t Tune Twice: Reusing Tuning Setups for SQL-on-Hadoop Queries

October 20, 2020


Authors: Edson Ramiro Lucas Filho, Eduardo Cunha de Almeida, Stefanie Scherzinger

Tags: 2019, conceptual modeling

SQL-on-Hadoop processing engines have become the state of the art in data lake analysis. However, the skills required to tune such systems are rare. This has inspired automated tuning advisors that profile the query workload and produce tuning setups for the low-level MapReduce jobs. Yet with highly dynamic query workloads, repeated re-tuning costs time and money in IaaS environments. In this paper, we focus on reducing the costs of up-front tuning. At the heart of our approach is the observation that a SQL query is compiled into a query plan of MapReduce jobs. While the plans differ from query to query, individual jobs tend to be similar between queries. We introduce the notion of the code signature of a MapReduce job and, based on this, our concept of job similarity. We show that we can effectively recycle tuning setups from similar MapReduce jobs that have already been profiled. In doing so, we can leverage any third-party tuning advisor for MapReduce engines. We show that recycling tuning setups reduces the time spent on profiling by 50% in the TPC-H benchmark.
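To make the recycling idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not define the code signature or the similarity measure, so the sketch assumes a signature is simply a set of operator names and that similarity is Jaccard overlap with a fixed threshold. The names `TuningCache`, `setup_for`, `fake_advisor`, and the operator labels are all hypothetical; the advisor stands in for any third-party tuning advisor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# ASSUMPTION: a "code signature" is modeled as a set of operator names.
# The paper's actual definition may differ.
Signature = FrozenSet[str]
TuningSetup = Dict[str, str]


def jaccard(a: Signature, b: Signature) -> float:
    """Jaccard overlap of two signatures (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class TuningCache:
    """Hypothetical cache that reuses setups of similar, already profiled jobs."""
    threshold: float = 0.8  # ASSUMPTION: arbitrary similarity cut-off
    entries: List[Tuple[Signature, TuningSetup]] = field(default_factory=list)
    profiled: int = 0  # how often the (expensive) advisor was invoked

    def lookup(self, sig: Signature) -> Optional[TuningSetup]:
        """Return the setup of the most similar known job, if similar enough."""
        best, best_score = None, 0.0
        for known, setup in self.entries:
            score = jaccard(sig, known)
            if score >= self.threshold and score > best_score:
                best, best_score = setup, score
        return best

    def setup_for(self, sig: Signature,
                  advisor: Callable[[Signature], TuningSetup]) -> TuningSetup:
        """Recycle a cached setup, or profile the job once and remember it."""
        cached = self.lookup(sig)
        if cached is not None:
            return cached
        setup = advisor(sig)  # stands in for a profiling run
        self.profiled += 1
        self.entries.append((sig, setup))
        return setup


def fake_advisor(sig: Signature) -> TuningSetup:
    """Placeholder advisor; a real one would profile the job."""
    return {"mapreduce.job.reduces": str(len(sig))}


cache = TuningCache(threshold=0.75)
q1_job = frozenset({"TableScan", "Filter", "GroupBy", "ReduceSink"})
q2_job = frozenset({"TableScan", "Filter", "GroupBy", "ReduceSink"})
q3_job = frozenset({"TableScan", "Join", "ReduceSink"})

setups = [cache.setup_for(job, fake_advisor) for job in (q1_job, q2_job, q3_job)]
print(f"jobs seen: {len(setups)}, advisor runs: {cache.profiled}")
```

In this toy run, the second job has the same signature as the first and reuses its setup, while the third falls below the threshold and is profiled. How much profiling a real workload saves depends on the signature definition and threshold, which this sketch only guesses at.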

Read the full paper here: https://link-springer-com.proxy2.hec.ca/chapter/10.1007/978-3-030-33223-5_9

