Uncategorized

SkipSJoin: A New Physical Design for Distributed Big Data Warehouses in Hadoop

October 20, 2020

Authors: Fadila Bentayeb, Nadia Kabachi, Omar Boussaid, Yassine Ramdane

Tags: 2019, conceptual modeling

Hadoop uses horizontal partitioning to improve the performance of a big data warehouse. A major challenge when horizontally partitioning the tables of a big data warehouse is to reduce network traffic for a given workload. A common technique to avoid this issue, when performing a join operation, is to co-partition the tables of the data warehouse on their join key. However, in the existing partitioning schemes, executing a star join operation in Hadoop still needs many MapReduce cycles. In this paper, we combine a data-driven and a workload-driven model to create a new scheme for distributed big data warehouses over Hadoop, called “SkipSJoin”. Our approach allows performing the star join operation in only one Spark stage, and allows skipping the loading of some unnecessary HDFS blocks. Our experiments show that our proposal outperforms some approaches in terms of query execution time.

Read the full paper here: https://link-springer-com.proxy2.hec.ca/chapter/10.1007/978-3-030-33223-5_21

SkipSJoin: A New Physical Design for Distributed Big Data Warehouses in Hadoop

EDITOR PICKS

Roger H.L. Chiang – 2023 ASOCA Winner

Join us in the magical Miami for the 2023 AIS SIGSAND!

Participate in SAND sessions at AMCIS 2023 – August 10 –...

POPULAR POSTS

Participate in SAND sessions at AMCIS 2023 – August 10 –...

Conceptual Modelling in the “Digital First” Era — A Joint AIS...

Call for Papers: ER 2021 in St. John’s, NL

POPULAR CATEGORY

Share this:

EDITOR PICKS

POPULAR POSTS

POPULAR CATEGORY