Finding the right cloud configuration for analytics clusters

Muhammad Bilal, Marco Canini, Rodrigo Rodrigues

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Scopus citations


Finding good cloud configurations for deploying a single distributed system is already a challenging task, and it becomes substantially harder when a data analytics cluster is formed by multiple distributed systems, since the search space becomes exponentially larger. In particular, recent proposals for single-system deployments rely on benchmarking runs that become prohibitively expensive as we shift to joint optimization of multiple systems, because users have to wait until the end of a long optimization run to start the production run of their job. We propose Vanir, an optimization framework designed to operate in an ecosystem of multiple distributed systems forming an analytics cluster. To deal with this large search space, Vanir quickly finds a good-enough configuration and then attempts to optimize it further during production runs. This is achieved by combining a series of techniques in a novel way, namely a metrics-based optimizer for the benchmarking runs, and a Mondrian forest-based performance model and transfer learning during production runs. Our results show that Vanir can find deployments that perform comparably to the ones found by state-of-the-art single-system cloud configuration optimizers while spending 2X fewer benchmarking runs. This leads to an overall search cost that is 1.3-24X lower than the state of the art. Additionally, when transfer learning can be used, Vanir can reduce the number of benchmarking runs even further and use online optimization to achieve performance comparable to the deployments found by today's single-system frameworks.
Original language: English (US)
Title of host publication: Proceedings of the 11th ACM Symposium on Cloud Computing
Number of pages: 15
ISBN (Print): 9781450381376
State: Published - Oct 13 2020

Bibliographical note

KAUST Repository Item: Exported on 2020-11-20

