
Open Access ePrints

Resource and Performance Distribution Prediction for Large Scale Analytics Queries

Newcastle University author(s): Professor Raj Ranjan


Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Efficient estimation of the resource consumption and performance of data-intensive workloads is central to the design and development of workload management techniques. Recent work has explored the efficacy of distribution-based estimation of workload performance, as opposed to single-point prediction, for a number of workload management problems such as query scheduling and admission control. However, these approaches lack an efficient way to predict the workload performance distribution: they simply assume that the probability density function (pdf) of the target value is already available. This paper addresses this problem for an integral portion of big data analytics workloads, Hive queries. To this end, we combine knowledge of Hive query execution with a novel use of mixture density networks to predict the whole spectrum of resource usage and performance as probability density functions. We evaluate our technique using the TPC-H benchmark, showing that it not only produces accurate pdf predictions but also outperforms state-of-the-art single-point techniques in half of the experiments.
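A mixture density network outputs the parameters of a mixture distribution rather than a single value, so the predicted pdf of a target (for example, a query's CPU time) is p(x) = sum_k alpha_k * N(x; mu_k, sigma_k^2). The sketch below is a minimal illustration assuming the standard one-dimensional Gaussian MDN parameterization (softmax for the mixing weights, exp for the standard deviations); the component count, the raw output values, and all function names (`mdn_params`, `mixture_pdf`, `mixture_mean`) are hypothetical and are not taken from the paper. It only shows how network outputs turn into a pdf and a point summary, not how the authors' network is trained or which Hive features it uses.

```python
import math

def softmax(logits):
    # Numerically stable softmax: mixing weights must be positive and sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mdn_params(raw_alpha, raw_mu, raw_sigma):
    # Map unconstrained network outputs to valid mixture parameters
    # (assumed convention: softmax for weights, identity for means, exp for std devs).
    weights = softmax(raw_alpha)
    means = list(raw_mu)
    sigmas = [math.exp(s) for s in raw_sigma]
    return weights, means, sigmas

def mixture_pdf(x, weights, means, sigmas):
    # Density of a 1-D Gaussian mixture at x: sum_k alpha_k * N(x; mu_k, sigma_k^2).
    total = 0.0
    for a, mu, s in zip(weights, means, sigmas):
        z = (x - mu) / s
        total += a * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return total

def mixture_mean(weights, means):
    # Expected value of the mixture: a single-point summary of the predicted pdf.
    return sum(a * mu for a, mu in zip(weights, means))

# Hypothetical raw outputs for one query: two equally weighted modes at 10 and 30.
w, mu, sd = mdn_params([0.0, 0.0], [10.0, 30.0], [0.0, math.log(2.0)])
print(mixture_mean(w, mu))  # 20.0
```

The bimodal example illustrates why a full pdf can be more informative than a point estimate: the mean of 20.0 falls in a region where the predicted density is close to zero.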

Publication metadata

Author(s): Khoshkbarforoushha A, Ranjan R

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 7th ACM/SPEC on International Conference on Performance Engineering ICPE '16

Year of Conference: 2016

Pages: 49-54

Print publication date: 01/01/2016

Online publication date: 12/03/2016

Acceptance date: 02/04/2016

Publisher: Association for Computing Machinery


DOI: 10.1145/2851553.2851578


ISBN: 9781450340809