
Publication

Zero-Shot Cost Models for Parallel Stream Processing

Pratyush Agnihotri; Boris Koldehofe; Carsten Binnig; Manisha Luthra
In: Rajesh Bordawekar; Oded Shmueli; Yael Amsterdamer; Donatella Firmani; Andreas Kipf (Eds.). Proceedings of the Sixth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management. International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM-2023), aiDM@SIGMOD 2023, June 18, Seattle, WA, USA, Pages 5:1-5:5, ISBN 979-8-4007-0193-1, ACM, 2023.

Abstract

This paper addresses the challenge of predicting the degree of parallelism in distributed stream processing (DSP) systems, which are used extensively in industries such as e-commerce and online gaming to meet demanding and varied workload requirements. Existing DSP systems rely either on manual tuning of the parallelism degree or on workload-driven learned models, approaches that are either inefficient or can lead to costly operator migrations and downtime when workloads drift. Thus, we argue for a learned model that can autonomously decide on the parallelism degree while generalizing across workloads and meeting the current demands of DSP applications. We propose a novel approach that leverages zero-shot cost models to predict the parallelism degree while generalizing to unseen streaming workloads. To reduce training effort, we propose a rule-based strategy that selects the parallelism degree and meaningful transferable features, related to query workload and hardware, that influence parallelism decisions. We demonstrate the effectiveness of our strategy by evaluating it with different numbers of training queries and show that it achieves lower costs for parallel continuous query processing.
