Publication
Zero-shot cost models for distributed stream processing
Roman Heinrich; Manisha Luthra; Harald Kornmayer; Carsten Binnig
In: Yongluan Zhou; Panos K. Chrysanthis; Vincenzo Gulisano; Eleni Tzirita Zacharatou (Eds.). 16th ACM International Conference on Distributed and Event-based Systems. ACM International Conference on Distributed and Event-Based Systems (DEBS-2022), June 27-30, Copenhagen, Denmark, Pages 85-90, ACM, 2022.
Abstract
This paper proposes a learned cost estimation model for Distributed Stream Processing Systems (DSPS) that aims to provide accurate cost predictions for query execution. A major premise of this work is that the proposed learned model can generalize to the dynamics of streaming workloads out-of-the-box. This means that a model, once trained, can accurately predict performance metrics such as latency and throughput even if the characteristics of the data and workload, or the deployment of operators to hardware, change at runtime. The model can therefore be used for tasks such as optimizing the placement of operators to minimize the end-to-end latency of a streaming query or to maximize its throughput, even under varying conditions. Our evaluation on a well-known DSPS, Apache Storm, shows that the model predicts accurately for unseen workloads and queries while generalizing across real-world benchmarks.
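
The placement use case mentioned in the abstract can be illustrated with a minimal sketch. It assumes a cost predictor is available as a plain function and enumerates every assignment of operators to hosts, keeping the one with the lowest predicted latency. Everything here is hypothetical: `predict_latency_ms` is a hand-written stand-in formula, not the learned model from the paper, and the operator names, host names, and numbers are invented for illustration.

```python
from itertools import product


def predict_latency_ms(placement, op_cost_ms, host_speed, link_ms):
    """Hypothetical stand-in for a learned cost model (NOT the paper's model).

    Latency = per-operator compute time scaled by host speed, plus a fixed
    network penalty for each pair of consecutive operators on different hosts.
    """
    ops = list(placement)  # dict preserves the pipeline order of operators
    compute = sum(op_cost_ms[op] / host_speed[placement[op]] for op in ops)
    network = sum(
        link_ms for a, b in zip(ops, ops[1:]) if placement[a] != placement[b]
    )
    return compute + network


def best_placement(operators, hosts, predict):
    """Exhaustively search all operator-to-host assignments."""
    candidates = (
        dict(zip(operators, combo))
        for combo in product(hosts, repeat=len(operators))
    )
    return min(candidates, key=predict)


# Invented example: a three-operator linear query and two hosts.
operators = ["source", "filter", "sink"]
hosts = ["edge", "cloud"]
op_cost_ms = {"source": 1.0, "filter": 4.0, "sink": 1.0}
host_speed = {"edge": 1.0, "cloud": 4.0}  # relative speed factors
link_ms = 5.0  # assumed penalty per cross-host hop


def predict(placement):
    return predict_latency_ms(placement, op_cost_ms, host_speed, link_ms)


best = best_placement(operators, hosts, predict)
latency = predict(best)
```

Exhaustive search grows exponentially with the number of operators, so it is only practical for small queries; the point of the sketch is that any predictor with the same call shape, including a trained model, could be passed in as `predict`.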