This year’s Flink Forward Europe (FFE) was held at a new venue: the bcc Berlin Congress Center. The new location enabled organizers to feature even-more talks and accommodate more participants, which was timely, given the record attendance.
At FFE 2019, Philipp Grulich and Jonas Traub, researchers in the Database Systems and Information Management Group at TU Berlin, offered a presentation entitled “Scotty: Efficient Window Aggregation with General Stream Slicing.” Scotty is a high-performance window and aggregation operator that can be integrated into Apache Flink, Apache Beam, and Apache Storm. Window aggregation is a core operation in data stream processing systems.
Existing aggregation techniques focus on reducing latency, eliminating redundant computations, and minimizing memory usage. However, each of these aggregation techniques operates under different assumptions with respect to workload characteristics, such as the properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time-based, count-based), and stream (dis)order. Violating these assumptions can deem an aggregation technique to be unusable or drastically reduce its performance.
In their talk, Philipp and Jonas presented their research on a general stream slicing technique for window aggregation. Their technique automatically adapts to workload characteristics, so as to improve performance without sacrificing general applicability. Their research on general stream slicing was first published at EDBT 2019, where they received the Best Paper Award [1]. The Scotty Window Processor and its connectors for different streaming systems are available open-source [2] and the authors welcome contributions.
In addition, Jeyhun Karimov, a researcher in the Intelligent Analytics for Massive Data Research Department at the German Research Center for Artificial Intelligence (DFKI), presented his work on “AStream: Ad-hoc Shared Stream Processing [3],” a system built on top of Apache Flink. The central design principle of most existing streaming engines is to handle queries that potentially run forever on data streams with a query-at-a-time model, where each query is optimized and executed separately.
In many real applications, streams are not only processed with long-running queries, but also thousands of short-running ad-hoc queries. To support this efficiently, it is essential to share resources and computation for stream ad-hoc queries in a multi-user environment. AStream bridges the gap between stream processing and ad-hoc queries in streaming engines by sharing computation and resources.
AStream addresses three challenges. First, the challenge of integration: ad-hoc query processing should be a composable layer that extends stream operators, such as join and aggregation, among other operators. Second, the challenge of consistency: ad-hoc query creation and deletion must be performed in a consistent manner, to ensure exactly-once semantics and correctness. Third, the challenge of performance: unlike state-of-the-art stream processing engines, AStream maximizes both data and query throughput, via incremental computation and resource sharing. Thereby, making AStream the first system that supports all three of these requirements.
All of the slide presentations and recorded talks from the FFE conference are available online on SlideShare [4] and Youtube [5], respectively.
References
[1] “Efficient Window Aggregation with General Stream Slicing,” Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl, The 22nd International Conference on Extending Database Technology (EDBT) 2019, https://openproceedings.org/2019/conf/edbt/EDBT19_paper_171.pdf.
[2] Scotty Window Processor, Open Source Repository on GitHub: https://github.com/TU-Berlin-DIMA/scotty-window-processor.
[3] “AStream: Ad-hoc Shared Stream Processing,” Jeyhun Krimov, Tilmann Rabl, and Volker Markl, SIGMOD 2019:
http://www.redaktion.tu-berlin.de/fileadmin/fg131/Publikation/Papers/karimov-astream-ad-hoc-shared-stream-processing_preprint.pdf
[4] All slides from Flink Forward Europe 2019 on SlideShare: https://www.slideshare.net/FlinkForward.
[5] Playlist: All Flink Forward Europe 2019 Talks on YouTube: https://www.youtube.com/playlist?list=PLDX4T_cnKjD207Aa8b5CsZjc7Z_KRezGz.