The vision paper "The Case for Distance-Bounded Spatial Approximations" by Eleni Zacharatou, Andreas Kipf, Ibrahim Sabek, Varun Pandey, Harish Doraiswamy and Volker Markl advocates for approximate spatial data processing techniques that omit exact geometric tests and provide final answers solely on the basis of (fine-grained) approximations. Thanks to recent hardware advances, this vision can be realized today. Furthermore, these approximate techniques employ a distance-based error bound, i.e., a bound on the maximum spatial distance between false (or missing) results and exact results, which is crucial for meaningful analyses. This bound makes it possible to control the precision of the approximation and to trade accuracy for performance.
A preprint version is available here.
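To make the idea of a distance-based error bound concrete, here is a minimal sketch. It is not the authors' implementation: the unit-disk query region, the uniform grid, and all function names are assumptions chosen for illustration. A point is answered by looking only at the center of its grid cell, with no exact geometric test on the point itself. Because the point lies within half a cell diagonal of that center, any wrong answer can only occur within that distance of the region boundary, so choosing the cell size fixes the bound.

```python
import math
import random


def cell_size_for_bound(epsilon):
    """Side length of a square cell whose half-diagonal equals epsilon."""
    return epsilon * math.sqrt(2)


def in_disk(x, y, r=1.0):
    """Exact test: is (x, y) inside the disk of radius r around the origin?"""
    return math.hypot(x, y) <= r


def approx_in_disk(x, y, epsilon, r=1.0):
    """Approximate test: classify the point by the center of its grid cell.

    A real system would precompute (rasterize) the cells; here the cell
    center is computed on the fly to keep the sketch short.
    """
    s = cell_size_for_bound(epsilon)
    gx = (math.floor(x / s) + 0.5) * s
    gy = (math.floor(y / s) + 0.5) * s
    return in_disk(gx, gy, r)


def boundary_distance(x, y, r=1.0):
    """Distance from (x, y) to the circle bounding the disk."""
    return abs(math.hypot(x, y) - r)


def max_error(epsilon, n=10_000, seed=0):
    """Largest boundary distance among points the approximation gets wrong."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n):
        x, y = rng.uniform(-2, 2), rng.uniform(-2, 2)
        if approx_in_disk(x, y, epsilon) != in_disk(x, y):
            worst = max(worst, boundary_distance(x, y))
    return worst


if __name__ == "__main__":
    eps = 0.05
    print("largest observed error:", max_error(eps), "bound:", eps)
```

Shrinking `epsilon` tightens the bound at the cost of more (smaller) cells, which is the accuracy-for-performance trade-off described above.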
The demo paper “Semi-Supervised Data Cleaning with Raha and Baran” by Mohammad Mahdavi and Ziawash Abedjan demonstrates how two previously developed systems, Raha and Baran, can be used within an end-to-end data cleaning pipeline. In practice, with as few as 20 user-annotated tuples, it is possible to effectively identify and fix data quality problems inside a dataset. Furthermore, both systems benefit from knowledge gained in prior cleaning tasks. Using transfer learning, both systems can optimize the data cleaning task at hand in terms of error detection runtime and error correction effectiveness.
A preprint version is available here.
To learn more about CIDR 2021, please visit http://cidrdb.org/cidr2021/index.html