k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data to knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We identify two main shortcomings of this approach. First, it requires expensive data exchange between proces- sors when switching between the two processing steps point assignment and centroid update. Second, even when processing both steps of k-means on the same processor, points still need to be read two times within an iteration, leading to inefficient use of memory bandwidth. In this paper, we present a novel approach for centroid update that allows us to efficiently process both phases of k-means on GPUs. We fuse point assignment and centroid update to execute one iteration with a single pass over the points. Our evaluation shows that our k-means approach scales to very large data sets. Overall, we achieve up to 20× higher throughput compared to the state-of-the-art approach.
@article{pub10089,
author = {
Lutz, Clemens
and
Breß, Sebastian
and
Rabl, Tilmann
and
Zeuch, Steffen
and
Markl, Volker
},
editor = {
Härder, Theo
and
Schenkel, Ralf
},
title = {Efficient and Scalable k-Means on GPUs},
year = {2018},
volume = {Schwerpunktbeitrag},
number = {13222},
pages = {1--13},
journal = {Datenbank-Spektrum},
publisher = {Springer}
}
German Research Center for Artificial Intelligence Deutsches Forschungszentrum für Künstliche Intelligenz