Publication

Simplex Distributions for Embedding Data Matrices over Time

Kristian Kersting; Mirwaes Wahabzada; Christoph Römer; Christian Thurau; Agim Ballvora; Uwe Rascher; Jens Leon; Christian Bauckhage; Lutz Plümer
In: Proceedings of the Twelfth SIAM International Conference on Data Mining. SIAM International Conference on Data Mining (SDM-2012), April 26-28, Anaheim, CA, USA, Pages 295-306, ISBN 978-1-61197-232-0, SIAM / Omnipress, 2012.

Abstract

Early stress recognition is of great relevance in precision plant protection. Pre-symptomatic water stress detection is of particular interest, ultimately helping to meet the challenge of “How to feed a hungry world?”. Due to climate change, this is of considerable political and public interest. Because of its large-scale and temporal nature, e.g., when monitoring plants using hyper-spectral imaging, and the demand for physically meaningful results, the task presents unique computational problems in scale and interpretability. However, big data matrices over time also arise in several other real-life applications, such as stock market monitoring, where a business sector is characterized by the ups and downs of each of its companies per year, or topic monitoring of document collections. Therefore, we consider the general problem of embedding data matrices into Euclidean space over time without making any assumption on the generating distribution of each matrix. To do so, we represent all data samples as convex combinations of only a few extreme ones, which are computable in linear time. On the simplex spanned by the extremes, there are then natural candidates for distributions that induce distances between the data matrices and, in turn, embeddings of them. We evaluate our method across several domains, including synthetic, text, and financial data as well as a large-scale dataset on water stress detection in plants with more than 3 billion matrix entries. The results demonstrate that the embeddings are meaningful and fast to compute. The stress detection results were validated by a domain expert and conform to existing plant physiological knowledge.
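The pipeline outlined in the abstract (convex coefficients over a few extremes, a summary on the simplex per matrix, distances, then an embedding) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the extremes `E` are taken as given rather than selected in linear time, coefficients are fitted by projected gradient descent, each matrix is summarized by its mean coefficient vector instead of a fitted distribution on the simplex, distances are plain Euclidean, and the embedding uses classical multidimensional scaling. All function names are hypothetical.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]            # rows sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    j = np.arange(1, k + 1)
    rho = (U - css / j > 0).sum(axis=1)        # size of the support per row
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift
    return np.maximum(V - theta[:, None], 0.0)

def convex_coefficients(X, E, steps=500):
    """Approximate each row of X as a convex combination of the rows of E.

    Assumption: projected gradient descent on 0.5 * ||A E - X||^2 with the
    rows of A constrained to the simplex (not necessarily the paper's solver).
    """
    n, k = X.shape[0], E.shape[0]
    A = np.full((n, k), 1.0 / k)               # start at the simplex centre
    lr = 1.0 / np.linalg.norm(E, 2) ** 2       # 1 / Lipschitz constant
    for _ in range(steps):
        A = project_rows_to_simplex(A - lr * (A @ E - X) @ E.T)
    return A

def embed_over_time(matrices, E, dims=2):
    """Embed a sequence of data matrices via their mean simplex coordinates.

    Assumption: the mean coefficient vector stands in for a distribution on
    the simplex; distances are Euclidean and embedded with classical MDS.
    """
    means = np.array([convex_coefficients(X, E).mean(axis=0) for X in matrices])
    D = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
    T = len(matrices)
    J = np.eye(T) - 1.0 / T                    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dims]           # largest eigenvalues first
    Y = V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))
    return Y, D

# Synthetic example: three extremes in 2-D, four matrices drifting over time.
rng = np.random.default_rng(0)
E = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alphas = [(5, 1, 1), (3, 2, 1), (1, 3, 2), (1, 1, 5)]
matrices = [rng.dirichlet(a, size=50) @ E for a in alphas]
Y, D = embed_over_time(matrices, E)
```

Here `Y` holds one 2-D point per time step and `D` the pairwise distances between the per-matrix summaries. Replacing the mean vector with a fitted distribution and a divergence between distributions would bring the sketch closer to what the abstract describes.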
