Publication

Millisecond-Level Sound Order Classification via Amplitude

Rezaul Tutul; Ilona Buchem; André Jakob; Niels Pinkwart
In: 2025 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA). International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA-2025), August 7-9, Antalya, Türkiye, Pages 1-6, IEEE, 8/2025.

Abstract

Accurate temporal detection of overlapping non-verbal sounds is essential for fair and responsive human-robot interaction (HRI), especially in multi-user educational environments where sound-based inputs act as quiz buzzers. This paper presents a real-time sound order detection system capable of distinguishing the sequence of two overlapping non-verbal audio cues captured on a single channel with millisecond-level precision. A Random Forest classifier, trained on a custom sound dataset, uses three audio features: onset strength difference, spectral centroid difference, and onset time difference. After applying amplitude-based filtering (threshold = 0.015), the system achieved 99% classification accuracy using 100 estimators and was able to correctly predict sound order with as little as a 0.3 ms time difference between sounds. Evaluations using synthetically generated delays (0.01–0.3 ms) show a steep performance increase beyond the 0.05 ms threshold, with near-perfect accuracy from 0.1 ms onwards. The proposed system was deployed in an educational quiz game involving two physical buzzers mapped to distinct sound signatures (“laser” and “charge”). The model's predictions were validated against video recordings during real-time gameplay, confirming perfect order detection under real-world conditions. A custom scale assessing user experience showed significant improvements in subscales such as motivation and engagement, effectiveness of non-verbal interaction, and perceived clarity. These results underscore the potential of the system to enhance multimodal HRI by enabling accurate, fast, and natural sound-based interaction.
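To make the scale of these delays concrete, the following minimal sketch shows how a sub-millisecond delay maps to whole samples and how an amplitude threshold can gate the first audible sample of a cue. It is an illustration only, not the authors' implementation: the abstract gives the threshold (0.015) but not the sampling rate, the signal shapes, or the feature-extraction code, so the 100 kHz rate, the sine-tone stand-ins for the "laser" and "charge" cues, and every function name below are assumptions. The onset is also measured on two separate tracks, as one might when generating synthetically delayed training pairs, whereas the paper works on a single mixed channel.

```python
import math

AMPLITUDE_THRESHOLD = 0.015  # value quoted in the abstract
SAMPLE_RATE = 100_000        # assumption: not stated in the abstract


def delay_to_samples(delay_ms, sample_rate):
    """Convert a delay in milliseconds to the nearest whole sample count."""
    return round(delay_ms * sample_rate / 1000.0)


def tone(freq_hz, n_samples, sample_rate, amplitude=0.5):
    """Hypothetical stand-in for a buzzer cue: a plain sine tone."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def delayed(signal, delay_samples):
    """Shift a signal later in time by prepending silence."""
    return [0.0] * delay_samples + list(signal)


def first_onset_index(signal, threshold=AMPLITUDE_THRESHOLD):
    """Index of the first sample whose magnitude exceeds the threshold."""
    for i, x in enumerate(signal):
        if abs(x) > threshold:
            return i
    return None  # the signal never rises above the threshold


def onset_time_difference_ms(sig_a, sig_b, sample_rate,
                             threshold=AMPLITUDE_THRESHOLD):
    """Onset of sig_b minus onset of sig_a in ms; positive means a came first."""
    ia = first_onset_index(sig_a, threshold)
    ib = first_onset_index(sig_b, threshold)
    if ia is None or ib is None:
        return None
    return (ib - ia) * 1000.0 / sample_rate


# Build two cues and delay the second one by 0.3 ms.
laser = tone(2000, 1000, SAMPLE_RATE)
charge = delayed(tone(1000, 1000, SAMPLE_RATE),
                 delay_to_samples(0.3, SAMPLE_RATE))

diff_ms = onset_time_difference_ms(laser, charge, SAMPLE_RATE)
first = "laser" if diff_ms > 0 else "charge"
print(f"onset difference: {diff_ms:.3f} ms, first cue: {first}")
```

At the assumed 100 kHz rate, the smallest synthetic delay mentioned in the abstract (0.01 ms) corresponds to a single sample, which gives a feel for why accuracy might drop at the low end of the tested range. A threshold-crossing onset like this one also depends on how quickly each waveform rises, so it is only a rough proxy for the onset features the paper describes.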