Publication
Automated Segmentation of Polish Sibilants Using Modified YAMNet Architecture for Computer-Aided Speech Diagnosis in Children
Michal Krecichwost; Pawel Badura; Artur Piet; Md Abid Hasan; Natalia Mocko; Zuzanna Miodonska; Agata Sage; Marcin Grzegorzek
In: Pawel Badura; Joanna Czajkowska; Arkadiusz Gertych; Jacek Kawa; Ewa Pietka; Wojciech Wieclawek (Eds.). Information Technology in Biomedicine - 10th International Conference, Proceedings. International Conference on Information Technologies in Biomedicine (ITIB-2025), June 23-25, Zabrze, Poland, Pages 144-154, Advances in Intelligent Systems and Computing (AISC), Vol. 1464, ISBN 978-3-031-95582-2, Springer Nature Switzerland, Cham, 2025.
Abstract
In this paper, we address computer-aided speech diagnosis by designing a method for the automated detection of Polish sibilants in preschool children. Our database was recorded from 47 children aged four to seven using a 15-channel data acquisition device. We propose a modified YAMNet architecture to classify short speech segments from the main channel into four classes. Each segment is represented by a dedicated acoustic image based on filter-bank energies and their derivatives. To improve training, we apply a set of time-series data augmentation procedures to the data from all microphones. From the segment classification results, we derive a frame-wise speech segmentation to extract sibilants. Our segment classification model yields an overall accuracy of 87.9%, with recall and precision for the sibilant class of 92.3% and 87.7%, respectively. The sibilant segmentation accuracy reaches 96.2% with an F1 score of 73.5%.
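The abstract does not say how segment-level decisions are turned into a frame-wise segmentation. The sketch below illustrates one plausible approach, a majority vote over overlapping segments, and should not be read as the authors' method. The function names, the segment length, and the hop size are hypothetical. It also computes frame-wise accuracy and F1 for one class. Accuracy counts correctly labeled non-sibilant frames as well, so it can sit well above F1 when sibilant frames are a minority.

```python
from collections import Counter


def segments_to_frames(segment_labels, seg_len, hop, n_frames, default=0):
    """Majority-vote overlapping segment labels into one label per frame.

    Assumption: segment i covers frames [i * hop, i * hop + seg_len).
    Ties go to the label that was voted for first (the earlier segment).
    Frames not covered by any segment receive `default`.
    """
    votes = [Counter() for _ in range(n_frames)]
    for i, label in enumerate(segment_labels):
        start = i * hop
        for t in range(start, min(start + seg_len, n_frames)):
            votes[t][label] += 1
    return [v.most_common(1)[0][0] if v else default for v in votes]


def binary_scores(pred, truth, positive):
    """Return (accuracy, f1) treating `positive` as the target class."""
    pairs = list(zip(pred, truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy example: label 1 stands for "sibilant", label 0 for everything else.
frames = segments_to_frames([0, 0, 1, 1, 0], seg_len=4, hop=2, n_frames=12)
truth = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
accuracy, f1 = binary_scores(frames, truth, positive=1)
print(frames, round(accuracy, 3), round(f1, 3))
```

In this toy example the predicted sibilant region is shifted by one frame relative to the reference, which lowers F1 more than it lowers accuracy.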
