Tomaszewska, Julia Zofia ORCID: https://orcid.org/0000-0002-9387-4350, Kukwa, Wojciech and Georgakis, Apostolos
(2026)
Audio-based deep learning classification of laryngeal pathologies with detection of precancerous and cancerous lesions using Gammatone Cepstral coefficients.
Biomedical Engineering Advances, 11 (100211).
PDF: 1-s2.0-S266709922600006X-main.pdf (Published Version, 2MB). Available under License Creative Commons Attribution.
Abstract
Introduction
Despite extensive research on audio-based voice pathology detection, the current literature lacks clear and consistent evidence on which acoustic features can reliably discriminate precancerous and cancerous laryngeal lesions, particularly when these features are extracted from continuous speech signals.
Problem statement
The performance of audio-based laryngeal pathology classification systems on continuous speech remains substantially underreported, and the commonly used Mel-Frequency Cepstral Coefficients (MFCCs) may be suboptimal for capturing pathology-related acoustic characteristics.
Objectives
This study investigates the hypothesis that continuous speech audio signals analysed with Gammatone Cepstral Coefficients (GTCCs) enable accurate and precise detection of laryngeal pathologies, with a specific focus on precancerous and cancerous lesions.
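As a rough illustration of what analysing speech with GTCCs involves, the sketch below follows one common definition: a bank of 4th-order gammatone filters whose centre frequencies are spaced uniformly on the ERB-rate scale, the log energy of each filter output over a windowed frame, and a DCT-II of those log energies. It uses plain Python with no external libraries. The filter count (16), frequency range, 25 ms impulse-response length, unit-energy filter scaling and 13 retained coefficients are assumptions chosen for illustration, not settings reported in the article.

```python
import math


def erb_bandwidth(fc):
    """Equivalent rectangular bandwidth (Hz) of an auditory filter centred at fc (Glasberg & Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def hz_to_erb_rate(f):
    """Map a frequency in Hz onto the ERB-rate (ERB-number) scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_center_frequencies(f_min, f_max, n_filters):
    """Centre frequencies spaced uniformly on the ERB-rate scale between f_min and f_max."""
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]


def gammatone_ir(fc, sample_rate, duration=0.025, order=4, b=1.019):
    """Sampled gammatone impulse response t^(order-1) * exp(-2*pi*b*ERB*t) * cos(2*pi*fc*t).
    ASSUMPTION: truncated to `duration` seconds and scaled to unit energy."""
    decay = 2.0 * math.pi * b * erb_bandwidth(fc)
    ir = []
    for k in range(int(duration * sample_rate)):
        t = k / sample_rate
        ir.append(t ** (order - 1) * math.exp(-decay * t) * math.cos(2.0 * math.pi * fc * t))
    norm = math.sqrt(sum(v * v for v in ir)) or 1.0
    return [v / norm for v in ir]


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def filtered_energy(frame, ir):
    """Energy of the full linear convolution of `frame` with impulse response `ir`."""
    total = 0.0
    for n in range(len(frame) + len(ir) - 1):
        acc = 0.0
        for k in range(max(0, n - len(frame) + 1), min(n + 1, len(ir))):
            acc += ir[k] * frame[n - k]
        total += acc * acc
    return total


def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II of x, keeping the first n_coeffs coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(n_coeffs)]


def gtcc_frame(frame, sample_rate, n_filters=16, n_coeffs=13, f_min=50.0, f_max=None):
    """GTCCs for one frame: Hamming window -> gammatone band energies -> log -> DCT-II.
    All default parameters are illustrative assumptions."""
    if f_max is None:
        f_max = 0.45 * sample_rate  # keep the top filter below Nyquist
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    log_energies = [
        math.log(filtered_energy(windowed, gammatone_ir(fc, sample_rate)) + 1e-12)
        for fc in erb_center_frequencies(f_min, f_max, n_filters)
    ]
    return dct_ii(log_energies, n_coeffs)


# Usage: a 25 ms frame of a synthetic 200 Hz tone at 8 kHz stands in for a speech frame.
SR = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * i / SR) for i in range(200)]
coeffs = gtcc_frame(frame, SR)
print([round(c, 2) for c in coeffs])
```

In a full pipeline this would be applied to overlapping frames across an utterance, giving a sequence of GTCC vectors. The main difference from MFCCs lies in the filterbank: gammatone filters on the ERB-rate scale instead of triangular filters on the mel scale.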
Methods
An audio-based classification system employing GTCCs for feature extraction and a one-dimensional Convolutional Neural Network (CNN) for classification is proposed. The system considers three classes: precancerous and cancerous lesions, neuromuscular disorders, and healthy cases. Performance was evaluated using two datasets: a custom speech dataset collected for this research and the Saarbruecken Voice Database (SVD).
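To make the classification stage more concrete, the following is a minimal pure-Python forward pass of a small one-dimensional CNN. It treats each GTCC coefficient as an input channel, convolves along the time axis, applies ReLU and global average pooling, and maps the result to the three classes named above with a softmax. The single convolutional layer, its size, the pooling choice and the random initialisation are assumptions for illustration only; the article's actual architecture and training procedure are not described here, and no training is shown.

```python
import math
import random

CLASSES = ["precancerous_or_cancerous", "neuromuscular", "healthy"]


def conv1d(x, kernels, biases):
    """'Valid' 1-D convolution over time (cross-correlation, as in DL frameworks).
    x: T x C (frames x GTCC coefficients); kernels: F x W x C. Returns (T - W + 1) x F."""
    width = len(kernels[0])
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for kern, bias in zip(kernels, biases):
            acc = bias
            for w, kw in enumerate(kern):
                for c, weight in enumerate(kw):
                    acc += weight * x[t + w][c]
            row.append(acc)
        out.append(row)
    return out


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def global_average_pool(m):
    """Average each feature map over time, giving one value per filter."""
    return [sum(row[f] for row in m) / len(m) for f in range(len(m[0]))]


def dense(v, weights, biases):
    return [b + sum(w * x for w, x in zip(ws, v)) for ws, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_params(n_coeffs, n_filters=8, width=3, n_classes=3, seed=0):
    """HYPOTHETICAL layer sizes with small random (untrained) weights."""
    rng = random.Random(seed)
    kernels = [[[rng.gauss(0.0, 0.1) for _ in range(n_coeffs)] for _ in range(width)]
               for _ in range(n_filters)]
    dense_w = [[rng.gauss(0.0, 0.1) for _ in range(n_filters)] for _ in range(n_classes)]
    return kernels, [0.0] * n_filters, dense_w, [0.0] * n_classes


def predict(gtcc_frames, params):
    """Class probabilities for one utterance given its sequence of GTCC vectors."""
    kernels, conv_b, dense_w, dense_b = params
    pooled = global_average_pool(relu(conv1d(gtcc_frames, kernels, conv_b)))
    return softmax(dense(pooled, dense_w, dense_b))


# Usage with a synthetic 50-frame x 13-coefficient input in place of real GTCCs.
frames = [[math.sin(0.1 * t + c) for c in range(13)] for t in range(50)]
probs = predict(frames, init_params(n_coeffs=13))
print(dict(zip(CLASSES, (round(p, 3) for p in probs))))
```

Global average pooling is one simple way to turn variable-length continuous-speech recordings into a fixed-size vector; with untrained weights the printed probabilities carry no diagnostic meaning.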
Results
GTCCs derived from speech signals yielded higher classification accuracy than the widely used MFCCs. On the custom dataset, the proposed method achieved an average classification accuracy of 85.04 ± 1.23%, compared with 63.22 ± 1.62% using MFCCs. On SVD, GTCCs achieved 73.93 ± 1.42%, compared with 60.36 ± 2.44% for MFCCs. These differences were statistically significant according to a t-test at the 1% significance level.
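The headline comparison can be checked with simple arithmetic: GTCCs improve mean accuracy by 21.82 percentage points on the custom dataset and by 13.57 points on SVD. The abstract does not state which t-test variant was used, how many repetitions underlie each mean, or whether "±" denotes a standard deviation. The sketch below therefore assumes Welch's unequal-variance t-test computed from summary statistics, reads "±" as a sample standard deviation, and uses a hypothetical `N_RUNS = 10`; its t values are illustrative only and are not the paper's statistics.

```python
import math

# Mean accuracy (%) and "±" spread as reported in the abstract.
# ASSUMPTION: "±" is treated as a sample standard deviation.
REPORTED = {
    "custom": {"GTCC": (85.04, 1.23), "MFCC": (63.22, 1.62)},
    "SVD": {"GTCC": (73.93, 1.42), "MFCC": (60.36, 2.44)},
}

# HYPOTHETICAL: the number of repeated runs per method is not given in the abstract.
N_RUNS = 10


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


summary = {}
for dataset, res in REPORTED.items():
    (m_g, s_g), (m_m, s_m) = res["GTCC"], res["MFCC"]
    t_stat, dof = welch_t(m_g, s_g, N_RUNS, m_m, s_m, N_RUNS)
    summary[dataset] = {"gain_pp": m_g - m_m, "t": t_stat, "df": dof}
    print(f"{dataset}: GTCC - MFCC = {m_g - m_m:.2f} pp, t = {t_stat:.1f}, df = {dof:.1f}")
```

To reach a decision at the 1% level, compare |t| with the two-sided critical value of the t distribution at the computed degrees of freedom, taken from a table or from `scipy.stats.t.ppf(0.995, df)` if SciPy is available.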
Conclusions
The results demonstrate that GTCCs extracted from continuous speech signals provide a robust and effective representation for audio-based laryngeal pathology classification, highlighting their potential for use in automated pre-screening systems targeting precancerous and cancerous voice disorders.
| Item Type: | Article |
|---|---|
| Identifier: | 10.1016/j.bea.2026.100211 |
| Subjects: | Medicine and health |
| Date Deposited: | 11 Mar 2026 |
| URI: | https://repository.uwl.ac.uk/id/eprint/14723 |
| Sustainable Development Goals: | Goal 3: Good Health and Well-Being |