Audio-based deep learning classification of laryngeal pathologies with detection of precancerous and cancerous lesions using Gammatone Cepstral coefficients

Zofia Tomaszewska, Julia; Kukwa, Wojciech; Georgakis, Apostolos

Zofia Tomaszewska, Julia ORCID: https://orcid.org/0000-0002-9387-4350, Kukwa, Wojciech and Georgakis, Apostolos (2026) Audio-based deep learning classification of laryngeal pathologies with detection of precancerous and cancerous lesions using Gammatone Cepstral coefficients. Biomedical Engineering Advances, 11 (100211).

1-s2.0-S266709922600006X-main.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://www.sciencedirect.com/science/article/pii/...

Abstract

Introduction
Despite extensive research on audio-based voice pathology detection, the current literature lacks clear and consistent evidence identifying acoustic features that can reliably discriminate precancerous and cancerous laryngeal lesions, particularly when the analysis is based on continuous speech signals.
Problem statement
The performance of audio-based laryngeal pathology classification systems on continuous speech remains significantly underreported, and commonly used Mel-Frequency Cepstral Coefficients (MFCCs) may be suboptimal for capturing pathology-related acoustic characteristics.
Objectives
This study investigates the hypothesis that continuous speech audio signals analysed with Gammatone Cepstral Coefficients (GTCCs) enable the accurate and precise detection of laryngeal pathologies, with a specific focus on precancerous and cancerous lesions.
Methods
An audio-based classification system employing GTCCs for feature extraction and a one-dimensional Convolutional Neural Network (CNN) for classification is proposed. The system considers three classes: precancerous and cancerous lesions, neuromuscular disorders, and healthy cases. Performance was evaluated using two datasets: a custom speech dataset collected for this research and the Saarbruecken Voice Database (SVD).
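As a rough illustration of what GTCC feature extraction involves, the sketch below shows two of its ingredients: gammatone band centre frequencies spaced evenly on the ERB-rate scale (Glasberg and Moore formula), and the final log-plus-DCT step that turns per-band energies into cepstral coefficients. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the frequency range, the number of bands, and the number of coefficients are all hypothetical, and the gammatone filtering that would produce the band energies is omitted.

```python
import math

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg & Moore)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_center_frequencies(f_min, f_max, n_bands):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands)]

def cepstral_coefficients(band_energies, n_coeffs):
    """Unnormalised DCT-II of the log band energies of one frame."""
    n = len(band_energies)
    # Floor the energies to avoid log(0) on silent bands.
    log_e = [math.log(max(e, 1e-10)) for e in band_energies]
    return [
        sum(log_e[k] * math.cos(math.pi * m * (k + 0.5) / n) for k in range(n))
        for m in range(n_coeffs)
    ]

# Hypothetical settings: 32 bands between 50 Hz and 8 kHz, 13 coefficients.
centres = erb_center_frequencies(50.0, 8000.0, 32)
frame_energies = [1.0 + 0.1 * i for i in range(32)]  # placeholder energies
gtcc_frame = cepstral_coefficients(frame_energies, 13)
```

In a full pipeline, each frame's band energies would come from a gammatone filterbank applied to the speech signal, and the resulting coefficient sequence would be passed to the one-dimensional CNN.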
Results
GTCCs derived from speech signals delivered superior classification accuracy compared to the widely used Mel-Frequency Cepstral Coefficients (MFCCs). On the custom dataset, the proposed method achieved an average classification accuracy of 85.04% ± 1.23, compared to 63.22% ± 1.62 using MFCCs. On SVD, GTCCs achieved 73.93% ± 1.42, compared to 60.36% ± 2.44 for MFCCs. The statistical significance of these results was confirmed using a t-test at the 1% significance level.
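For readers who want to see how a comparison of two mean accuracies can be tested from summary statistics, the sketch below computes a Welch t statistic and its approximate degrees of freedom. Several things here are assumptions rather than facts from the abstract: the abstract says only "t-test", so the Welch (unequal variance) variant is a substitution; the ± values are treated as standard deviations; and the number of runs per method (`n_runs`) is a hypothetical placeholder, since it is not reported here.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical: 10 runs per method; the true count is not given in the abstract.
n_runs = 10
t_custom, df_custom = welch_t(85.04, 1.23, n_runs, 63.22, 1.62, n_runs)
t_svd, df_svd = welch_t(73.93, 1.42, n_runs, 60.36, 2.44, n_runs)
```

The resulting statistic would then be compared against the critical value of the t distribution with the computed degrees of freedom at the 1% level.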
Conclusions
The results demonstrate that GTCCs extracted from continuous speech signals provide a robust and effective representation for audio-based laryngeal pathology classification, highlighting their potential for use in automated pre-screening systems targeting precancerous and cancerous voice disorders.

Item Type:	Article
Identifier:	10.1016/j.bea.2026.100211
Subjects:	Medicine and health
Date Deposited:	11 Mar 2026
URI:	https://repository.uwl.ac.uk/id/eprint/14723
Sustainable Development Goals:	Goal 3: Good Health and Well-Being