Tomaszewska, Julia (2025) Use of audio and bioimpedance signals with application of machine learning in classification of laryngeal pathologies. Doctoral thesis, University of West London.
PDF (PDF/A): Use of audio and bioimpedance signals _Julia Tomaszewska - PhD Thesis Final (Sept 25)_accessible.pdf - Submitted Version (8MB). Available under License Creative Commons Attribution.
Abstract
The early and accurate detection of laryngeal pathologies, with particular focus on malignant lesions, remains a major clinical challenge due to the limitations and invasiveness of conventional diagnostic methods. This thesis addresses this challenge by developing a non-invasive, robust, and accurate classification framework capable of identifying cancerous and precancerous lesions with high precision and sensitivity. The framework is based on the combined input of audio recordings of human phonation and simultaneous laryngeal bioimpedance measurements obtained through electroglottography (EGG). The signals form a custom dataset developed specifically for the purposes of this study.
Unimodal deep learning classification architectures are developed and rigorously evaluated for each data modality. The highest-performing models are then used as modality-specific building blocks for the final multimodal classification system. The developed multimodal classifiers consistently outperform the unimodal baselines, particularly in prioritising the detection of cancerous and precancerous lesions. Classification accuracy is further enhanced using feature extraction methods based on the Equivalent Rectangular Bandwidth (ERB) spectrum, which outperform alternative feature representations. Additionally, continuous speech provides a richer and more discriminative feature set than sustained phonation, leading to improved classification performance.
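As a brief illustration of the Equivalent Rectangular Bandwidth scale mentioned above, the following minimal sketch uses the widely cited Glasberg and Moore approximation to compute ERB widths and ERB-spaced centre frequencies, as a Gammatone-style filterbank might use. The frequency range and number of bands are hypothetical values chosen for illustration; the abstract does not state the filterbank parameters used in the thesis.

```python
import math


def erb_bandwidth(f_hz):
    """ERB width in Hz at centre frequency f_hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_center_frequencies(f_min, f_max, n_bands):
    """Centre frequencies spaced uniformly on the ERB-rate scale (n_bands >= 2)."""
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands)]


# Hypothetical settings: 32 bands between 100 Hz and 8 kHz.
centers = erb_center_frequencies(100.0, 8000.0, 32)
```

Spacing the bands uniformly on the ERB-rate scale concentrates resolution at low frequencies, which is the general motivation for auditory-inspired representations of this kind.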
The final multimodal system, applied through late fusion stacked generalisation, combines the input of audio-derived Gammatone Cepstral Coefficients (GTCCs) and EGG-derived Gammatone spectrograms, processed through separate one-dimensional Convolutional Neural Networks (CNNs), and integrated at the decision level using a meta-classifier based on Error-Correcting Output Codes (ECOC). This system achieves the highest performance metrics, with average results over 10-fold cross-validation of 94.92% ± 2.82% accuracy, 96.67% ± 2.90% precision, 93.07% ± 3.53% sensitivity, 96.77% ± 2.84% specificity, and 94.81% ± 2.90% F1 score for pathology detection. It also achieves the strongest performance in multi-class classification, particularly for malignant case detection (89.23% ± 1.95% accuracy, 82.32% ± 2.78% precision, 83.27% ± 3.43% sensitivity, 91.90% ± 1.75% specificity, and 82.76% ± 2.6% F1).
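For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, sensitivity, specificity, and F1 score are conventionally derived from the counts of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the thesis; in a 10-fold cross-validation these quantities would typically be computed per fold and then summarised as mean ± standard deviation.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: pathological cases detected/missed; tn/fp: healthy cases
    correctly passed/incorrectly flagged. Ratios with a zero
    denominator are returned as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for a single fold, for illustration only.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Sensitivity and specificity are reported separately because, in a screening setting, a missed malignant case and a false alarm carry very different costs.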
This work advances the understanding of multimodal laryngeal pathology classification and lays the foundation for future research into clinically deployable and real-time pathology screening tools.
| Item Type: | Thesis (Doctoral) |
|---|---|
| Identifier: | 10.36828/thesis/14389 |
| Date Deposited: | 03 Dec 2025 |
| URI: | https://repository.uwl.ac.uk/id/eprint/14389 |