A Statistically Validated Machine Learning Framework for Early Alzheimer's Disease Detection Using Structured Clinical Data
DOI:
https://doi.org/10.64389/icds.2026.02297
Keywords:
Alzheimer's disease, Early Detection, Machine Learning, Ensemble Learning, Statistical Validation, Explainable AI, SHAP
Abstract
Early detection of Alzheimer's disease (AD) is vital, particularly in resource-limited settings that rely on structured clinical data rather than imaging. This study presents a rigorous, interpretable benchmarking framework using a dataset of 2,149 subjects (64.6% cognitively normal; 35.4% AD), split into training (80%) and testing (20%) sets. Five models (Logistic Regression, Random Forest, a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and Optimized Gradient Boosting) were evaluated under identical conditions. On the test set (n = 430), Optimized Gradient Boosting achieved the highest performance, with 95.10% accuracy, 92.10% sensitivity, 96.80% specificity, an F1-score of 0.93, and the fewest false negatives (n = 12). Random Forest also performed strongly (93.95% accuracy), while the linear and deep learning models were less effective. SHAP analysis showed that model predictions were consistent with key clinical indicators, including functional assessments, activities of daily living (ADL), and MMSE scores, suggesting that ensemble tree-based models are well suited to structured clinical data.
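The reported test-set metrics can be cross-checked against one another with a short calculation. The sketch below is illustrative only: the abstract states the test-set size (430), the number of false negatives (12), and the rounded rates, but not the full confusion matrix. The counts TP = 140, TN = 269, and FP = 9 are therefore assumptions, back-calculated here so that the rounded figures agree; they are not values published by the authors. The helper name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "n": total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),        # recall on the AD (positive) class
        "specificity": tn / (tn + fp),        # recall on the cognitively normal class
        "f1": 2 * tp / (2 * tp + fp + fn),    # harmonic mean of precision and recall
    }


# Only fn=12 and the total of 430 come from the abstract.
# tp, tn, and fp are ASSUMED counts chosen to match the rounded reported rates.
metrics = binary_metrics(tp=140, fn=12, tn=269, fp=9)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumed counts, the test set would contain 152 AD cases and 278 cognitively normal cases, which is close to the 35.4% / 64.6% split reported for the full dataset.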
License
Copyright (c) 2026 Shehu Mohammed, Neha Malhotra, Anmol Singh Rai, Sourabh Kumar

This work is licensed under a Creative Commons Attribution 4.0 International License.
