Evaluating the heterogeneity of logistic regression models to predict coronary artery disease status
Coronary artery disease (CAD) is one of the most commonly diagnosed heart diseases globally, affecting about 5% of adults over the age of twenty. Lifestyle changes can reduce the risk of developing CAD and are especially important for individuals with high genetic risk. In this study, we sought to predict the likelihood of developing CAD using genetic, demographic, and clinical variables. Leveraging genetic and clinical data from the UK Biobank on over 500,000 individuals, we identified 500 individuals who were genetically similar to a target individual and another 500 who were genetically dissimilar to it. We repeated this process for 10 target individuals as a proof of concept. We then used CAD-related variables such as age, relevant clinical factors, and polygenic risk score to train models predicting CAD status separately in the 500 genetically similar and the 500 genetically dissimilar individuals, and determined which group predicted the likelihood of CAD more accurately. Genetic similarity to each target individual was computed using the Mahalanobis distance. To reduce heterogeneity between sexes and races, the analyses were restricted to British Caucasian males. The models trained on the more similar individuals demonstrated better predictive performance: the area under the receiver operating characteristic curve (AUC) was significantly higher for the ‘similar’ groups than for the ‘dissimilar’ groups (AUC = 0.67 vs. 0.65, respectively; p < 0.05). These findings support the potential of precision prevention strategies, suggesting that predictive models of disease for a given target individual should be built from individuals who are more similar to that target, even within an otherwise homogeneous group (e.g., British Caucasians). Although intuitive, such practices are not routinely followed. Further validation and exploration of additional predictors are warranted to enhance the predictive accuracy and applicability of the model.
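
The selection and evaluation steps described above can be illustrated with a minimal sketch. This is not the study's pipeline: the abstract does not say which features the Mahalanobis distance was computed on, so the two-column `cohort` below stands in for hypothetical genetic features (for example, principal components) and is filled with synthetic values. Estimating the covariance from the whole cohort is likewise an assumption, and all function names are illustrative. The sketch ranks individuals by Mahalanobis distance to a target, takes the `k` nearest and `k` farthest, and provides a rank-based AUC that could be used to compare models trained on each group.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix (list of lists) of the feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (m[i][d] - sum(m[i][j] * x[j] for j in range(i + 1, d))) / m[i][i]
    return x


def mahalanobis(x, target, cov):
    """sqrt((x - t)^T cov^{-1} (x - t)), computed via a linear solve."""
    diff = [xi - ti for xi, ti in zip(x, target)]
    return math.sqrt(sum(di * si for di, si in zip(diff, solve(cov, diff))))


def split_by_similarity(rows, target_index, k):
    """Indices of the k nearest and k farthest individuals from the target."""
    cov = covariance(rows)  # assumption: covariance estimated on the full cohort
    target = rows[target_index]
    others = [i for i in range(len(rows)) if i != target_index]
    others.sort(key=lambda i: mahalanobis(rows[i], target, cov))
    return others[:k], others[-k:]


def auc(labels, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in for the cohort: 200 individuals, 2 hypothetical features.
random.seed(0)
cohort = [[random.gauss(0, 1), random.gauss(0, 2)] for _ in range(200)]
similar, dissimilar = split_by_similarity(cohort, target_index=0, k=20)
```

In the setting described here, `k` would be 500, a logistic regression would be fitted separately on each group, and `auc` would be evaluated on each model's predicted probabilities. In practice a library routine (for example from SciPy or scikit-learn) would normally replace the hand-written linear algebra, which is kept here only so the sketch has no dependencies.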