Machine learning can use electronic health record (EHR) data to predict colorectal cancer (CRC) risk in adults 35 to 50 years of age, according to a study published online March 10 in PLOS ONE.
Hisham Hussan, M.D., from The Ohio State University in Columbus, and colleagues developed a prediction model using machine learning and EHR-derived factors to identify adults aged 35 to 50 years who may benefit from early CRC screening. The analysis included 3,116 adults aged 35 to 50 years at average risk for CRC who underwent colonoscopy at a single center between 2017 and 2020. Seventy percent of them were used to develop the model, and the remaining 30 percent were used to evaluate its performance.
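For readers unfamiliar with a development/evaluation split, here is a minimal sketch of a 70/30 partition of a cohort. It assumes a simple random split with a fixed seed. The article does not say how the study assigned patients, so the function name `split_cohort` and the seed value are hypothetical, and the resulting counts are illustrative rather than reported figures.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Randomly partition patient IDs into a development set and a held-out
    evaluation set. Hypothetical helper: assumes a simple random split,
    which may differ from the study's actual procedure."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (hypothetical) so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Illustrative use with a cohort the size of the one described (3,116 adults).
development, evaluation = split_cohort(range(3116))
print(len(development), len(evaluation))
```

With 3,116 patients, a 70/30 split puts roughly 2,181 people in the development set and 935 in the evaluation set. Keeping the evaluation set untouched during model building is what lets its performance estimate approximate how the model would do on new patients.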
The researchers found that all four machine learning models discriminated CRC better than the reference model, although the difference did not reach conventional statistical significance (C-statistics [95 percent confidence intervals]: neural network, 0.75 [0.48 to 1.00] versus reference, 0.43 [0.18 to 0.67]; P = 0.07). For the combined outcome of CRC or high-risk polyps, three of the four machine learning approaches (all except gradient boosting) performed significantly better than the reference model (regularized discriminant analysis, 0.64 [0.59 to 0.69] versus reference, 0.55 [0.50 to 0.59]; P < 0.0015). Income by ZIP code, the colonoscopy indication, and body mass index quartiles were the most important predictors in the regularized discriminant analysis model for CRC or high-risk polyps.
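The C-statistic reported for each model measures how well its risk scores rank patients: it is the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The sketch below computes it by brute-force pair counting, counting tied scores as half. The function name `c_statistic` and the toy scores are hypothetical and are not data from the study.

```python
from itertools import product

def c_statistic(scores, labels):
    """Concordance (C) statistic for a binary outcome: the share of
    (case, non-case) pairs in which the case has the higher score.
    Tied scores count as half a concordant pair."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0
        elif case_score == control_score:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Toy example (hypothetical scores): two cases, three non-cases.
scores = [0.9, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]
print(c_statistic(scores, labels))  # 5 of 6 pairs concordant
```

Pair counting is quadratic in cohort size but makes the definition explicit. Seen this way, the reference model's value of 0.43 for CRC means it ranked cases slightly worse than chance in the evaluation set. The wide confidence intervals reflect how few CRC cases such a cohort contains.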
“Further development of our model is needed, followed by validation in a primary-care setting, before clinical application,” the authors write.