TY - JOUR
T1 - Use of Natural Language Processing to Improve Identification of Patients With Peripheral Artery Disease
AU - Weissler, E. Hope
AU - Zhang, Jikai
AU - Lippmann, Steven
AU - Rusincovitch, Shelley
AU - Henao, Ricardo
AU - Jones, W. Schuyler
N1 - Generated from Scopus record by KAUST IRTS on 2023-09-25
PY - 2020/10/1
Y1 - 2020/10/1
AB - Background: Peripheral artery disease (PAD) is underrecognized, undertreated, and understudied: each of these endeavors requires efficient and accurate identification of patients with PAD. Currently, PAD patient identification relies on diagnosis/procedure codes or lists of patients diagnosed or treated by specific providers in specific locations and ways. The goal of this research was to leverage natural language processing to more accurately identify patients with PAD in an electronic health record system compared with a structured data-based approach. Methods: The clinical notes from a cohort of 6861 patients in our health system whose PAD status had previously been adjudicated were used to train, test, and validate a natural language processing model using 10-fold cross-validation. The performance of this model was described using the area under the receiver operating characteristic and average precision curves; its performance was quantitatively compared with an administrative data-based least absolute shrinkage and selection operator (LASSO) approach using the DeLong test. Results: The median (SD) of the area under the receiver operating characteristic curve for the natural language processing model was 0.888 (0.009) versus 0.801 (0.017) for the LASSO-based approach alone (DeLong P
UR - https://www.ahajournals.org/doi/10.1161/CIRCINTERVENTIONS.120.009447
UR - http://www.scopus.com/inward/record.url?scp=85093942476&partnerID=8YFLogxK
U2 - 10.1161/CIRCINTERVENTIONS.120.009447
DO - 10.1161/CIRCINTERVENTIONS.120.009447
M3 - Article
SN - 1941-7640
VL - 13
JO - Circulation: Cardiovascular Interventions
JF - Circulation: Cardiovascular Interventions
IS - 10
ER -