7th Annual SEAS Summer Research Symposium
APAM students Daniel Edelberg (SEAS '19) and Rebecca Latto (SEAS '19) participated in the 7th annual SEAS Summer Research Symposium on October 4, 2018, in Carleton Commons. The event, sponsored by the SEAS Undergraduate Student Affairs Office and the Columbia Undergraduate Scholars Program, featured more than 30 undergraduate students who presented posters and discussed their summer research experiences with faculty and fellow students.
"Phenotypes of Atrial Fibrillation: Machine Learning Stoke Risk Prediction in a Hospital Network Database"
Daniel G. Edelberg, SEAS '19, Applied Mathematics, Columbia University
Supervising Faculty, Location of Research
Dr. Calum MacRae, Brigham and Women’s Hospital, Boston, MA
Atrial fibrillation (AF) is often accompanied by comorbid conditions that affect AF-related stroke risk. Defining patient phenotypes in real-world clinical practice may improve the prediction and subsequent management of AF-associated stroke risk. To address this, we applied machine learning techniques to stroke risk prediction in patients with AF from a longitudinal hospital network database. We used the components of the established clinical CHADS2/CHA2DS2-VASc tools with both conventional and data-driven weighting, and we incorporated additional clinical parameters, including diagnostic codes and medications. The dataset consisted of 126,037 patients with a mean follow-up of 11.29 ± 7.96 years. As expected, stroke rates were positively associated with a diagnosis of AF and inversely associated with prescribed anticoagulant medication, stratified into four treatment levels. Unexpectedly, the conventionally calculated scores correlated negatively with stroke risk. Reweighting the components with a linear support vector machine revealed that this negative correlation was driven by diagnoses of heart failure, hypertension, and vascular disease. The reweighted score correlated positively with risk: patients with the lowest score had a stroke rate of 10.7%, compared with 55.0% for patients with the highest revised score.
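To make the conventional-versus-reweighted comparison concrete, here is a minimal sketch. It scores patients with the standard CHA2DS2-VASc point weights, then learns data-driven weights for the same binary components with a soft-margin linear SVM fit by full-batch subgradient descent on the hinge loss. The abstract does not say how the SVM was trained or which software was used, so the training routine, the component identifiers (`stroke_tia`, `age_ge_75`, and so on), and the toy cohort are all hypothetical and for illustration only; this is not the study's pipeline or data.

```python
# Hypothetical component identifiers mapped to the standard CHA2DS2-VASc points.
CONVENTIONAL_WEIGHTS = {
    "chf": 1,           # congestive heart failure
    "hypertension": 1,
    "age_ge_75": 2,     # age >= 75
    "diabetes": 1,
    "stroke_tia": 2,    # prior stroke, TIA, or thromboembolism
    "vascular": 1,      # prior MI, peripheral artery disease, or aortic plaque
    "age_65_74": 1,     # age 65-74
    "female": 1,        # sex category
}
COMPONENTS = list(CONVENTIONAL_WEIGHTS)


def conventional_score(patient):
    """Sum the fixed integer points of the components a patient has."""
    return sum(pts for name, pts in CONVENTIONAL_WEIGHTS.items() if patient.get(name))


def to_features(patient):
    """Binary indicator vector, one entry per component, in COMPONENTS order."""
    return [1.0 if patient.get(name) else 0.0 for name in COMPONENTS]


def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM fit by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b))
    for labels y_i in {+1 (stroke), -1 (no stroke)}. Returns (w, b).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not regularized
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def reweighted_score(patient, w, b):
    """Data-driven score: the SVM decision value w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, to_features(patient))) + b


# Toy cohort (made up): prior stroke/TIA co-occurs with events and
# hypertension with non-events, so hypertension should get a negative weight.
cohort = [
    ({"stroke_tia": True}, 1),
    ({"stroke_tia": True, "age_ge_75": True}, 1),
    ({"hypertension": True}, -1),
    ({"hypertension": True, "age_ge_75": True}, -1),
]
X = [to_features(p) for p, _ in cohort]
y = [label for _, label in cohort]
w, b = fit_linear_svm(X, y)
learned = dict(zip(COMPONENTS, w))
print({name: round(wt, 2) for name, wt in learned.items()})
```

A component that co-occurs mainly with non-events in the training data ends up with a negative learned weight, which is the kind of pattern the abstract reports for heart failure, hypertension, and vascular disease. For real data, a tested library implementation and validation on held-out patients would be the safer choice.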
Conventional clinical tools did not correlate with stroke risk in a real-world, high-risk patient population. Prior diagnoses of heart failure, hypertension, and vascular disease were negatively correlated with stroke rates. A machine learning-based reweighting of the score components improved the correlation with real-world stroke risk and may be useful for optimizing risk assessment and management.
Keywords: Atrial fibrillation, stroke, machine learning, risk score, population health
"Earth Systems k-Means Toolbox: A Standardized Application of Multivariate k-Means Cluster Analysis on the Global Ocean Carbon Cycle"
Rebecca Latto, SEAS '19, Applied Physics and Applied Mathematics, Columbia University
Advanced pattern recognition and data mining techniques are becoming increasingly popular in climate and Earth sciences as a means of decomposing large datasets into their most significant features. This is particularly important for studies of the global carbon cycle, where abundant data are available but remain largely unexplored because of their size and complexity. Studying these datasets matters because gaps in our understanding limit our ability to accurately describe, understand, and predict CO₂ concentrations and their changes in the major planetary carbon reservoirs.
Here we describe the implementation of multivariate k-means clustering on pCO₂ (Landschuetzer product) and temperature at 10 m depth (ARGO Coriolis product) in the global ocean for 2000-2015. By organizing the observation-based data into distinct regimes, which we call “ocean carbon states,” we gain insight into the physical and/or biogeochemical processes controlling the ocean carbon cycle.
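The clustering step can be sketched in a few lines. This is a minimal illustration, assuming each sample is one (grid cell, month) pair described by its pCO₂ and 10 m temperature, and that both variables are z-scored first so that neither dominates the Euclidean distance because of its units. It uses Lloyd's algorithm with a deterministic farthest-first initialization (k-means++ seeding is a common alternative). The abstract does not state the preprocessing, initialization, or number of clusters, and the sample values below are made up.

```python
import math


def standardize(rows):
    """Z-score each column so pCO2 (uatm) and temperature (degC) weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (pCO2 in uatm, temperature at 10 m in degC) samples,
# each standing for one grid cell in one month.
samples = [
    (420.0, 27.0), (425.0, 27.5), (418.0, 26.8),
    (350.0, 5.0), (345.0, 4.2), (355.0, 5.6),
]
labels, centroids = kmeans(standardize(samples), k=2)
print(labels)
```

Each resulting cluster label plays the role of an "ocean carbon state"; mapping the labels back onto their grid cells and months gives the spatial and interannual patterns the abstract describes.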
We show that k-means effectively produces dynamic states that exhibit complex interannual and spatial variability. Using several correlational methods and a neural network, we can also parameterize the ocean carbon states in terms of relevant climate indices (ENSO, AO, NAO) and other fields such as salinity and chlorophyll.
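One simple correlational check is sketched below. It tracks the fraction of grid cells assigned to a given ocean carbon state at each time step and computes the Pearson correlation of that series with a climate index. The label arrays and index values are hypothetical placeholders, not real ENSO, AO, or NAO data; the abstract does not specify which correlational methods were used, and the neural-network parameterization is not shown.

```python
import math


def state_fraction(labels_by_time, state):
    """Fraction of grid cells assigned to `state` at each time step."""
    return [sum(1 for lab in labels if lab == state) / len(labels)
            for labels in labels_by_time]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical k-means labels for 4 grid cells over 4 months,
# and a made-up monthly climate-index series of the same length.
labels_by_time = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
index = [0.1, 0.6, 1.2, -0.5]
frac = state_fraction(labels_by_time, state=1)  # [0.5, 0.75, 1.0, 0.25]
r = pearson(frac, index)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 suggest that the state's extent tracks the index. With real monthly data one would also examine lagged correlations and account for autocorrelation before judging significance.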
Keywords: Data science, clustering, ocean carbon cycle
Photos by Timothy Lee