
SHUHUA (JESSICA) YIN
Graduate student in Data Science and Business Analytics at the University of North Carolina at Charlotte
Objectives & Background:
•Anthracycline-based chemotherapy aims to eradicate undetectable cancer cells and reduce recurrence, but it can cause cardiovascular abnormalities
•Data: 224 breast cancer patients with information and measurements
•Aim: predict whether, and how likely, a cancer patient will develop heart problems in the long term (month 24) based on baseline (month 0) information
Data Wrangling & Exploration (R, Python, Tableau):
•Filter out columns whose percentage of missing values exceeds a chosen threshold
•Slice the data from different perspectives and keep the view that provides the most information
•Explore distributions and run normality and correlation tests on all variables
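
The missing-value filter can be sketched roughly as follows. This is a minimal, standard-library illustration and not the project's actual code: the function name `drop_sparse_columns`, the 30% threshold, and the column names and values in the toy records are all hypothetical. In practice a data-frame library would likely do the same job in a line or two.

```python
def drop_sparse_columns(rows, max_missing_fraction=0.3):
    """Keep only columns whose share of missing (None) values is at most the threshold.

    `rows` is a list of dicts sharing the same keys; returns (kept_columns, filtered_rows).
    """
    if not rows:
        return [], []
    columns = list(rows[0].keys())
    n = len(rows)
    kept = []
    for col in columns:
        missing = sum(1 for r in rows if r.get(col) is None)
        if missing / n <= max_missing_fraction:
            kept.append(col)
    filtered = [{c: r.get(c) for c in kept} for r in rows]
    return kept, filtered


# Toy records invented for illustration; they are NOT taken from the study data.
rows = [
    {"age": 52, "lvef_m0": 61.0, "biomarker_x": None},
    {"age": 47, "lvef_m0": None, "biomarker_x": None},
    {"age": 63, "lvef_m0": 58.5, "biomarker_x": 1.2},
    {"age": 58, "lvef_m0": 60.2, "biomarker_x": None},
]

kept, filtered = drop_sparse_columns(rows, max_missing_fraction=0.3)
print(kept)
```

Here `biomarker_x` is missing in three of four records, so it is dropped, while `lvef_m0` (one of four missing) stays under the assumed threshold.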
Data Analysis & Predictive Modeling (ongoing; R, Python):
•Build different machine learning models (Logistic Regression, Linear Regression, LASSO, Ridge Regression, Elastic Net, Neural Network, Random Forest)
•Evaluate the accuracy of each model on our data and select the best-performing method
•Fit the selected model to our data and produce the desired predictions
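
The compare-then-select step can be illustrated with a small k-fold cross-validation sketch. Everything here is an assumption made for illustration: the data are synthetic (one made-up feature, 40 points), only two candidates are compared (a majority-class baseline and a plain gradient-descent logistic regression), and the helper names are hypothetical. The actual project may well use a library such as scikit-learn and a different evaluation metric.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    majority = 1 if sum(y) * 2 >= len(y) else 0
    return lambda row: majority


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression trained with full-batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, row)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(row):
        return 1 if sum(wi * xi for wi, xi in zip(w, row)) + b > 0 else 0

    return predict


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy of `fit` over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic data for illustration only: one feature, label 1 when the feature is positive.
X = [[-2 + 4 * i / 39] for i in range(40)]
y = [1 if row[0] > 0 else 0 for row in X]

candidates = {"majority_baseline": fit_majority, "logistic_regression": fit_logistic}
results = {name: cross_val_accuracy(fit, X, y) for name, fit in candidates.items()}
best = max(results, key=results.get)

final_model = candidates[best](X, y)  # refit the selected model on all data
print(results, best)
```

Scoring every candidate on the same held-out folds keeps the comparison fair, and the chosen model is refit on the full data only after selection.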
