1. Introduction
Heart failure (HF) is a health condition that occurs when the heart cannot pump enough blood to meet the body's needs. The healthcare industry relies on machine learning models to predict heart failure cases. You have been asked by the SHU clinic team to explore different programming and analytics techniques to analyse and evaluate the performance of heart failure models, so that informed decisions can be made about patients' survival. For this purpose, you will use programming concepts such as custom modules, function definitions, file processing and exception handling, together with scientific computing, data analysis, data visualization and machine learning libraries (such as numpy, pandas, matplotlib and scikit-learn), to implement a solution that predicts patients' survival.
2. Dataset
The dataset for this assignment can be downloaded from the PCP Blackboard module site or from the UCI repository. Please study the dataset in terms of its size, data types and variables.
• Heart failure clinical records dataset (available on Blackboard and the UCI repository).
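Because the brief asks for file processing and exception handling, loading the dataset is a natural place to use both. The sketch below assumes the dataset is a CSV file readable by `pandas.read_csv`; `DatasetError`, `load_dataset` and `summarise` are hypothetical names, not part of any library or of the brief.

```python
import pandas as pd


class DatasetError(Exception):
    """Raised when the dataset cannot be loaded (hypothetical custom exception)."""


def load_dataset(source) -> pd.DataFrame:
    """Read the CSV from a path or file-like object, wrapping common failures."""
    try:
        df = pd.read_csv(source)
    except FileNotFoundError as exc:
        raise DatasetError(f"dataset file not found: {source}") from exc
    except pd.errors.EmptyDataError as exc:
        raise DatasetError(f"dataset file is empty: {source}") from exc
    if df.empty:
        raise DatasetError(f"dataset has columns but no rows: {source}")
    return df


def summarise(df: pd.DataFrame) -> dict:
    """Size, data types and variable names: the properties the brief asks you to study."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
    }
```

For example, `summarise(load_dataset("heart_failure_clinical_records_dataset.csv"))` would report the size and column types, assuming that is the file name of your download.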
3. Assignment Key Tasks
The following tasks are to be performed in this assignment:
I). Exploratory Data Analysis
You are required to write code to check whether any data are missing and, if so, apply an appropriate cleaning technique. Is any other data preprocessing needed? If so, write code for it. In addition, the module should have functions/methods that perform descriptive statistical analysis of the dataset (mean, median, standard deviation, variance, minimum, maximum, skewness and kurtosis). Choose a range of variables of interest and explore their frequencies and dependencies through bar plots, grouped bar plots, pie charts, etc. Draw conclusions.
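One way to organise these checks as module functions is sketched below with pandas. Dropping duplicate rows and filling missing numeric values with the column median is only one possible cleaning strategy (an assumption, not a requirement of the brief), and all helper names are hypothetical. Note that pandas reports the sample (n - 1) standard deviation and variance, and excess (Fisher) kurtosis.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Number of missing values in each column (all zeros means nothing to clean)."""
    return df.isna().sum()


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill missing numeric values with the column median."""
    out = df.drop_duplicates().copy()
    numeric = out.select_dtypes(include="number").columns
    out[numeric] = out[numeric].fillna(out[numeric].median())
    return out


def describe_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Descriptive statistics requested in the brief, one row per chosen column."""
    subset = df[columns]
    return pd.DataFrame({
        "mean": subset.mean(),
        "median": subset.median(),
        "std": subset.std(),        # sample standard deviation (ddof=1)
        "variance": subset.var(),   # sample variance (ddof=1)
        "min": subset.min(),
        "max": subset.max(),
        "skewness": subset.skew(),
        "kurtosis": subset.kurt(),  # excess kurtosis (normal distribution -> 0)
    })


def frequency_table(df: pd.DataFrame, column: str, target: str) -> pd.DataFrame:
    """Counts of each value of `column` within each target class."""
    return pd.crosstab(df[column], df[target])
```

Calling `.plot(kind="bar")` on the result of `frequency_table` draws a grouped bar plot, and `df[column].value_counts().plot(kind="pie")` draws a pie chart; both require matplotlib.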
II). Classification I
Split the dataset into training and testing sets. You are required to fit the following machine learning algorithms: Naïve Bayes, Logistic Regression, Support Vector Machine, Random Forest classifier, K-Nearest Neighbour and Multi-Layer Perceptron Neural Network. Evaluate your models on the test set and provide the confusion matrix for every model. Report and compare the performance of the models in terms of accuracy, precision, recall and F1-score. Draw conclusions and provide recommendations.
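A minimal sketch of this workflow with scikit-learn, assuming the features are already numeric and the target is binary (0/1). Gaussian Naive Bayes is used as the Naïve Bayes variant, and the 70/30 stratified split, the scaling pipelines for the scale-sensitive models and all hyperparameters are illustrative choices rather than requirements of the brief; `build_models` and `evaluate_models` are hypothetical names. Precision, recall and F1 are reported for the positive class (label 1).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state: int = 42) -> dict:
    """The six classifiers named in the brief; distance/gradient-based ones get scaling."""
    return {
        "Naive Bayes": GaussianNB(),
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "MLP": make_pipeline(StandardScaler(),
                             MLPClassifier(max_iter=2000, random_state=random_state)),
    }


def evaluate_models(X, y, test_size: float = 0.3, random_state: int = 42):
    """Fit every model on one stratified split; return a score table and confusion matrices."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)
    rows, matrices = [], {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        matrices[name] = confusion_matrix(y_test, y_pred)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
        })
    return pd.DataFrame(rows).set_index("model"), matrices
```

With the UCI data this might be called as `evaluate_models(df.drop(columns="DEATH_EVENT"), df["DEATH_EVENT"])`, assuming `DEATH_EVENT` is the name of the target column in your copy.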
III). Classification II
Investigate the class imbalance problem by plotting the class distribution of the target variable. If the classes are imbalanced, use at least two techniques (algorithm-level or sampling) to balance the class distribution, so that you have a balanced dataset. Using the balanced dataset, you are required to build classification models with the following machine learning algorithms: Naïve Bayes, Logistic Regression, Support Vector Machine, Random Forest classifier, K-Nearest Neighbour and Multi-Layer Perceptron Neural Network. Evaluate your models on the test set and provide the confusion matrix for every model. Report and compare the performance of the models in terms of accuracy, precision, recall and F1-score, and compare your results with those of Task II. Draw conclusions, provide recommendations and justify the methods you chose.
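A sketch of one technique from each family, assuming a binary target held in a pandas Series: random oversampling of the minority class (a sampling technique, via `sklearn.utils.resample`) and class weighting (an algorithm-level technique, via `class_weight="balanced"`). Only some scikit-learn estimators accept `class_weight` (for example `LogisticRegression`, `SVC` and `RandomForestClassifier`); Naive Bayes, KNN and MLP would need the resampled training set instead. Resampling is meant for the training split only, so duplicated rows never leak into the test set. SMOTE from the separate imbalanced-learn package is a common alternative sampling technique. Helper names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample


def class_distribution(y: pd.Series) -> pd.Series:
    """Share of each class; y.value_counts().plot(kind="bar") draws it (needs matplotlib)."""
    return y.value_counts(normalize=True)


def random_oversample(X: pd.DataFrame, y: pd.Series, random_state: int = 42):
    """Sampling technique: draw minority-class rows with replacement until every
    class has as many rows as the majority class. Use on the training split only."""
    data = X.assign(_target=y.to_numpy())
    counts = data["_target"].value_counts()
    majority = counts.max()
    parts = []
    for label, n in counts.items():
        group = data[data["_target"] == label]
        if n < majority:
            group = resample(group, replace=True, n_samples=majority,
                             random_state=random_state)
        parts.append(group)
    # Shuffle so the classes are interleaved, and give the rows a fresh index.
    balanced = pd.concat(parts).sample(frac=1, random_state=random_state)
    balanced = balanced.reset_index(drop=True)
    return balanced.drop(columns="_target"), balanced["_target"]


# Algorithm-level technique: keep the data as-is and weight each class
# inversely to its frequency in the training loss.
weighted_lr = LogisticRegression(class_weight="balanced", max_iter=1000)
```

Oversampling changes what the models see, while class weighting changes how errors are penalised; comparing both against the Task II results is one way to justify the choice.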
IV). Feature Selection
It is advisable to use a limited number of features for prediction, provided that performance remains good. To achieve this, you are required to investigate the significance of the features for selection purposes. Using the Mann-Whitney test (for continuous features) and the Chi-Square test (for categorical features), compare the distribution of each feature between the two groups of the target class (Survived vs Dead), then rank the features in order of significance (using a significance level of p = 0.05). Secondly, plot the feature importances from the Random Forest classifier in Task III. Use the plot and the statistical tests to decide which features to select.
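The two statistical tests and the importance ranking can be combined roughly as follows with SciPy and scikit-learn. This sketch assumes the target is coded 0/1 and that you pass in the set of categorical columns yourself; Mann-Whitney U is applied to the continuous features and the Chi-Square test of independence to the categorical ones. The selection rule in `select_features` (significant at p < 0.05 and among the top-k by importance) is only one possible rule, and all helper names are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.ensemble import RandomForestClassifier


def significance_table(df: pd.DataFrame, target: str, categorical: set,
                       alpha: float = 0.05) -> pd.DataFrame:
    """Compare each feature between target == 0 and target == 1, most significant first."""
    rows = []
    for col in df.columns.drop(target):
        if col in categorical:
            # Chi-Square test of independence on the feature-by-class contingency table.
            _, p, _, _ = chi2_contingency(pd.crosstab(df[col], df[target]))
            test = "Chi-Square"
        else:
            # Mann-Whitney U: do the two groups' distributions differ?
            group0 = df.loc[df[target] == 0, col]
            group1 = df.loc[df[target] == 1, col]
            _, p = mannwhitneyu(group0, group1, alternative="two-sided")
            test = "Mann-Whitney U"
        rows.append({"feature": col, "test": test, "p_value": p, "significant": p < alpha})
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)


def rf_importances(X: pd.DataFrame, y: pd.Series, random_state: int = 42) -> pd.Series:
    """Impurity-based importances from a random forest, largest first."""
    model = RandomForestClassifier(random_state=random_state).fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)


def select_features(table: pd.DataFrame, importances: pd.Series, top_k: int = 5) -> list:
    """Hypothetical rule: keep features that are significant AND in the top_k by importance."""
    significant = set(table.loc[table["significant"], "feature"])
    top = set(importances.index[:top_k])
    return [f for f in importances.index if f in significant and f in top]
```

`rf_importances(...).plot(kind="barh")` draws the feature importance graph (requires matplotlib).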
V). Classification III
Using the features selected in Task IV, you are required to build classification models with the following machine learning algorithms: Naïve Bayes, Logistic Regression, Support Vector Machine, Random Forest classifier, K-Nearest Neighbour and Multi-Layer Perceptron Neural Network. Evaluate your models and provide the confusion matrix for every model. Report and compare the performance of the models in terms of accuracy, precision, recall and F1-score, and compare your results with those of Tasks II and III.
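For the comparison with earlier tasks to be fair, the models trained on the selected features should be scored on the same train/test split as the models trained on all features. A minimal sketch, assuming numeric features in a DataFrame and a 0/1 target; only two of the six required models are shown to keep it short (the others slot into the `models` dictionary the same way), and `compare_feature_sets` is a hypothetical name.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_feature_sets(X: pd.DataFrame, y: pd.Series, selected: list,
                         random_state: int = 42) -> pd.DataFrame:
    """Score each model on all features and on the selected subset, using one
    shared train/test split so the two feature sets are compared like-for-like."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state)
    models = {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Random Forest": RandomForestClassifier(random_state=random_state),
    }
    rows = []
    for feature_set, cols in {"all": list(X.columns), "selected": list(selected)}.items():
        for name, model in models.items():
            model.fit(X_train[cols], y_train)  # refitting discards the previous fit
            y_pred = model.predict(X_test[cols])
            # Confusion-matrix cells for a binary 0/1 target.
            tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
            rows.append({
                "model": name,
                "features": feature_set,
                "accuracy": accuracy_score(y_test, y_pred),
                "f1": f1_score(y_test, y_pred, zero_division=0),
                "tn": tn, "fp": fp, "fn": fn, "tp": tp,
            })
    return pd.DataFrame(rows)
```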
VI). Conclusion
Draw conclusions about your best-performing model and justify your choice.