
Evaluating the quality of ML classification algorithms for 17 different classifiers from Spark ML, Keras, and Scikit-learn to detect or minimize ML bugs before a model is deployed

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

The project evaluates the quality of ML classification algorithms for 17 different classifiers from Spark ML, Keras, and Scikit-learn, with the aim of detecting or minimizing ML bugs at an early stage, before a model is deployed. This is achieved by testing the code, the model, and the data, and by evaluating the individual classifiers against ML quality attributes on three popular classification datasets.

 

The goal is to show how open-source ML systems should be tested using state-of-the-art solutions, i.e. model behavioral testing, in order to build user confidence in deploying these systems in operational settings.

 

The implementation of the project requires addressing several questions.

Our objective is to determine ways to improve ML system quality by testing the data, model, and code using quantitative and qualitative metrics: performance, reproducibility, correctness, robustness, and explainability. The experimentation (implementation) must therefore be able to answer the following questions, and you need to think about how to design the experiments to address each one.

 

What are the most appropriate or ideal classifiers for the problem at hand, what are the most effective evaluation metrics, and what makes the classifier perform best?

Precision, Recall, Accuracy, ROC, confusion matrix, classification report
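Below is a minimal sketch of how these metrics can be computed with Scikit-Learn's metrics API; the dataset (load_breast_cancer) and classifier (LogisticRegression) are placeholder choices, not part of the project specification.

```python
# Minimal sketch: computing the metrics listed above with scikit-learn.
# The dataset and classifier are placeholders; any of the 17 classifiers fits.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score, confusion_matrix, classification_report)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # positive-class probability for ROC

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_score))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

One trained model yields both the hard predictions and the probability scores, which is enough to produce the full set of metrics above.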

Which classifiers are robust to data transformations such as shuffling the training instances, adding adversarial examples, and scaling the data? In other words, which classifier is robust to slight changes in the input data, or to synthetic datasets? And:

What are the main factors or parameters that contribute to sensitivity?
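As a starting point, robustness can be probed by retraining and rescoring under controlled perturbations. The sketch below is one assumed design, not the project's fixed methodology; the dataset, classifier, and noise scale are placeholders, and the perturbations (shuffled training order, standardized features, slight Gaussian noise on the inputs) mirror the transformations listed above.

```python
# Minimal robustness sketch: rescore a classifier under controlled perturbations
# and compare against the unperturbed baseline.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

def fit_score(X_train, y_train, X_test):
    clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    return accuracy_score(y_te, clf.predict(X_test))

baseline = fit_score(X_tr, y_tr, X_te)

perm = rng.permutation(len(X_tr))                       # shuffle training instances
shuffled = fit_score(X_tr[perm], y_tr[perm], X_te)

scaler = StandardScaler().fit(X_tr)                     # scaled data
scaled = fit_score(scaler.transform(X_tr), y_tr, scaler.transform(X_te))

noise = rng.normal(0, 0.01 * X_te.std(axis=0), X_te.shape)
noisy = fit_score(X_tr, y_tr, X_te + noise)             # slight input change

print(f"baseline={baseline:.3f} shuffled={shuffled:.3f} "
      f"scaled={scaled:.3f} noisy={noisy:.3f}")
```

The gap between the baseline score and each perturbed score is one simple, comparable measure of sensitivity per classifier.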

 

What methods or model parameters (if any) make the black-box decision-making process more explainable, and which classifier outputs are explainable and interpretable? That is:

Explainability in their native form (without using any explainability tools) and using explainability tools (SHAP and LIME).

For example, the decision making of a decision tree is easy to understand at a high level (a chain of if-else statements).

How do the input features contribute to the model output?
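A minimal sketch of tool-based explainability, using SHAP's model-agnostic KernelExplainer on a placeholder random forest; LIME works analogously via lime.lime_tabular.LimeTabularExplainer. This assumes shap is installed, and the sample sizes are kept small only to make the kernel method tractable.

```python
# Minimal sketch: explaining a black-box classifier with SHAP's model-agnostic
# KernelExplainer (requires `pip install shap`). Model and sample sizes are
# illustrative placeholders.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

background = X.sample(50, random_state=0)  # small background set keeps it tractable
explainer = shap.KernelExplainer(
    lambda data: model.predict_proba(data)[:, 1],  # positive-class probability
    background,
)
shap_values = explainer.shap_values(X.iloc[:20])   # per-feature contributions

# Global view: which features push the predicted probability up or down.
shap.summary_plot(shap_values, X.iloc[:20])
```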

What are the main factors, parameters, or methods that enhance ML reproducibility, and why is model reproducibility difficult to achieve? And:

Which classifier is reproducible and why?
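One way to make this question concrete is to train the same model twice with every source of randomness pinned and assert identical predictions. The sketch below uses Scikit-Learn; the seed value is arbitrary, and the closing comment names the equivalent knobs in Keras and Spark ML.

```python
# Minimal reproducibility check: train twice with all randomness pinned and
# assert identical predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=7)

def train_once(seed=7):
    np.random.seed(seed)  # pin NumPy's global RNG as well as the model's own
    return RandomForestClassifier(random_state=seed).fit(X, y).predict(X)

assert np.array_equal(train_once(), train_once()), "run is not reproducible"
# Equivalent knobs elsewhere: tf.random.set_seed(...) for Keras, and the
# `seed` Param on Spark ML estimators.
```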

What are the most appropriate classifiers and ideal performance metrics for raw data (without any transformation), unnormalized data (cleaned and transformed but not normalized), normalized data (cleaned, transformed, and normalized), and imbalanced data?

Which combination of qualitative and quantitative metrics (performance, robustness, correctness, reproducibility, and explainability) should ML practitioners consider or prioritize to get a holistic view of model behavior before they deploy a model?

Why does accuracy alone not provide a complete picture of the model?

Is it possible to tell how each of the metrics correlates with the classifiers?

For example, does a decision tree classifier emphasize robustness over explainability?

Do we get the same results from various classification models when we apply the same data processing and the same or similar model parameters/hyperparameters, with all other settings held constant?

We need to provide valid reasons for both yes and no answers

What are the unique challenges of model behavioral testing when applied to classification models from Scikit-Learn, Keras, and Spark ML?

How can we adjust the workflow to handle data and concept drift?

 

It is possible that the data and concepts, i.e. the target instances, change over time and affect the quality of the ML system (model quality), for example its predictive power. What are the best practices to minimize this effect?
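The brief does not prescribe a drift-detection method, but a common lightweight practice is to compare feature distributions between the training data and incoming batches. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the threshold alpha and the simulated shift are illustrative assumptions.

```python
# Assumed lightweight drift check: a two-sample Kolmogorov-Smirnov test per
# feature between the training data and a new batch.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(X_train, X_new, alpha=0.01):
    """Return indices of features whose distribution shifted significantly."""
    flagged = []
    for j in range(X_train.shape[1]):
        _, p_value = ks_2samp(X_train[:, j], X_new[:, j])
        if p_value < alpha:
            flagged.append(j)
    return flagged

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(1000, 5))
X_new = X_train.copy()
X_new[:, 2] += 0.5                        # simulate drift in feature 2
print(drifted_features(X_train, X_new))   # -> [2]
```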

The analysis must be performed both between classifiers within the same library and between libraries (the focus should be on the latter). For example:

Scikit-Learn's linear SVM against other Scikit-Learn classifiers, and against the Spark ML linear SVM classifier.

We have 17 classifiers/algorithms to be evaluated and analyzed:

Spark ML = 8, Scikit-Learn = 8, Keras = 1

Tools and Technical Composition

Programming Language and IDE: Python, Jupyter Notebook

Development OS: Ubuntu (as long as we are using Jupyter, the OS is not an issue)

Development Approach: Test-Driven Development

Program constructs: Classes and Functions (I love functions)

Required skill sets: the project is quite challenging.

Someone who has done projects in:

ML, ML testing and quality assurance, EDA, ML workflow orchestration (tracking), ML model behavioral testing, etc.

ML model properties to be evaluated practically

Performance, robustness, reproducibility, correctness, explainability, and interpretability

ML frameworks to be used for the implementation: Spark ML, Scikit-Learn, and Keras

It is crucial to clearly understand the differences between model evaluation and model testing, as well as between ML evaluation and ML testing.

The primary focus of this project is model testing, not model evaluation.

Model evaluation depends mainly on the model's performance metrics, whereas model testing goes well beyond that (see "Writing Test Cases for Machine Learning systems", Analytics Vidhya).
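To make the distinction concrete: evaluation produces a score, while testing asserts properties the model must satisfy. Below is a minimal sketch of behavioral tests in the pytest style, matching the project's test-driven development approach; the thresholds and the invariance perturbation are illustrative assumptions.

```python
# Minimal behavioral-test sketch: assertions about model behavior rather than
# a single evaluation score. Run with pytest.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000, random_state=0).fit(X, y)

def test_minimum_functionality():
    # The model must clear a sanity-level accuracy bar.
    assert accuracy_score(y, model.predict(X)) > 0.9

def test_invariance_to_tiny_perturbation():
    # Predictions should not flip under negligible input noise.
    rng = np.random.default_rng(0)
    X_noisy = X + rng.normal(0, 1e-6, X.shape)
    assert np.array_equal(model.predict(X), model.predict(X_noisy))
```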

 

Classification Algorithms

 

We have selected 16 classifiers that share the same mathematical intuition in both Spark ML (i.e. PySpark) and Scikit-Learn, plus one general classifier from Keras, which we want to evaluate against the rest. The classification algorithms are listed below. Note that each algorithm has two versions: one from PySpark and one from Scikit-Learn.

 

Linear SVC or svm.SVC (2)

Max iterations, C (regularization strength), penalty (loss), fit intercept, random state

Logistic Regression(2)

solver, penalty, C (regularization strength), max iteration, random state

Decision Trees(2) & Random Forest(2)

Max depth, number of estimators, impurity measure, max features, bootstrap technique, random state

Gaussian Naive Bayes (2)

Smoothing, model type (Multinomial, Gaussian, Bernoulli)

The APIs do not expose many hyperparameters, as the algorithm generalizes well.

GradientBoostingClassifier (Scikit-Learn) (1) and GBTClassifier (Spark ML) (1)

Max features, number of estimators, max depth, learning rate, loss/loss type, bootstrap technique, random state

MLPClassifier(2)

hidden layer size, activation, solver, max iterations, learning rate, batch size, alpha(regularization), random state

One-vs-Rest(2)

estimator (base estimator), number of parallel jobs

Keras Classifier(1)

Binary or multi-class general classifier, random state

Hyperparameter selection

 

The hyperparameters should be selected so that they contribute significantly to the quality of the ML models. Considering three to four hyperparameters per classifier seems reasonable.

The set of hyperparameters should be equally available in Scikit-Learn and Spark ML; otherwise, the comparison would not be fair.
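A sketch of what "equally available" means in practice, pairing logistic regression across the two libraries. Note that the regularization knobs are not identical: Spark ML's regParam is a strength (lambda) while Scikit-Learn's C is an inverse strength, so the mapping below (disabling regularization on both sides) is one assumption that makes the pair directly comparable.

```python
# Sketch of pairing hyperparameters across libraries for logistic regression.
# regParam (Spark, a strength) vs C (Scikit-Learn, an inverse strength) do not
# map one-to-one; here both sides are made effectively unregularized.
from sklearn.linear_model import LogisticRegression
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression as SparkLogisticRegression

sk_clf = LogisticRegression(max_iter=100, C=1e6, fit_intercept=True,
                            random_state=42)

spark = SparkSession.builder.appName("pairwise-comparison").getOrCreate()
spark_clf = SparkLogisticRegression(maxIter=100, regParam=0.0, fitIntercept=True,
                                    featuresCol="features", labelCol="label")
# With regParam=0.0 and a very large C, both models are near-unregularized,
# so their fitted coefficients and predictions should be directly comparable.
```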

 

 
