ECS708 Machine Learning

Assignment 1 – Part 2: Logistic Regression and Neural Networks

School of Electronic Engineering and Computer Science

Aim

The aim of this lab is to become familiar with Logistic Regression and Neural Networks.

1. Logistic Regression

From the root directory of the assignment’s code, navigate to the folder “assgn_1_part_2/1_logistic_regression”.

With logistic regression the values we want to predict are now discrete classes, not continuous variables. In other words, logistic regression is for classification tasks. In the binary classification problem we have classes 0 and 1, e.g. classifying an email as spam or not spam based on the words used in it. The form of the hypothesis in logistic regression is a logistic/sigmoid function, given by the formula below:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

This is the form of the model. The cost function for logistic regression is

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

which, when taking partial derivatives and substituting these into the gradient descent update equation, gives

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Fill out the sigmoid.py function by implementing the formula above. Now use the plot_sigmoid.py function to plot the

sigmoid function.

Task 1: Include in your report the relevant lines of code and the result of using plot_sigmoid.py. [3 points]

The dataset we use is an exam score dataset with a 2-dimensional input space (for easy visualization) and a binary classification of whether the student was admitted or not. The data can be found in "ex4x.dat" and "ex4y.dat". The data has been loaded into the variables X and y. Additionally, the bias variable 1 has been added to each data point. Run plot_data.py to plot the data. Note that the data have been normalized, which makes optimization easier.

Task 2. Plot the normalized data to see what it looks like. Also plot the data without normalization. Enclose both plots in your report. [2 points]


1.1. Cost function and gradient for logistic regression

Task 3. Modify the calculate_hypothesis.py function so that, for a given dataset, theta, and training example, it returns the hypothesis. For example, for the dataset X = [[1,10,20],[1,20,30]] and theta = [0.5,0.6,0.7], the call calculate_hypothesis(X, theta, 0) will return: sigmoid(1 * 0.5 + 10 * 0.6 + 20 * 0.7). The function should be able to handle datasets of any size. Enclose in your report the relevant lines of code. [5 points]

Task 4. Modify the line “cost = 0.0” in compute_cost.py so that it computes the cost function J(θ) given above. To calculate a logarithm you can use np.log(x). Now run the file assgn1_ex1.py and tune the learning rate if necessary. What is the final cost found by the gradient descent algorithm? In your report include the modified code and the cost plot.

1.2. Draw the decision boundary

Task 5. Plot the decision boundary. This corresponds to the line where $\theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$, which is the boundary line’s equation. To plot the boundary line, you’ll need two points (x1, x2). Given a known value for x1, you can find the value of x2: rearranging the equation in terms of x2 gives $x_2 = -(\theta_0 + \theta_1 x_1)/\theta_2$. Use the minimum and maximum values of x1 as the known values, so that the boundary line you plot spans the whole x1 axis. For these values of x1, compute the values of x2. Use the relevant plot_boundary function in assgn1_ex1.py and include the graph in your report. [5 points]

1.3. Non-linear features and overfitting

We don’t always have access to the full dataset. In assgn1_ex2.py, the dataset has been split into a training set of 20 samples, and the remaining samples have been assigned to the test set. This split is done using the function return_test_set.py. Gradient descent is run on the training data (this means that the parameters are learned using only the training set and not the test set). After theta has been calculated, compute_cost() is called on both sets (training and test), the errors are printed, and graphs of the data and decision boundaries are shown.

Task 6. Run the code of assgn1_ex2.py several times. In every execution the data are shuffled randomly, so you’ll see different results. Report the costs found over the multiple runs. What is the general difference between the training and test cost? When does the training set generalize well? Demonstrate two splits, one with good and one with bad generalisation, and put both graphs in your report.

In assgn1_ex3.py, instead of using just the 2D feature vector, incorporate the following non-linear features: $x_1^2$, $x_2^2$, and $x_1 x_2$. This results in a 5D input vector per data point, and so you must use 6 parameters θ. [2 points]

Task 7. Run logistic regression on this dataset. How does the error compare to the one found when using the original features (i.e. the error found in Task 4)? Include in your report the error and an explanation of what happens. [5 points]

Task 8. In assgn1_ex4.py the data are split into a test set and a training set. Add your new features from the question above (assgn1_ex3.py). Modify the function gradient_descent_training.py to store the current cost for the training set and the test set at each iteration: store the cost of the training set in cost_vector_train and that of the test set in cost_vector_test. These arrays are passed to plot_cost_train_test.py, which will show the cost function of the training set (in red) and test set (in blue). Experiment with different sizes of training and test set (remember that the total data size is 80) and show the effect of using sets of different sizes by saving the graphs and putting them in your report. In the file assgn1_ex5.py, add extra features (e.g. both second-order and third-order polynomial terms) and analyse the effect. What happens when the cost function of the training set goes down but that of the test set goes up? [5 points]

Task 9. With the aid of a diagram of the decision space, explain why a logistic regression unit cannot solve the XOR

classification problem. [3 points]

2. Neural Network

We will now perform backpropagation on a feedforward network to solve the XOR problem. The network will have 2 input neurons, 2 hidden neurons and one output neuron. There is also a bias on the hidden and output layers, as in the following architecture:

[Figure: feedforward network with 2 input neurons, 2 hidden neurons, 1 output neuron, and bias units on the hidden and output layers]


From the root directory of the assignment’s code, navigate to the folder “assgn_1_part_2/2_neural_network”.

Open the file xorExample.py. This is an example script that creates a neural network model as an object, initialized by

using the class defined in NeuralNetwork.py.

Your first task is to modify the function sigmoid.py to use the sigmoid function that was implemented in the previous part, i.e. the logistic regression code.

The program works in the following way:

The arrays containing the input patterns (X) and the desired outputs (y) are created. These are passed to the train()

function, which can be found in the file train_scripts.py. The train() function also takes as input arguments the number

of hidden units desired (set to 2 for XOR), the number of iterations we should run the algorithm for, and the learning

rate.

In train() the neural network is created, and the main loop of the algorithm is executed. It works in the following way:

For each iteration:
    For each sample:
        forward_pass()
        backward_pass()

For each input pattern, we activate the network by feeding the input signal through the neurons. After this is completed, we compare the network’s output (i.e. the predictions) to the desired output (the ground-truth labels), and update the weights using gradient descent.

Task 10. Implement backpropagation by filling in the backward_pass() function, found in NeuralNetwork.py. Although XOR has only one output, your implementation should support outputs of any size. Do this following the steps below: [5 points]

Step 1. For each output, k, calculate the error delta:

$$\delta_k = (t_k - y_k)\, g'(z_k)$$

where $y_k$ is the response of the output neuron and $t_k$ is the desired output (target). This error, $(t_k - y_k)$, is multiplied by the derivative of the sigmoid function applied to the pre-sigmoid signal $z_k$ of the output neuron. The derivative of the sigmoid is implemented in the sigmoid_derivative.py function. Store each error in the output_deltas variable.

The first derivative, g′, of g(x) is: g′(x) = g(x)(1 − g(x)).

Since g has already been calculated during the forward pass, in the code g′ can be computed directly from the stored activations.

Step 2. We now need to backpropagate this error to the hidden neurons. To accomplish this, remember that:

$$\delta_j = g'(x_j) \sum_k w_{kj}\, \delta_k$$

where $\delta_j$ is the error on the j-th hidden neuron, $x_j$ is the value of the hidden neuron (before it has been passed through the sigmoid function), g′ is the derivative of the sigmoid function, $\delta_k$ is the error from the output neuron that we have stored in output_deltas, and $w_{kj}$ is the weight from the hidden neuron j to the output neuron k. Once this has been calculated, add $\delta_j$ to the array hidden_deltas.

Step 3. We now need to update the output weights, i.e. the connections from the hidden neurons to the output neurons. This is accomplished using the formula:

$$w_{jk} \leftarrow w_{jk} + \eta\, \delta_k\, a_j$$

where $w_{jk}$ is the weight connecting the j-th hidden neuron to the k-th output neuron, $a_j$ is the activity of the j-th hidden neuron (after it has been transformed by the sigmoid function), $\delta_k$ is the error from the output neuron stored in output_deltas, and η is the learning rate.

Step 4. Finally, we need to update the hidden weights, i.e. the connections from the inputs to the hidden neurons. Here we use the same equation again:

$$w_{jk} \leftarrow w_{jk} + \eta\, \delta_k\, a_j$$

where $w_{jk}$ is the weight connecting the j-th input to the k-th hidden neuron, $a_j$ is the j-th input, $\delta_k$ is the backpropagated error (i.e., hidden_deltas) for the k-th hidden neuron, and η is the learning rate.

2.1. Implement backpropagation on XOR

Your task is to implement backpropagation and then run xorExample.py with different learning rates.

What learning rate do you find best? Include a graph of the error function in your report. Note that the

backpropagation can get stuck in local optima. What are the outputs and error when it gets stuck?

Task 11. Change the training data in xorExample.py to implement a different logical function, such as NOR or AND. Plot the error function of a successful trial. [5 points]

2.2. Implement backpropagation on Iris

Now that you have implemented backpropagation, you have a powerful classifier. We will test it on the “Iris” dataset (http://en.wikipedia.org/wiki/Iris_flower_data_set), which is a benchmark problem for classifiers. It has four input features – sepal length, sepal width, petal length, and petal width – which are used to classify three different species of flower.

In irisExample.py, we have taken this dataset and split it 50-50 into a training and a test set.

Task 12. The Iris data set contains three different classes of data that we need to discriminate between. How would

you accomplish this if we used a logistic regression unit? How is this scenario different, compared to the scenario of

using a neural network? [5 points]

Task 13. Run irisExample.py using the following numbers of hidden neurons: 1, 2, 3, 5, 7, 10. The program will plot the costs of the training set (red) and test set (blue) at each iteration. What are the differences for each number of hidden neurons? Which number do you think is best to use? How well do you think the model generalizes? [5 points]

Write a report about what you have done, along with relevant plots. Save the solution in a folder with your ID.

Create and submit a .zip that contains:

1) all of your code and

2) a copy of your report. The report should be in .pdf format, named ml_assgn1_part2_StudentID.pdf.

