Problem 1: k-Nearest-Neighbors
Read the description provided in Assignment #1 – Problem 2. This problem continues our analysis in Problem 2 of Assignment #1 by building a k-NN model using the airfare dataset (i.e., the file “Airfares.xls”).
a) Using the variables selected by your Backward Elimination operator in part (a) of Problem 2 of Assignment #1, build a 𝑘-NN model that predicts FARE. Try values of k from 1 to 10. Make sure to normalize the data. Disable the “weighted vote” option and use Euclidean Distance as your distance measure. Train your model using the same training partition as in Assignment #1 (i.e., use Shuffled Sampling and a local random seed value of 2022 to partition your data: 60% for training and 40% for validation).
Save and export one RapidMiner process that you can use to answer the remaining parts of this problem. Name this process “FirstName1.rmp” (e.g., mine would be Alireza1.rmp). You will need to submit this process on D2L (in Dropbox folder “Individual Assignment #2”).
b) What is the best 𝑘 chosen (using RMSE on the validation set)?
c) How does the algorithm use the best 𝑘 chosen to make predictions for new records?
d) Compare the predictive accuracy of the best models in Assignment #1 (Problem 2b) and Assignment #2 (Problem 1b) using RMSE by completing the following table.
e) Why is the validation data error overly optimistic compared to the error rate when applying this 𝑘-NN predictor to new data?
f) If the purpose is to predict FARE for hundreds of new flights, what would be the disadvantage of using 𝑘-NN prediction? List the operations that the algorithm goes through in order to produce each prediction.
g) Using your 𝑘-NN model with the best 𝑘, predict the fare on a route with the following characteristics: COUPON = 1.202, NEW = 3, VACATION = No, SW = No, HI = 4442.141, S_INCOME = $28,760, E_INCOME = $27,664, S_POP = 4,557,004, E_POP = 3,195,503, SLOT = Free, GATE = Free, PAX = 12,782, DISTANCE = 1976 miles.
(Hint: This part highlights that, because k-NN is a lazy learner, you cannot simply use a calculator (as you did for the linear regression model in Assignment #1) to come up with a prediction. Update your Excel data by adding the new record at the bottom of the table. After finishing your data prep (but before normalizing the data), separate this one record from the rest using the Filter Examples operator. Since the new record does not have a label (i.e., its y value is missing), you can set the condition class parameter of the filter to “no_missing_labels”. The main output port of the filter then gives you the original dataset without the new record, which you can use to build your model (as in part a), and the unmatched port (i.e., the third port) gives you the single new record (with the missing label), which you later pass to an Apply Model operator to get a prediction. Note that you need to normalize this new record before you apply your k-NN model. To do so, pass the new record through another Apply Model operator and connect the normalization model (the third port of the Normalize operator gives you this) to the model port of that Apply Model operator.)
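The workflow in the hint (learn the normalization from the training partition, apply that same normalization to the validation records and the new record, then average the FARE of the k nearest training records) can be sketched outside RapidMiner as follows. This is only a minimal Python illustration and not the RapidMiner process you must submit. The function names are hypothetical, and the two-attribute training and validation records and their fares are made up for illustration; they are not taken from Airfares.xls. The sketch assumes a z-score normalization with the population standard deviation, which may differ from the settings of your Normalize operator.

```python
import math

def fit_zscore(rows):
    """Learn one (mean, std) pair per column from the TRAINING rows only."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std or 1.0))  # guard against constant columns
    return stats

def apply_zscore(row, stats):
    """Normalize one record with previously learned statistics."""
    return [(v - m) / s for v, (m, s) in zip(row, stats)]

def knn_predict(train_x, train_y, query, k):
    """Unweighted k-NN regression: mean label of the k closest training rows."""
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in by_distance[:k]) / k

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up records: [DISTANCE, PAX] with made-up fares (NOT the Airfares data).
train_raw = [[500, 8000], [1000, 12000], [1500, 6000],
             [2000, 15000], [2500, 9000], [3000, 11000]]
train_fare = [100.0, 160.0, 210.0, 270.0, 320.0, 380.0]
valid_raw = [[750, 10000], [1750, 10000], [2750, 10000]]
valid_fare = [130.0, 240.0, 350.0]

# The normalization model comes from the training partition and is reused as-is.
stats = fit_zscore(train_raw)
train_x = [apply_zscore(r, stats) for r in train_raw]
valid_x = [apply_zscore(r, stats) for r in valid_raw]

# Try each k and keep the one with the lowest validation RMSE.
# (Only 6 toy training rows here, so k runs from 1 to 6 instead of 1 to 10.)
errors = {}
for k in range(1, 7):
    preds = [knn_predict(train_x, train_fare, x, k) for x in valid_x]
    errors[k] = rmse(valid_fare, preds)
best_k = min(errors, key=errors.get)

# A new, unlabeled record must be normalized with the SAME statistics
# before its distances to the training rows are computed.
new_raw = [1976, 12782]
new_x = apply_zscore(new_raw, stats)
predicted_fare = knn_predict(train_x, train_fare, new_x, best_k)
print("best k:", best_k, "predicted fare:", round(predicted_fare, 2))
```

Note that every call to knn_predict computes the distance from the query to all training rows and sorts them. Nothing is precomputed at training time, which is why no closed-form formula exists that you could evaluate by hand.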