
What is the best 𝑘 chosen (using RMSE on the validation set)?

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

Problem 1: k-Nearest-Neighbors

Read the description provided in Assignment #1 – Problem 2. This problem continues our analysis in Problem 2 of Assignment #1 by building a k-NN model using the airfare dataset (i.e., the file “Airfares.xls”). 

a) Using the variables selected by your Backward Elimination operator in part (a) of Problem 2 of Assignment #1, build a 𝑘-NN model that predicts FARE. Try values of 𝑘 from 1 to 10. Make sure to normalize the data. Disable the “weighted vote” option and use Euclidean Distance as your distance measure. Train your model using the same training partition as in Assignment #1 (i.e., use Shuffled Sampling and a local random seed value of 2022 to partition your data: 60% for training and 40% for validation).
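Outside RapidMiner, the setup in part (a) can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are hypothetical toy values (not taken from Airfares.xls), only two predictors are shown, z-score normalization is assumed and is fitted on the training partition, and Python's seed 2022 will not reproduce RapidMiner's partition. All function names are made up for this sketch.

```python
import math
import random

def shuffled_split(rows, train_frac=0.6, seed=2022):
    """Shuffle the rows with a fixed seed, then cut into training and validation parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(train_frac * len(rows))
    return rows[:cut], rows[cut:]

def fit_zscore(X):
    """Per-column mean and sample standard deviation (assumed fitted on training rows)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = []
    for col, m in zip(zip(*X), means):
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        stds.append(sd if sd > 0 else 1.0)  # avoid dividing by zero on constant columns
    return means, stds

def apply_zscore(X, means, stds):
    """Normalize rows with previously fitted means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    """Unweighted k-NN regression: plain mean of the k nearest training labels."""
    ranked = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return sum(train_y[i] for i in ranked[:k]) / k

# Hypothetical toy rows: (DISTANCE, PAX, FARE)
rows = [
    (300.0, 9000.0, 95.0), (450.0, 7000.0, 120.0), (600.0, 15000.0, 110.0),
    (800.0, 5000.0, 160.0), (1000.0, 12000.0, 170.0), (1200.0, 4000.0, 210.0),
    (1500.0, 8000.0, 230.0), (1800.0, 6000.0, 260.0), (2000.0, 11000.0, 270.0),
    (2400.0, 3000.0, 320.0),
]

train, valid = shuffled_split(rows)
train_X = [list(r[:2]) for r in train]
train_y = [r[2] for r in train]
valid_X = [list(r[:2]) for r in valid]

means, stds = fit_zscore(train_X)
train_Z = apply_zscore(train_X, means, stds)
valid_Z = apply_zscore(valid_X, means, stds)  # reuse the training statistics

# k=3 is an arbitrary placeholder, not the best k asked for in part (b)
valid_pred = [knn_predict(train_Z, train_y, z, k=3) for z in valid_Z]
```

Normalizing matters here because DISTANCE and PAX are on very different scales; without it, the Euclidean distance would be dominated by whichever variable has the largest raw range.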

Save and export one RapidMiner process that you can use to answer the remaining parts of this problem. Name this process “FirstName1.rmp” (e.g., mine would be Alireza1.rmp). You will need to submit this process on D2L (in Dropbox folder “Individual Assignment #2”).

b) What is the best 𝑘 chosen (using RMSE on the validation set)?
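As a rough illustration of what "best 𝑘 by validation RMSE" means, the sketch below scores each 𝑘 from 1 to 10 on a validation set and keeps the one with the lowest RMSE. The data are hypothetical, already-normalized toy values with a single predictor, and the helper names (`rmse`, `best_k`) are assumptions for this example, not RapidMiner operators.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    """Unweighted k-NN regression: mean label of the k nearest training rows."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    return sum(y for _, y in ranked[:k]) / k

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def best_k(train_X, train_y, valid_X, valid_y, k_values=range(1, 11)):
    """Return the k with the lowest validation RMSE, plus all scores."""
    scores = {}
    for k in k_values:
        preds = [knn_predict(train_X, train_y, x, k) for x in valid_X]
        scores[k] = rmse(valid_y, preds)
    return min(scores, key=scores.get), scores

# Hypothetical, already-normalized toy data (one predictor)
train_X = [[-1.5], [-1.2], [-0.9], [-0.6], [-0.3], [0.0],
           [0.3], [0.6], [0.9], [1.2], [1.5], [1.8]]
train_y = [92.0, 101.0, 118.0, 125.0, 149.0, 155.0,
           171.0, 190.0, 198.0, 221.0, 230.0, 252.0]
valid_X = [[-1.0], [-0.1], [0.7], [1.4]]
valid_y = [110.0, 150.0, 192.0, 228.0]

k_star, scores = best_k(train_X, train_y, valid_X, valid_y)
for k in sorted(scores):
    print(f"k={k:2d}  validation RMSE={scores[k]:.2f}")
print("best k:", k_star)
```

If two values of 𝑘 tie on RMSE, this sketch keeps the smaller one, since `min` returns the first minimum it encounters; that tie rule is a choice made here, not something the assignment specifies.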

c) How does the algorithm use the best 𝑘 chosen to make predictions for new records?
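For reference, with the settings from part (a) (Euclidean distance, no weighted vote), the usual textbook form of a 𝑘-NN prediction for a numeric label can be written as follows. The symbols are assumptions introduced here: $x$ is the normalized new record, $x_i$ and $y_i$ are the normalized predictors and FARE of training record $i$, $p$ is the number of predictors, and $N_k(x)$ is the set of the $k$ training records closest to $x$. How ties in distance are broken is not stated in the assignment.

```latex
d(x, x_i) = \sqrt{\sum_{j=1}^{p} \left( x_j - x_{ij} \right)^2},
\qquad
\hat{y}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i
```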

d) Compare the predictive accuracy of the best models in Assignment #1 (Problem 2b) and Assignment #2 (Problem 1b) using RMSE by completing the following table.

e) Why is the validation data error overly optimistic compared to the error rate when applying this 𝑘-NN predictor to new data? 

f) If the purpose is to predict FARE for hundreds of new flights, what would be the disadvantage of using 𝑘-NN prediction? List the operations that the algorithm goes through in order to produce each prediction.

g) Using your 𝑘-NN model with the best 𝑘, predict the fare on a route with the following characteristics: COUPON = 1.202, NEW = 3, VACATION = No, SW = No, HI = 4442.141, S_INCOME = $28,760, E_INCOME = $27,664, S_POP = 4,557,004, E_POP = 3,195,503, SLOT = Free, GATE = Free, PAX = 12,782, DISTANCE = 1976 miles.

(Hint: this part highlights that, because k-NN is a lazy learner, you cannot simply use a calculator (as you did in Assignment #1 for the linear regression model) to come up with a prediction. Update your Excel data by adding the new record at the bottom of the table. After finishing your data prep (but before normalizing the data), filter this one record out using the Filter Examples operator. Since your new record does not have a label (i.e., its y value is missing), you can set the condition class parameter of your filter operator to “no_missing_labels”. This way, the main output port of your filter gives your original dataset without the new record, which you can use to build your model (as in 1a), and the unmatched port (i.e., the third port) gives you the single new record (with its missing label), which you later pass to an Apply Model operator to get a prediction. Note that before you apply your k-NN model, you need to normalize this new record. You can do so by first passing the new record through another Apply Model operator, with the normalization model (the third port of the Normalize operator gives you this) connected to the model port of that Apply Model operator.)
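The idea behind the hint, reusing the fitted normalization on the new record before applying k-NN, can be sketched in Python as below. Everything except the DISTANCE and PAX values of the new record is an assumption: the training rows are hypothetical toy values, only two of the thirteen predictors are used, z-score normalization is assumed, and `k=3` is a placeholder rather than the best 𝑘. The printed number is therefore not the answer to part (g).

```python
import math

def fit_zscore(X):
    """Fit the 'normalization model': per-column mean and sample standard deviation."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
            for col, m in zip(zip(*X), means)]
    return means, stds

def transform(row, means, stds):
    """Apply a fitted normalization model to one record."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def knn_predict(train_X, train_y, query, k):
    """Unweighted k-NN regression with Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    return sum(y for _, y in ranked[:k]) / k

# Hypothetical labeled rows: [DISTANCE, PAX] with FARE as the label
train_X = [[400.0, 9000.0], [900.0, 14000.0], [1500.0, 7000.0],
           [1900.0, 12000.0], [2100.0, 13000.0], [2500.0, 5000.0]]
train_y = [100.0, 150.0, 220.0, 250.0, 265.0, 330.0]

# New, unlabeled record: DISTANCE and PAX as given in part (g)
new_record = [1976.0, 12782.0]

# The statistics come from the labeled data only; the new record is
# transformed with them and never refits the normalization.
means, stds = fit_zscore(train_X)
train_Z = [transform(r, means, stds) for r in train_X]
new_Z = transform(new_record, means, stds)

prediction = knn_predict(train_Z, train_y, new_Z, k=3)  # k=3 is a placeholder
print(f"illustrative prediction: {prediction:.2f}")
```

Because the prediction is an average of stored training labels, it always falls between the smallest and largest FARE among the neighbors, and it cannot be computed without access to the training records themselves.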

 
