This homework uses the Bean data from HW1. Each bean is characterized by the following 9 attributes:
1.) Area (A): The area of a bean zone, measured as the number of pixels within its boundaries.
2.) Perimeter (P): The bean circumference, defined as the length of its border.
3.) Major axis length (L): The distance between the ends of the longest line that can be drawn across a bean.
4.) Minor axis length (l): The longest line that can be drawn across the bean while standing perpendicular to the major axis.
5.) Aspect ratio (K): The ratio of L to l.
6.) Eccentricity (Ec): Eccentricity of the ellipse having the same moments as the region.
7.) Convex area (C): Number of pixels in the smallest convex polygon that can contain the area of a bean seed.
8.) Equivalent diameter (Ed): The diameter of a circle having the same area as the bean seed.
9.) Class: One of Seker, Barbunya, Bombay, Cali, Dermason, Horoz, and Sira.
Using this data, answer the following questions.
Since the attributes are numerical, first discretize them into an enumerated type using the following scheme. Each value falls into one of four categories (LOW, MED, HIGH, and V.HIGH); the range for each category is given in the table below.
Category  A (in K)  P          L        l        K          Ec        C (in K)  Ed
LOW       <50       <750       <250     <175     <1.25      <0.55     <30       <200
MED       50-80     750-1000   250-350  175-225  1.25-1.50  0.55-0.7  30-60     200-300
HIGH      80-100    1000-1200  350-500  225-300  1.50-1.75  0.7-0.8   60-100    300-400
V.HIGH    >100      >1200      >500     >300     >1.75      >0.8      >100      >400
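The scheme above can be sketched in Python as follows. This is only an illustrative sketch: the names LABELS, CUTS, and discretize are hypothetical, and the table does not say which category a value exactly on a boundary (for example, P = 750) belongs to, so the sketch assumes it goes to the higher category. A and C are listed in thousands (K) in the table, so their cut points are scaled to raw units here.

```python
from bisect import bisect_right

LABELS = ["LOW", "MED", "HIGH", "V.HIGH"]

# Cut points between categories, copied from the table.
# A and C are given in thousands (K) there, so they are multiplied by 1000.
CUTS = {
    "A": [50_000, 80_000, 100_000],
    "P": [750, 1000, 1200],
    "L": [250, 350, 500],
    "l": [175, 225, 300],
    "K": [1.25, 1.50, 1.75],
    "Ec": [0.55, 0.70, 0.80],
    "C": [30_000, 60_000, 100_000],
    "Ed": [200, 300, 400],
}

def discretize(attribute, value):
    """Map a numeric value to LOW / MED / HIGH / V.HIGH.

    Assumption: a value equal to a cut point falls into the higher category.
    """
    # bisect_right counts how many cut points are <= value.
    return LABELS[bisect_right(CUTS[attribute], value)]

# First training row: A=31823, P=663, K=1.23, Ed=201.
first_row = {name: discretize(name, v)
             for name, v in [("A", 31823), ("P", 663), ("K", 1.23), ("Ed", 201)]}
```

If your course uses a different boundary convention, swap bisect_right for bisect_left so that boundary values fall into the lower category instead.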
You are provided with the following training data.
A       P     L    l    K     Ec    C       Ed   Class
31823   663   223  182  1.23  0.58  32274   201  SEKER
27275   605   220  158  1.39  0.69  27604   186  DERMASON
32799   654   220  190  1.16  0.50  33087   204  SEKER
58434   981   396  190  2.09  0.88  59309   273  HOROZ
68513   1015  359  244  1.47  0.73  69406   295  BARBUNYA
85702   1107  428  257  1.66  0.80  86542   330  CALI
137358  1365  508  345  1.47  0.73  138093  418  BOMBAY
41643   769   295  181  1.63  0.79  42233   230  SIRA
68551   1025  356  246  1.45  0.72  69684   295  BARBUNYA
137115  1427  519  337  1.54  0.76  138970  418  BOMBAY
27277   605   218  159  1.37  0.68  27611   186  DERMASON
41646   762   286  186  1.53  0.76  42074   230  SIRA
85666   1119  436  251  1.73  0.82  86305   330  CALI
58454   965   392  196  2.00  0.87  60280   273  HOROZ
58484   956   382  197  1.94  0.86  59456   273  HOROZ
41646   768   288  186  1.55  0.76  42225   230  SIRA
27267   597   215  162  1.33  0.66  27575   186  DERMASON
Q1. Using manual computations, answer the following. Show your work.
a. With Bean class as the outcome and A, P, K, and Ed as the attributes, determine a rudimentary rule using 1R. Clearly state the final rule (with LHS and RHS) along with its associated error.
b. Using the Naïve Bayes technique, build the probability table (similar to Table 4.2).
c. Build a decision tree with Bean class as the outcome and A, P, K, and Ed as the attributes. Show your work.
d. Using the above three models, predict the outcome for the following six items:
    A       P     K     Ed
T1  43234   450   1.45  450
T2  65897   1390  1.10  155
T3  95678   890   1.65  280
T4  123678  1189  1.87  390
T5  45900   655   1.35  295
T6  89456   1035  1.75  305
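To check a manual 1R computation for part (a), the procedure can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference solution: the function name one_r and the row layout (one dict per bean, with already discretized values) are hypothetical, ties between classes go to the class seen first, and ties between attributes go to the attribute listed first.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (best_attribute, rule, errors) for a 1R learner.

    rows: list of dicts holding nominal (already discretized) values.
    rule: maps each value of the chosen attribute to its majority class.
    errors: number of training rows the rule misclassifies.
    """
    best = None
    for attr in attributes:
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its most frequent class (assumed tie-break:
        # the class encountered first); all other rows count as errors.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Keep the attribute with the fewest errors (first one wins ties).
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Usage with the first two training rows after discretization.
rows = [
    {"A": "LOW", "P": "LOW", "K": "LOW", "Ed": "MED", "Class": "SEKER"},
    {"A": "LOW", "P": "LOW", "K": "MED", "Ed": "LOW", "Class": "DERMASON"},
]
attr, rule, errors = one_r(rows, ["A", "P", "K", "Ed"], "Class")
```

Passing all 17 discretized training rows instead of two gives a rule to compare against the hand-computed one; the error rate is errors divided by the number of rows.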
Q2. Repeat Q1 by running Weka with the 1R (rudimentary rule) and J48 (decision tree) algorithms.
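Weka reads data in ARFF format, so the discretized training data has to be written out as nominal attributes before 1R (listed as OneR in Weka) and J48 can be run on it. The sketch below shows one way to produce such a file in Python. It is an assumption-laden illustration: the function name to_arff, the relation name "beans", and the attribute order are hypothetical choices, not something the assignment prescribes.

```python
def to_arff(relation, attributes, rows):
    """Render nominal data as ARFF text.

    attributes: list of (name, allowed_values) pairs, class attribute last.
    rows: list of value lists, in the same order as attributes.
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        # Nominal attributes list their allowed values inside braces.
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    # One comma-separated line per instance.
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"

levels = ["LOW", "MED", "HIGH", "V.HIGH"]
classes = ["SEKER", "BARBUNYA", "BOMBAY", "CALI", "DERMASON", "HOROZ", "SIRA"]
attributes = [("A", levels), ("P", levels), ("K", levels),
              ("Ed", levels), ("Class", classes)]

# First training row after discretization; add the remaining rows the same way.
text = to_arff("beans", attributes, [["LOW", "LOW", "LOW", "MED", "SEKER"]])
```

Writing text to a file with an .arff extension should let it be opened from the Weka Explorer; the six test items can be supplied the same way as a separate test set.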
What to submit? Submit a PDF file with your answers via Canvas. Your output should look like this:
Name
Course
HW#
Q1. Work and results for Q1
Q2. Weka output for Q2
Summary of test runs (for the six unknown items)