Weka is a tool widely used by students and beginners for running machine learning algorithms. A KNN classifier is also included in recent versions of Weka. KNN, or K-Nearest Neighbor, is a general-purpose classification algorithm that is widely applied to machine learning, data mining, and data science problems. KNN classifies a given test instance by counting the votes of its nearest neighbors. These neighbors are selected on the basis of similarity, which may be measured by counting shared attribute values or with the Euclidean distance formula, depending on the data type. KNN is a supervised learning model. The Weka KNN classifier is explained step by step in the following procedure.
Weka KNN Classifier
KNN Plot (taken from datacamp.com)
There are two groups in this plot: stars and pyramids. A new test instance is introduced and its 3 nearest neighbors are considered. Among those neighbors, pyramids outnumber stars, so the new shape is classified as a pyramid.
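The vote described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Weka's implementation: the coordinates are made up for the example, and the helper names `euclidean` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D points standing in for the shapes in the plot.
train = [
    ((1.0, 1.0), "star"),
    ((1.5, 2.0), "star"),
    ((2.0, 1.0), "star"),
    ((3.5, 3.0), "star"),
    ((4.0, 4.0), "pyramid"),
    ((5.0, 5.0), "pyramid"),
    ((5.5, 4.5), "pyramid"),
]
query = (3.8, 3.5)

# The 3 nearest neighbors are two pyramids and one star.
print(knn_predict(train, query, k=3))
```

With these points the three nearest neighbors are two pyramids and one star, so the query is labelled a pyramid, mirroring the plot.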
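The number of neighbors (k) is very important. For a two-class problem it should be odd so that the vote cannot end in a tie. It can be chosen by trying several values, or by taking the square root of the number of instances as a starting point.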
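As a rough sketch of the square-root rule, the snippet below rounds the square root of the instance count and bumps the result to the next odd number. The function name `suggest_k` is hypothetical, and the rule is only a heuristic starting point; candidate values should still be compared on the data.

```python
import math

def suggest_k(n_instances):
    """Heuristic starting value for k: the rounded square root, forced to be odd."""
    k = max(1, round(math.sqrt(n_instances)))
    if k % 2 == 0:
        k += 1  # avoid tied votes in a two-class problem
    return k

# 768 is the size of the dataset used later in this article.
print(suggest_k(768))
```

For 768 instances the square root is about 27.7, which rounds to 28 and is then raised to 29.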
Implementation in Weka
We are going to run KNN in the Weka tool. The steps are given below.
Step 1) Open the Weka tool (Explorer window).
Step 2) Import the data into Weka by opening a file. I have imported sample data that already ships with Weka. You can also import your own CSV.
Step 3) After importing the data, move to the Classify tab and choose the classifier “IBk”.
Step 4) Select the attribute that we want to predict and start building the model.
Step 5) Read the results from the “Classifier output” window.
=== Run information ===
Scheme:weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation: pima_diabetes
Instances: 768
Attributes: 9
preg
plas
pres
skin
insu
mass
pedi
age
class
Test mode:10-fold cross-validation
=== Classifier model (full training set) ===
IB1 instance-based classifier
using 1 nearest neighbour(s) for classification
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances         539               70.1823 %
Incorrectly Classified Instances       229               29.8177 %
Kappa statistic                          0.3304
Mean absolute error                      0.2988
Root mean squared error                  0.5453
Relative absolute error                 65.7327 %
Root relative squared error            114.3977 %
Total Number of Instances              768
=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.794     0.47      0.759      0.794     0.776       0.65     tested_negative
                 0.53      0.206     0.58       0.53      0.554       0.65     tested_positive
Weighted Avg.    0.702     0.378     0.696      0.702     0.698       0.65
=== Confusion Matrix ===
   a   b   <-- classified as
 397 103 |   a = tested_negative
 126 142 |   b = tested_positive
- The accuracy of the model is 70.1823% (539 of 768 instances correctly classified).
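To see where these figures come from, the short Python sketch below recomputes accuracy, per-class precision, recall and F-measure, and the kappa statistic directly from the confusion matrix in the output. The four counts are copied from the report; the variable names are my own, and the kappa formula used is the standard Cohen's kappa, which I am assuming is what Weka reports here.

```python
# Confusion matrix copied from the Weka output:
# rows are the actual class, columns are the predicted class.
labels = ["tested_negative", "tested_positive"]
matrix = [
    [397, 103],
    [126, 142],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total
print(f"Accuracy: {accuracy * 100:.4f} %")

per_class = {}
for i, label in enumerate(labels):
    actual = sum(matrix[i])                    # instances that truly belong to this class
    predicted = sum(row[i] for row in matrix)  # instances assigned to this class
    recall = matrix[i][i] / actual             # same as the TP Rate column
    precision = matrix[i][i] / predicted
    f_measure = 2 * precision * recall / (precision + recall)
    per_class[label] = (precision, recall, f_measure)
    print(f"{label}: precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
expected = sum(
    sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(len(labels))
) / total ** 2
kappa = (accuracy - expected) / (1 - expected)
print(f"Kappa: {kappa:.4f}")
```

The printed values should agree with the report: 539 of 768 correct gives 70.1823 %, and the per-class precision and recall match the Detailed Accuracy By Class table after rounding.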