Classification using Weka – Part 2

The Iris classification problem was described in the previous article. If you haven’t read it yet, start with Part 1.


To perform classification, we first have to train our classifier. As a rule, the more training data we have, the more accurate the results will be. I’ve taken the training data from the UCI Machine Learning Repository, which has a lot of diverse data sets for you to play with.

The Iris data set contains the following attributes:

  1. Sepal length in cm
  2. Sepal width in cm
  3. Petal length in cm
  4. Petal width in cm
  5. Class (Iris Setosa, Iris Versicolour, Iris Virginica)

Let’s have a look at one line of the data set:

5.1,3.5,1.4,0.2,Iris-setosa

As we can see, there are five comma-separated values: four measurements and the class. The whole data set contains 150 samples, which are used to train the classifier.

This data set has to be converted into ARFF (Attribute-Relation File Format), the .arff format Weka works with. The class names are encoded as the nominal values 1, 2 and 3:

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {1,2,3}

@DATA
5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,1
4.6,3.1,1.5,0.2,1
...
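The conversion from the raw comma-separated lines to the listing above can be sketched in plain Java, without Weka. This is only a minimal illustration: the class `IrisCsvToArff` and its `toArff` method are hypothetical names, and the mapping of `Iris-versicolor` and `Iris-virginica` to 2 and 3 is an assumption (the listing only shows that `Iris-setosa` becomes 1).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative helper (not part of Weka): turns raw Iris CSV lines into ARFF text. */
final class IrisCsvToArff {

    // Assumed mapping from class names to nominal codes.
    // Only "Iris-setosa" -> 1 is visible in the article; the other two are a guess.
    private static final Map<String, String> CLASS_CODES = new LinkedHashMap<>();
    static {
        CLASS_CODES.put("Iris-setosa", "1");
        CLASS_CODES.put("Iris-versicolor", "2");
        CLASS_CODES.put("Iris-virginica", "3");
    }

    static String toArff(List<String> csvLines) {
        StringBuilder out = new StringBuilder();
        out.append("@RELATION iris\n\n");
        for (String name : Arrays.asList("sepallength", "sepalwidth", "petallength", "petalwidth")) {
            out.append("@ATTRIBUTE ").append(name).append(" NUMERIC\n");
        }
        // Nominal attribute: the set of allowed class codes, e.g. {1,2,3}.
        out.append("@ATTRIBUTE class {")
           .append(String.join(",", CLASS_CODES.values()))
           .append("}\n\n@DATA\n");

        for (String line : csvLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines in the input
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 5) {
                throw new IllegalArgumentException("Expected 5 values: " + line);
            }
            for (int i = 0; i < 4; i++) {
                Double.parseDouble(parts[i]); // validates that the measurement is numeric
            }
            String code = CLASS_CODES.get(parts[4]);
            if (code == null) {
                throw new IllegalArgumentException("Unknown class: " + parts[4]);
            }
            out.append(String.join(",", parts[0], parts[1], parts[2], parts[3], code)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toArff(Arrays.asList("5.1,3.5,1.4,0.2,Iris-setosa")));
    }
}
```

Running `main` prints the ARFF header followed by the single data row `5.1,3.5,1.4,0.2,1`. Unknown class names and non-numeric measurements are rejected with an exception rather than written to the file.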

Once the classifier has been trained on this file, it is ready to give us answers.

Our question looks like this:

4.97,3.11,1.41,0.26,?

That is, we know the four measurements and want to determine the corresponding class. The question mark stands for the unknown class value.
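To make the idea of "training, then asking a question" concrete, here is a toy sketch in plain Java. It is not what Weka does internally and not the approach used in the next parts; it swaps in a simple 1-nearest-neighbour lookup, which answers with the class of the closest training row. The names `NearestNeighbourSketch`, `Sample` and `classify` are hypothetical. The first two training rows come from the listing above; the rows for classes 2 and 3 are taken from the public Iris data, with the class codes assumed.

```java
import java.util.Arrays;
import java.util.List;

/** Toy 1-nearest-neighbour classifier; a stand-in for illustration, not Weka. */
final class NearestNeighbourSketch {

    /** One labeled training row: a class code plus four measurements. */
    static final class Sample {
        final String label;
        final double[] features;

        Sample(String label, double... features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Squared Euclidean distance between two equally long feature vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the label of the training sample closest to the query. */
    static String classify(List<Sample> training, double[] query) {
        if (training.isEmpty()) {
            throw new IllegalArgumentException("No training data");
        }
        Sample best = training.get(0);
        double bestDistance = squaredDistance(best.features, query);
        for (Sample candidate : training.subList(1, training.size())) {
            double distance = squaredDistance(candidate.features, query);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best.label;
    }

    public static void main(String[] args) {
        List<Sample> training = Arrays.asList(
                new Sample("1", 5.1, 3.5, 1.4, 0.2),
                new Sample("1", 4.9, 3.0, 1.4, 0.2),
                new Sample("2", 7.0, 3.2, 4.7, 1.4),  // class code assumed
                new Sample("3", 6.3, 3.3, 6.0, 2.5)); // class code assumed

        // The query from the article: four measurements, class unknown.
        double[] query = {4.97, 3.11, 1.41, 0.26};
        System.out.println(classify(training, query)); // prints 1
    }
}
```

The query lies closest to the row `4.9,3.0,1.4,0.2`, so the sketch answers with class 1. A real classifier trained on all 150 samples would generalize far better than this four-row lookup.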


In the next parts of this article I’ll show you how to process all this data using Java.
