Classification using Weka – Part 2

September 22, 2017 Serhii Beliablia

The Iris classification problem was described in the previous article. If you haven’t read it yet, start with Part 1.

To perform classification, we first have to train our classifier. As a rule, the more training data we have, the more accurate the results will be. I took the training data from the UCI Machine Learning Repository, which hosts many diverse data sets for you to play with.

Each sample in the Iris data set contains the following attributes:

Sepal length in cm
Sepal width in cm
Petal length in cm
Petal width in cm
Class (Iris Setosa, Iris Versicolour, Iris Virginica)

Let’s have a look at one line of the data set:

5.1,3.5,1.4,0.2,Iris-setosa

As we can see, there are five values separated by commas: four measurements followed by the class name. The whole data set contains 150 samples, which are used to train the classifier.
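To make the line layout concrete, here is a minimal sketch of how one such line could be split into its four measurements and its class name in plain Java. The class name `IrisSample` and its `parse` method are hypothetical names chosen for illustration; they are not part of Weka or of the original data set tooling.

```java
// Hypothetical helper for illustration; not part of Weka.
class IrisSample {
    final double sepalLength;
    final double sepalWidth;
    final double petalLength;
    final double petalWidth;
    final String label; // class name as written in the file, e.g. "Iris-setosa"

    IrisSample(double sepalLength, double sepalWidth,
               double petalLength, double petalWidth, String label) {
        this.sepalLength = sepalLength;
        this.sepalWidth = sepalWidth;
        this.petalLength = petalLength;
        this.petalWidth = petalWidth;
        this.label = label;
    }

    // Splits one comma-separated line into four measurements and a class name.
    static IrisSample parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 5) {
            throw new IllegalArgumentException(
                    "Expected 5 comma-separated values, got " + parts.length);
        }
        return new IrisSample(
                Double.parseDouble(parts[0].trim()),
                Double.parseDouble(parts[1].trim()),
                Double.parseDouble(parts[2].trim()),
                Double.parseDouble(parts[3].trim()),
                parts[4].trim());
    }

    public static void main(String[] args) {
        IrisSample sample = parse("5.1,3.5,1.4,0.2,Iris-setosa");
        System.out.println(sample.label + ", petal length " + sample.petalLength + " cm");
    }
}
```

A line with a missing or extra field is rejected early, which is usually easier to debug than a half-filled sample.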

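This data set has to be converted to .arff (Attribute-Relation File Format), the format Weka works with. The header declares the four numeric attributes and the class attribute; in the example below, the class names are encoded as the nominal values 1, 2 and 3, where 1 stands for Iris-setosa.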

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {1,2,3}

@DATA
5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,1
4.6,3.1,1.5,0.2,1
...
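The conversion itself can be sketched in plain Java, without Weka: write the header, then rewrite every data line with the class name replaced by its nominal code. This is only an illustration under stated assumptions. `IrisArffWriter` is a hypothetical name, and the codes 2 for Iris-versicolor and 3 for Iris-virginica are assumed; the article only shows that 1 stands for Iris-setosa.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical converter for illustration; not part of Weka.
class IrisArffWriter {
    // Assumed mapping: only "Iris-setosa -> 1" is evident from the article.
    static final Map<String, String> CLASS_CODES = new LinkedHashMap<>();
    static {
        CLASS_CODES.put("Iris-setosa", "1");
        CLASS_CODES.put("Iris-versicolor", "2");
        CLASS_CODES.put("Iris-virginica", "3");
    }

    // Builds the ARFF text (header plus data rows) from raw comma-separated lines.
    static String toArff(List<String> csvLines) {
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION iris\n\n");
        for (String name : new String[] {
                "sepallength", "sepalwidth", "petallength", "petalwidth"}) {
            sb.append("@ATTRIBUTE ").append(name).append(" NUMERIC\n");
        }
        sb.append("@ATTRIBUTE class {")
          .append(String.join(",", CLASS_CODES.values()))
          .append("}\n\n@DATA\n");

        for (String raw : csvLines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Everything after the last comma is the class name.
            int lastComma = line.lastIndexOf(',');
            String label = line.substring(lastComma + 1).trim();
            String code = CLASS_CODES.get(label);
            if (code == null) {
                throw new IllegalArgumentException("Unknown class: " + label);
            }
            // Keep the measurements as written and swap the name for its code.
            sb.append(line, 0, lastComma + 1).append(code).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toArff(Arrays.asList(
                "5.1,3.5,1.4,0.2,Iris-setosa",
                "4.9,3.0,1.4,0.2,Iris-setosa")));
    }
}
```

ARFF also accepts the class names themselves as nominal values, so the numeric codes are a choice rather than a requirement.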

Once the classifier has been trained on such a file, it is ready to give us answers.

A question looks like this:

4.97,3.11,1.41,0.26,?

That is, we know the four measurements and want to determine the corresponding class. The ? is the ARFF marker for a missing value, here the unknown class.
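As a small sketch, such a question could be read into a feature vector like this before it is handed to a classifier. `IrisQueryParser` and `parseQuery` are hypothetical names used only for illustration; the sketch simply insists that the class field is the `?` marker.

```java
// Hypothetical helper for illustration; not part of Weka.
class IrisQueryParser {
    // Returns the four measurements of an unlabeled row; the class field must be "?".
    static double[] parseQuery(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 5 || !"?".equals(parts[4].trim())) {
            throw new IllegalArgumentException(
                    "Expected four measurements followed by '?': " + line);
        }
        double[] features = new double[4];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(parts[i].trim());
        }
        return features;
    }

    public static void main(String[] args) {
        double[] features = parseQuery("4.97,3.11,1.41,0.26,?");
        System.out.println("Number of measurements: " + features.length);
    }
}
```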

In the next parts of this series I’ll show you how to process all this data using Java.
