CS5483

In this notebook, you will compete with your classmates and your machine by

handcrafting a decision tree using Weka UserClassifier, and
using python-weka-wrapper to build a J48 (C4.5) decision tree for comparison.

Let’s find out who is more intelligent!

Interactive Decision Tree Construction

Follow the instructions preceding Ex 17.2.12 in [Witten11] to

install the package UserClassifier,
hand-build a decision tree using segment-challenge.arff as the training set, and
test the performance using segment-test.arff as the test set.

YOUR ANSWER HERE

Exercise 3

Include the model and result summary sections from the result buffer of your best hand-built decision tree. Your answer should look like:

=== Classifier model (full training set) ===

Split on ...

=== Summary ===

Correctly Classified Instances ...

Try your best to beat your classmates and the machine:

Build at least two decision trees and pick the best one.
Share your result on the discussion page and check if your classmates have a better decision tree.

YOUR ANSWER HERE

Python Weka Wrapper

To see whether your hand-built classifier can beat the machine, use J48 (C4.5) to build a decision tree. Instead of using the Weka Explorer interface, you will run Weka directly from the notebook using python-weka-wrapper.

Tip

For your group project, you can also create your own conda environment to install additional packages, for example:

myenv=myenvname
cat <<EOF > /tmp/myenv.yaml && mamba env create -n "${myenv}" -f /tmp/myenv.yaml
dependencies:
  - python=3.10
  - pip
  - ipykernel
  - pandas
  - numpy
  - pip:
    - python-weka-wrapper3
    - python-javabridge
    - graphviz
EOF

Afterwards, you can create a kernel using the command:

conda activate ${myenv}
python -m ipykernel install \
    --user \
    --name "${myenv}" --display-name "${myenv}"

Reload the browser window for the kernel to take effect. See the documentation for more details.

To deactivate the conda environment in a terminal, run

conda deactivate

To delete the kernel, run the command

rm -rf ~/.local/share/jupyter/kernels/${myenv}

To delete the conda environment, run

conda deactivate
mamba env remove -n ${myenv}

Because Weka is written in Java, we need to start the Java virtual machine (JVM) first.

import weka.core.jvm as jvm
import logging

jvm.start(logging_level=logging.ERROR)

Loading dataset

To load the dataset, create an ArffLoader as follows:

from weka.core.converters import Loader

loader = Loader(classname="weka.core.converters.ArffLoader")

The loader has the method load_url to load data from the web, such as the Weka GitHub repository:

weka_data_path = (
    "https://raw.githubusercontent.com/Waikato/weka-3.8/master/wekadocs/data/"
)
trainset = loader.load_url(
    weka_data_path + "segment-challenge.arff"
)  # use load_file to load from file instead

For classification, we have to specify the class attribute. For instance, the method class_is_last mutates trainset to have the last attribute as the class attribute:

trainset.class_is_last()
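
To see what "the last attribute" refers to, it can help to look at the header of an ARFF file, where attributes are declared in order. The following is a minimal sketch in plain Python, not part of python-weka-wrapper: the miniature ARFF text and the helper attribute_names are hypothetical, and the parser assumes unquoted attribute names without spaces.

```python
# Hypothetical miniature ARFF file, for illustration only.
arff_text = """@relation toy
@attribute width numeric
@attribute height numeric
@attribute label {a,b}
@data
1.0,2.0,a
3.0,4.0,b
"""


def attribute_names(text):
    """Return attribute names in declaration order (assumes unquoted names)."""
    names = []
    for line in text.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0].lower() == "@attribute":
            names.append(parts[1])
        elif parts and parts[0].lower() == "@data":
            break  # the header ends where the data section begins
    return names


names = attribute_names(arff_text)
class_name = names[-1]  # the attribute that class_is_last would pick
print(f"Attributes: {names}, class attribute: {class_name}")
```

In this toy file, the nominal attribute label is declared last, so it plays the role of the class attribute.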

from weka.core.dataset import Instances

# YOUR CODE HERE
raise NotImplementedError()
print(Instances.summary(testset))

# tests
assert testset.relationname == "segment"
assert testset.num_instances == 810
assert testset.num_attributes == 20

Training using J48

To train a decision tree using J48, we create the classifier and then apply the method build_classifier on the training set.

from weka.classifiers import Classifier

J48 = Classifier(classname="weka.classifiers.trees.J48")
J48.build_classifier(trainset)
J48
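
When you hand-build a tree, you pick each split by eye. C4.5 instead scores candidate splits with an information-theoretic criterion (gain ratio). As a rough illustration of the idea only, the sketch below computes the plain information gain of a threshold split on a numeric attribute. The toy data and the helper names entropy and information_gain are hypothetical, and the sketch does not reproduce J48's actual procedure, which also normalizes the gain and prunes the tree.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the data on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gives no information
    weighted = (
        len(left) * entropy(left) + len(right) * entropy(right)
    ) / len(labels)
    return entropy(labels) - weighted


# Hypothetical toy data, for illustration only.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
gain_good = information_gain(values, labels, threshold=2.0)
gain_poor = information_gain(values, labels, threshold=1.0)
print(f"gain at 2.0: {gain_good:.4f}, gain at 1.0: {gain_poor:.4f}")
```

The threshold that separates the two classes cleanly scores higher, which is the kind of comparison you make informally when choosing a split in UserClassifier.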

To visualize the tree:

import graphviz
graphviz.Source(J48.graph)

Tip

J48.graph is a string written in DOT, the graph description language used by Graphviz. For your group project, you may want to save the DOT source instead of the rendered image, so that you can edit it further. To do so:

Save the string to a text file such as J48tree.gv.
Edit or preview it in VS Code using the extension tintinweb.graphviz-interactive-preview. To install the extension:
1. Run the command in a terminal:
```
install-vscode-extension tintinweb.graphviz-interactive-preview@0.3.5
```
2. Reload the VS Code window with the command > Developer: Reload Window.
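
Saving the DOT source can be sketched as follows. This is a minimal example using only the standard library: the string dot_source is a hypothetical stand-in for J48.graph, the helper save_dot is not part of any library, and the file is written to the system temporary directory only so that the example runs anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the string returned by J48.graph.
dot_source = """digraph ToyTree {
N0 [label="attribute-a"]
N0 -> N1 [label="<= 1.5"]
N1 [label="class-1" shape=box]
N0 -> N2 [label="> 1.5"]
N2 [label="class-2" shape=box]
}
"""


def save_dot(source, path):
    """Write DOT source to a text file and return the path."""
    path = Path(path)
    path.write_text(source, encoding="utf-8")
    return path


out_path = save_dot(dot_source, Path(tempfile.gettempdir()) / "J48tree.gv")
print(f"Saved DOT source to {out_path}")
```

In the notebook, you would pass J48.graph in place of dot_source and choose a path inside your project folder.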

There are also online editors available for DOT files.

Evaluation

To evaluate the decision tree on the training set:

from weka.classifiers import Evaluation

J48train = Evaluation(trainset)
J48train.test_model(J48, trainset)
train_accuracy = J48train.percent_correct
print(f"Training accuracy: {train_accuracy:.4g}%")
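
To make the reported number concrete: for unweighted instances, the accuracy is the percentage of instances whose predicted class matches the actual class. The sketch below computes this in plain Python. The label lists are hypothetical, and the function is an illustration of the definition under that unit-weight assumption, not Weka's implementation.

```python
def percent_correct(actual, predicted):
    """Return the percentage of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * matches / len(actual)


# Hypothetical labels, for illustration only.
actual = ["sky", "grass", "path", "sky", "window"]
predicted = ["sky", "grass", "sky", "sky", "window"]
accuracy = percent_correct(actual, predicted)
print(f"Accuracy: {accuracy:.4g}%")  # prints "Accuracy: 80%"
```

A high value on the training set alone says little about generalization, which is why the same measure is also computed on a separate test set.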

# YOUR CODE HERE
raise NotImplementedError()
print(f"Test accuracy: {test_accuracy:.4g}%")

YOUR ANSWER HERE

To stop the Java virtual machine, run the following line. To restart the JVM afterwards, you must restart the kernel.

jvm.stop()
