
In this notebook, you will compete with your classmates and your machine by
- handcrafting a decision tree using Weka UserClassifier, and
- using python-weka-wrapper to build the J48 (C4.5) decision tree as a comparison.

Let’s find out who is more intelligent!
Interactive Decision Tree Construction
Follow the instructions above ([Witten11] Ex 17.2.12) to
- install the package UserClassifier,
- hand-build a decision tree using segment-challenge.arff as the training set, and
- test the performance using segment-test.arff as the test set.
YOUR ANSWER HERE
YOUR ANSWER HERE
YOUR ANSWER HERE
YOUR ANSWER HERE
Python Weka Wrapper
To see if your hand-built classifier can beat the machine, use J48 (C4.5) to build a decision tree. Instead of using the Weka Explorer Interface, you will run Weka directly from the notebook using python-weka-wrapper.
Tip
For your group project, you can also create your own conda environment to install additional packages such as those listed below:
myenv=myenvname
cat <<EOF > /tmp/myenv.yaml && mamba env create -n "${myenv}" -f /tmp/myenv.yaml
dependencies:
  - python=3.10
  - pip
  - ipykernel
  - pandas
  - numpy
  - pip:
    - python-weka-wrapper3
    - python-javabridge
    - graphviz
EOF
Afterwards, you can create a kernel using the command:
conda activate ${myenv}
python -m ipykernel install \
    --user \
    --name "${myenv}" --display-name "${myenv}"
Reload the browser window for the kernel to take effect. See the documentation for more details.
To deactivate the conda environment in a terminal, run
conda deactivate
To delete the kernel, run the command
rm -rf ~/.local/share/jupyter/kernels/${myenv}
To delete the conda environment, run
conda deactivate
mamba env remove -n ${myenv}

Because Weka is written in Java, we need to start the Java virtual machine first.
import weka.core.jvm as jvm
import logging
jvm.start(logging_level=logging.ERROR)

Loading dataset
To load the dataset, create an ArffLoader as follows:
from weka.core.converters import Loader
loader = Loader(classname="weka.core.converters.ArffLoader")
The loader has the method load_url to load data from the web, such as the Weka GitHub repository:
weka_data_path = (
    "https://raw.githubusercontent.com/Waikato/weka-3.8/master/wekadocs/data/"
)
trainset = loader.load_url(
    weka_data_path + "segment-challenge.arff"
)  # use load_file to load from file instead
For classification, we have to specify the class attribute. For instance, the method class_is_last mutates trainset to have the last attribute as the class attribute:
trainset.class_is_last()

from weka.core.dataset import Instances
# YOUR CODE HERE
raise NotImplementedError()
print(Instances.summary(testset))
# tests
assert testset.relationname == "segment"
assert testset.num_instances == 810
assert testset.num_attributes == 20

Training using J48
To train a decision tree using J48, we create the classifier and then apply the method build_classifier on the training set.
from weka.classifiers import Classifier
J48 = Classifier(classname="weka.classifiers.trees.J48")
J48.build_classifier(trainset)
J48
To visualize the tree:
import graphviz
graphviz.Source(J48.graph)
Tip
J48.graph is a piece of code written in a domain-specific language called DOT. For your group project, you may want to save the DOT file instead of the rendered image, so that you can edit it further. To do so:
- Save the string to a text file such as J48tree.gv.
- Edit/preview it in VS Code using the extension. To install the extension:
  - Run the command in a terminal:
    install-vscode-extension tintinweb.graphviz-interactive-preview@0.3.5
  - Reload the VS Code window with the command > Developer: Reload Window.
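The saving step can be sketched as follows. This is a minimal illustration, not part of the exercise: `dot_source` is a small made-up DOT string standing in for `J48.graph`, `save_dot` is a hypothetical helper name, and the attribute name and threshold in the sample graph are invented for illustration only.

```python
from pathlib import Path

# Hypothetical stand-in for the string returned by J48.graph.
# The node labels and the threshold 155 are made up for illustration.
dot_source = """digraph J48Tree {
N0 [label="region-centroid-row"]
N1 [label="sky" shape=box]
N2 [label="grass" shape=box]
N0 -> N1 [label="<= 155"]
N0 -> N2 [label="> 155"]
}
"""


def save_dot(source, path):
    """Write DOT source text to the given path and return it as a Path."""
    path = Path(path)
    path.write_text(source, encoding="utf-8")
    return path


# Save under the file name suggested in the tip.
gv_path = save_dot(dot_source, "J48tree.gv")
print(f"Saved {gv_path.stat().st_size} bytes to {gv_path}")
```

In the notebook, you would pass `J48.graph` in place of `dot_source` after training the classifier.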
There are also online editors available such as:
Evaluation
To evaluate the decision tree on the training set:
from weka.classifiers import Evaluation
J48train = Evaluation(trainset)
J48train.test_model(J48, trainset)
train_accuracy = J48train.percent_correct
print(f"Training accuracy: {train_accuracy:.4g}%")
# YOUR CODE HERE
raise NotImplementedError()
print(f"Test accuracy: {test_accuracy:.4g}%")
YOUR ANSWER HERE
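To make the accuracy figures above concrete, here is a minimal pure-Python sketch of what a percent-correct score measures: the share of instances whose predicted class matches the actual class. It does not use Weka. The function name `percent_correct` is borrowed from the attribute used above only for readability, and the labels are made up rather than taken from the segment data.

```python
def percent_correct(actual, predicted):
    """Return the percentage of positions where predicted equals actual."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


# Made-up class labels for illustration only.
actual = ["sky", "grass", "path", "sky", "window", "grass", "cement", "sky"]
predicted = ["sky", "grass", "path", "sky", "cement", "grass", "cement", "window"]

accuracy = percent_correct(actual, predicted)
print(f"Accuracy: {accuracy:.4g}%")
```

Here 6 of the 8 predictions match, so the score is 75%. The same reading applies to the training and test accuracies, each computed over its own dataset.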
To stop the Java virtual machine, run the following line. To restart the JVM, you must restart the kernel.
jvm.stop()