CS5483

import logging

import numpy as np
import pandas as pd
import weka.core.jvm as jvm
import weka.plot.classifiers as plcls
from weka.classifiers import Classifier, Evaluation
from weka.core.classes import Random
from weka.core.converters import Loader

%matplotlib widget
jvm.start(logging_level=logging.ERROR)

Class imbalance problem

In this notebook, we will analyze a skewed dataset for detecting microcalcifications in mammograms. The goal is to build a classifier to identify whether a bright spot in a mammogram is a micro-calcification (an early sign of breast cancer).

Figure 1: Micro-calcification

The dataset can be downloaded from OpenML in ARFF format. The following loads the data using python-weka-wrapper.

loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_url("https://www.openml.org/data/download/52214/phpn1jVwe")
data.class_is_last()
print(data.summary(data))

There are 7 attributes and over 11 thousand instances. To understand the dataset, refer to Section 4 of the original paper (Woods et al., 1994).

To compute the 10-fold cross-validation accuracy for J48:

clf = Classifier(classname="weka.classifiers.trees.J48")
evl = Evaluation(data)
evl.crossvalidate_model(clf, data, 10, Random(1))

print(f"Accuracy: {evl.percent_correct:.3g}%")

You should see that the accuracy is close to 100%. To show the confusion matrix:

confusion_matrix = pd.DataFrame(
    evl.confusion_matrix,
    dtype=int,
    columns=[f'predicted class "{v}"' for v in data.class_attribute.values],
    index=[f'class "{v}"' for v in data.class_attribute.values],
)
confusion_matrix

Each row of the confusion matrix corresponds to a class value (1: malignant, -1: benign), and each column corresponds to a predicted class. Each entry is a count of instances belonging to a specific class and having a particular predicted class.
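To make the row/column convention concrete, the following is a minimal sketch that builds a confusion matrix by hand from a list of actual and predicted labels. The labels are hypothetical toy values, not taken from the mammography dataset, and the names `toy_matrix` and `toy_accuracy` are introduced only for this illustration.

```python
from collections import Counter

# Hypothetical toy labels, not taken from the mammography dataset.
classes = ["-1", "1"]
actual = ["-1", "-1", "-1", "1", "1", "-1", "1", "-1"]
predicted = ["-1", "-1", "1", "1", "-1", "-1", "1", "-1"]

# Count (actual, predicted) pairs; rows are actual classes, columns are predictions.
pair_counts = Counter(zip(actual, predicted))
toy_matrix = [[pair_counts[(a, p)] for p in classes] for a in classes]

for a, row in zip(classes, toy_matrix):
    print(f'class "{a}": {row}')

# The diagonal holds the correctly classified instances.
toy_accuracy = sum(toy_matrix[i][i] for i in range(len(classes))) / len(actual)
print(f"Accuracy: {toy_accuracy:.3g}")
```

Accuracy only uses the diagonal of the matrix, so it says nothing about how the errors are split between the two classes.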

# YOUR CODE HERE
raise NotImplementedError()
print(f"Percentage of malignant detected: {percent_of_malignant_detected:.3g}%")

# tests

Different Performance Metrics

For a skewed dataset, one can achieve very high accuracy even with ZeroR, i.e., by always predicting the majority class regardless of the values of the input features. We must use other performance metrics to train and evaluate a classification algorithm properly.
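As a rough illustration of why accuracy is misleading here, the sketch below computes the accuracy of a majority-class predictor from class counts alone. The counts are hypothetical and chosen only to show the effect; they are not the counts of the actual dataset.

```python
# Hypothetical class counts for illustration only (not the actual dataset).
class_counts = {"-1": 990, "1": 10}

# ZeroR ignores the input features and always predicts the most frequent class.
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / sum(class_counts.values())

print(f"ZeroR always predicts {majority_class!r}")
print(f"Baseline accuracy: {baseline_accuracy:.1%}")
```

With these counts the baseline is 99% accurate although it never detects a single positive instance.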

To show the precision, recall, and specificity of the J48 classifier:

pos_class = 1  # specify the positive class value
performance = {
    "precision": evl.precision(pos_class),
    "recall": evl.recall(pos_class),
    "specificity": evl.true_negative_rate(pos_class),
}
performance

Although specificity is close to 100%, precision and recall are below 80% and 60% respectively:

If a bright spot is classified as malignant, the chance it is malignant is less than 80%.
Out of all malignant bright spots, less than 60% are identified as malignant.

Close to 100% of benign bright spots are identified as benign mainly because most bright spots are benign, not because the classifier can distinguish malignant bright spots from benign ones.

TP = evl.num_true_positives(pos_class)
FN = evl.num_false_negatives(pos_class)
FP = evl.num_false_positives(pos_class)
TN = evl.num_true_negatives(pos_class)

assert np.isclose(performance["precision"], TP / (TP + FP))
assert np.isclose(performance["recall"], TP / (TP + FN))
assert np.isclose(performance["specificity"], TN / (TN + FP))

TFPN = pd.DataFrame(
    [[TP, FN], [FP, TN]],  # second row: actual -ve, so (FP, TN)
    dtype=int,
    columns=["predicted +ve", "predicted -ve"],
    index=["+ve", "-ve"],
)
TFPN

The above table is not the same as a confusion matrix since a confusion matrix

does not specify a positive class, and
can have more than two rows/columns in multi-class classification problems.
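The relation between the two tables can be sketched as follows: once a positive class is chosen, any confusion matrix collapses into the four counts TP, FN, FP, and TN by treating all other classes as negative. The helper `one_vs_rest_counts` and the 3-class matrix below are hypothetical and introduced only for this example.

```python
def one_vs_rest_counts(matrix, pos):
    """Return (TP, FN, FP, TN) treating class index `pos` as positive.

    `matrix[i][j]` counts instances of actual class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[pos][pos]
    fn = sum(matrix[pos]) - tp                       # actual positive, predicted otherwise
    fp = sum(matrix[i][pos] for i in range(n)) - tp  # actual otherwise, predicted positive
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
toy_matrix_3 = [
    [50, 3, 2],
    [4, 30, 6],
    [1, 5, 20],
]

for pos in range(len(toy_matrix_3)):
    print(pos, one_vs_rest_counts(toy_matrix_3, pos))
```

Each choice of positive class gives a different set of four counts, which is why precision, recall, and specificity are always reported with respect to a specific class.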

# YOUR CODE HERE
raise NotImplementedError()
print(f"negative predictive value (NPV): {performance['NPV']:.3g}")

# tests

$F_{\beta}$-score is another measure that captures the performance in both precision and recall:

Definition 2

$F_{\beta}$-score is defined as

\begin{align}
F_{\beta} &:= \left( \frac{\text{precision}^{-1} + \beta^2 \cdot \text{recall}^{-1}}{\beta^2 + 1}\right)^{-1}\\
&= \frac{(\beta^2+1)\cdot \text{precision}\cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}. \tag{2}
\end{align}

$F$-score is the special case when $\beta=1$,

\begin{align}
F := F_1 &= \left( \frac{\text{precision}^{-1} + \text{recall}^{-1}}{2}\right)^{-1} \\
&= \frac{2\cdot \text{precision}\cdot \text{recall}}{\text{precision} + \text{recall}}, \tag{3}
\end{align}

which is the harmonic mean of precision and recall.

$F$-score is useful in training a classifier to maximize both precision and recall.
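To see how the harmonic mean behaves, here is a minimal sketch that evaluates the closed-form expression of the definition above on hypothetical precision and recall values. The function `f_beta` is written for this illustration and is not part of python-weka-wrapper; returning 0 when either metric is 0 is a convention assumed here.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta as the weighted harmonic mean of precision and recall."""
    if precision == 0 or recall == 0:
        return 0.0  # convention assumed here: the score is 0 if either metric is 0
    b2 = beta**2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# Hypothetical values: a classifier with high precision but poor recall.
p, r = 0.9, 0.1
print(f"arithmetic mean: {(p + r) / 2:.3g}")
print(f"F_1:             {f_beta(p, r):.3g}")
print(f"F_2:             {f_beta(p, r, beta=2):.3g}")
print(f"F_0.5:           {f_beta(p, r, beta=0.5):.3g}")
```

Unlike the arithmetic mean, the harmonic mean stays close to the smaller of the two values, so a classifier cannot obtain a high score by doing well on only one of them. Choosing $\beta > 1$ puts more weight on recall, and $\beta < 1$ puts more weight on precision.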

performance["F"] = evl.f_measure(pos_class)
print(f"F-score: {performance['F']:.3g}")

# YOUR CODE HERE
raise NotImplementedError()
print(f"F_2 score: {performance['F_2']:.3g}")

Exercise 4

Using ZeroR as the classifier, assign to ZeroR_performance a dictionary of precision, recall, and specificity. You can create the dictionary as follows:

ZeroR_performance = {
    'precision': ___,
    'recall': ___,
    'specificity': ___
}

Use 10-fold cross-validation with a random seed of 1. If the value is not a number, you may enter it as np.nan.

# YOUR CODE HERE
raise NotImplementedError()
ZeroR_performance

YOUR ANSWER HERE

Operating Curves for Probabilistic Classifier

For a probabilistic classifier that returns probabilities of different classes, we can obtain a trade-off between precision and recall by changing a threshold γ for positive prediction, i.e., predict positive if and only if the probability estimate for positive class is larger than γ.
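The thresholding rule can be sketched in a few lines of plain Python. The probability estimates and labels below are hypothetical toy values, and `precision_recall_at` is a helper defined only for this example; it assumes the data contains at least one positive instance.

```python
# Hypothetical probability estimates for the positive class, with true labels
# (1 = positive, 0 = negative). Not taken from the mammography dataset.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0]


def precision_recall_at(threshold, scores, labels):
    """Predict positive iff score > threshold, then return (precision, recall)."""
    predicted = [s > threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    # Precision is undefined when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn)  # assumes at least one positive instance
    return precision, recall


for gamma in [0.0, 0.25, 0.5, 0.75, 0.9]:
    prec, rec = precision_recall_at(gamma, scores, labels)
    print(f"gamma={gamma:.2f}: precision={prec:.3g}, recall={rec:.3g}")
```

On this toy data, raising the threshold lowers recall while precision rises, which is the trade-off traced out by a precision-recall curve.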

To plot the precision-recall curve and print the area under the curve, we can use the following tool:

import weka.plot.classifiers as plcls

plcls.plot_prc(evl, class_index=[1])
performance["PRC"] = evl.area_under_prc(pos_class)
print(f"area under precision-recall curve (PRC): {performance['PRC']:.3g}")

YOUR ANSWER HERE

We can also plot the ROC (receiver operating characteristic) curve to show the trade-off between recall (true positive rate) and false positive rate:

plcls.plot_roc(evl, class_index=[1])
performance["AUC"] = evl.area_under_roc(pos_class)
print(f"area under ROC curve (AUC): {performance['AUC']:.3g}")
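For intuition about what the ROC curve and its area measure, the following sketch computes the ROC points by hand and integrates them with the trapezoidal rule. The scores and labels are hypothetical, the helpers `roc_points` and `trapezoid_area` are introduced only for this illustration, and Weka's own computation may differ in details such as the handling of tied scores.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold past each score.

    Assumes the scores are distinct; tied scores would need to be grouped.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores and labels (1 = positive, 0 = negative).
toy_scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

toy_auc = trapezoid_area(roc_points(toy_scores, toy_labels))
print(f"AUC: {toy_auc:.3g}")
```

The resulting area equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score: 12 out of 16 pairs for this toy data.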

YOUR ANSWER HERE

References

Woods, K. S., Solka, J. L., Priebe, C. E., Kegelmeyer, W. P., Doss, C. C., & Bowyer, K. W. (1994). Comparative Evaluation of Pattern Recognition Techniques for Detection of Microcalcifications in Mammography. In State of the Art in Digital Mammographic Image Analysis (pp. 213–231). World Scientific. doi:10.1142/9789812797834_0011
