CS5483

import logging
import numpy as np
import weka.core.jvm as jvm
from weka.associations import Associator
from weka.core.converters import Loader

jvm.start(logging_level=logging.ERROR)

Association Rule Mining using Weka

We will conduct a market-basket analysis on the supermarket dataset provided with Weka.

Transaction data

Each instance of the dataset is a transaction, i.e., a customer’s purchase of items in a supermarket. The dataset can be represented as follows:

Using the Explorer interface, load the supermarket.arff dataset in Weka.

Note that most attributes contain only one possible value, namely t. Click the Edit... button to open the data editor, and observe that most attributes have missing values:

In supermarket.arff:

Each attribute specified by @attribute can be a product category, a department, or a product with one possible value t:

...
@attribute 'grocery misc' { t}
@attribute 'department11' { t}
@attribute 'baby needs' { t}
@attribute 'bread and cake' { t}
...

The last attribute 'total' has two possible values {low, high}:

@attribute 'total' { low, high} % low < 100
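To make the file layout concrete, here is a minimal sketch that turns such a header plus a couple of data rows into transactions. The `@attribute` lines are copied from the excerpt above; the two data rows, the helper name `parse_fragment`, and the `name=value` item labels are made up for illustration. Treating a missing value `?` as "item not in the basket" is an interpretation, not something the file itself states.

```python
import re

# Header lines copied from the excerpt above; the two data rows are hypothetical.
ARFF_FRAGMENT = """\
@attribute 'grocery misc' { t}
@attribute 'department11' { t}
@attribute 'baby needs' { t}
@attribute 'bread and cake' { t}
@attribute 'total' { low, high} % low < 100
@data
t,?,?,t,high
?,?,t,?,low
"""

# Matches a quoted attribute name followed by its nominal values in braces.
ATTRIBUTE = re.compile(r"@attribute\s+'([^']+)'\s+\{([^}]*)\}")


def parse_fragment(text):
    """Return (attribute names, transactions as sets of 'name=value' items)."""
    names, transactions, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.lower() == "@data":
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            # Assumption: '?' (missing) means the item is absent from the basket.
            transactions.append(
                {f"{n}={v}" for n, v in zip(names, values) if v != "?"}
            )
        else:
            match = ATTRIBUTE.match(line)
            if match:
                names.append(match.group(1))
    return names, transactions


names, transactions = parse_fragment(ARFF_FRAGMENT)
print(names)
print(transactions)
```

This only handles the quoted, nominal attributes shown here; it is not a general ARFF parser.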

To understand the dataset further:

Select the Associate tab. By default, Apriori is chosen as the Associator.
Open the GenericObjectEditor and check for a parameter called treatZeroAsMissing. Hover the mouse pointer over the parameter to see more details.
Run the Apriori algorithm with different choices of the parameter treatZeroAsMissing. Observe the difference in the generated rules.

YOUR ANSWER HERE

Association rule

An association rule for market-basket analysis is defined as follows:

We will use python-weka-wrapper for illustration. To load the dataset:

loader = Loader(classname="weka.core.converters.ArffLoader")
weka_data_path = (
    "https://raw.githubusercontent.com/Waikato/weka-3.8/master/wekadocs/data/"
)
dataset = loader.load_url(
    weka_data_path + "supermarket.arff"
)  # use load_file to load from file instead

To apply the Apriori algorithm with the default settings:

from weka.associations import Associator

apriori = Associator(classname="weka.associations.Apriori")
apriori.build_associations(dataset)
apriori

YOUR ANSWER HERE

To retrieve the rules as a list, and print the first rule:

rules = list(apriori.association_rules())
rules[0]

To obtain the sets $A$ (the premise) and $B$ (the consequence):

rules[0].premise, rules[0].consequence

premise_support = rules[0].premise_support
total_support = rules[0].total_support

The Apriori algorithm returns rules with large enough support:

\begin{align} \op{support}(A \implies B) &= \op{support}(A \cup B) := \frac{\op{count}(A \cup B)}{|D|}\quad \text{where}\\ \op{count}(A \cup B) &:= \abs{\Set{T\in D|T\supseteq A\cup B}}. \end{align}

(4)

Support is the fraction of transactions that contain every item of both $A$ and $B$.

For the first rule, the number 723 at the end of the rule corresponds to the total support count $\op{count}(A\cup B)$.
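The definition of support can be checked by hand on a small example. The sketch below uses a hypothetical four-transaction dataset (not supermarket.arff) and plain Python sets; the names `D`, `count`, and `support` are chosen to mirror the notation above.

```python
from fractions import Fraction

# Hypothetical toy dataset D: each transaction T is a set of items.
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
]


def count(itemset, transactions):
    """Number of transactions T with T a superset of the itemset."""
    return sum(1 for T in transactions if T >= set(itemset))


def support(itemset, transactions):
    """count(itemset) / |D|, kept exact as a Fraction."""
    return Fraction(count(itemset, transactions), len(transactions))


A, B = {"bread"}, {"milk"}
rule_count = count(A | B, D)      # transactions containing both bread and milk
rule_support = support(A | B, D)  # support(A => B) = support(A ∪ B)
print(rule_count, rule_support)
```

Here two of the four transactions contain both items, so the rule has support count 2 and support 1/2.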

# YOUR CODE HERE
raise NotImplementedError()
support

<conf:(0.92)> lift:(1.27) lev:(0.03) conv:(3.35) printed after the first rule indicates that

confidence is used for ranking the rules and
the rule has a confidence of 0.92.

By default, the rules are ranked by confidence, which is defined as follows:
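A sketch of the definition, inferred from the lift formula in Definition 4 (where lift equals confidence divided by $\op{support}(B)$) and from the identity $\op{confidence}(\emptyset \implies B) = \op{support}(B)/\op{support}(\emptyset)$ used there:

```latex
\begin{align} \op{confidence}(A \implies B) &:= \frac{\op{support}(A \cup B)}{\op{support}(A)} = \frac{\op{count}(A \cup B)}{\op{count}(A)}. \end{align}
```

For the first rule, this is the total support count 723 divided by the support count of the premise.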

In python-weka-wrapper, we can print different metrics as follows:

for n, v in zip(rules[0].metric_names, rules[0].metric_values):
    print(f"{n}: {v:.3g}")

# YOUR CODE HERE
raise NotImplementedError()
premise_support

Lift is another rule quality measure defined as follows:

Definition 4

The lift of a rule is

\begin{align} \op{lift}(A\implies B) &:= \frac{\op{confidence}(A\implies B)}{\op{support}(B)} = \frac{\op{support}(A \cup B)}{\op{support}(A)\op{support}(B)}\\ &= \frac{\op{confidence}(A\implies B)}{\op{confidence}(\emptyset \implies B)}. \end{align}

(6)

where the last equality is obtained by rewriting $\op{support}(B)$ in the denominator of the first equality as

\begin{align} \op{confidence}(\emptyset \implies B) &= \frac{\op{support}(B)}{\op{support}(\emptyset)} = \op{support}(B). \end{align}

(7)

In other words, lift is the factor by which imposing the premise $A$ multiplies the confidence of the consequence $B$: a lift above 1 means that $A$ makes $B$ more likely.

apriori_lift = Associator(classname="weka.associations.Apriori", options=['-T', '1'])
...

where the value 1 corresponds to Lift.
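To see how the metrics relate to one another, the following sketch computes them from support values on a hypothetical toy dataset (not supermarket.arff). Confidence and lift follow the formulas above. The leverage and conviction formulas are the commonly used ones and are assumed, not confirmed by this text, to match what Weka reports as lev and conv. The helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical toy transactions, not taken from supermarket.arff.
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for T in transactions if T >= itemset)
    return Fraction(hits, len(transactions))


def rule_metrics(A, B, transactions):
    """Metrics of the rule A => B; assumes A occurs in at least one transaction."""
    s_a = support(A, transactions)
    s_b = support(B, transactions)
    s_ab = support(A | B, transactions)
    confidence = s_ab / s_a
    lift = confidence / s_b  # equivalently s_ab / (s_a * s_b)
    # Assumed definition: excess of joint support over independence.
    leverage = s_ab - s_a * s_b
    # Assumed definition: P(A) P(not B) / P(A, not B); undefined if the rule never fails.
    conviction = None if s_a == s_ab else s_a * (1 - s_b) / (s_a - s_ab)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


metrics = rule_metrics({"bread"}, {"milk"}, D)
for name, value in metrics.items():
    print(name, value)
```

With this data the rule bread ⇒ milk has confidence 2/3 while milk alone has support 1/2, so the lift is 4/3: knowing that bread was bought raises the chance of milk by a third.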

# YOUR CODE HERE
raise NotImplementedError()
lift

YOUR ANSWER HERE
