Problem Formulation¶
\(\def\abs#1{\left\lvert #1 \right\rvert} \def\Set#1{\left\{ #1 \right\}} \def\mc#1{\mathcal{#1}} \def\M#1{\boldsymbol{#1}} \def\R#1{\mathsf{#1}} \def\RM#1{\boldsymbol{\mathsf{#1}}} \def\op#1{\operatorname{#1}} \def\E{\op{E}} \def\d{\mathrm{\mathstrut d}}\)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
SEED = 0
Mutual information estimation¶
How to formulate the problem of mutual information estimation?
The problem of estimating the mutual information is:
(MI Estimation)
Given \(n\) samples

\[(\R{X}_1,\R{Y}_1),\dots,(\R{X}_n,\R{Y}_n)\]

i.i.d. drawn from an unknown probability measure \(P_{\R{X},\R{Y}}\) on the space \(\mc{X}\times \mc{Y}\), estimate the mutual information (MI)

\[I(\R{X}\wedge\R{Y}) := \E\left[\log \frac{\d P_{\R{X},\R{Y}}}{\d (P_{\R{X}}\times P_{\R{Y}})}(\R{X},\R{Y})\right].\]
Run the following code, which uses numpy to generate i.i.d. samples from a multivariate gaussian distribution and store the samples as a numpy array assigned to XY.
# Seeded random number generator for reproducibility
XY_rng = np.random.default_rng(SEED)
# Sampling from an unknown probability measure
rho = 1 - 0.19 * XY_rng.random()
mean, cov, n = [0, 0], [[1, rho], [rho, 1]], 1000
XY = XY_rng.multivariate_normal(mean, cov, n)
plt.scatter(XY[:, 0], XY[:, 1], s=2)
plt.show()
See multivariate_normal and scatter.
You can also get help directly in JupyterLab:

Docstring:

Move the cursor to the object and click Help->Show Contextual Help, or press Shift-Tab if you have limited screen space.

Directory:

Right click on a notebook and choose New Console for Notebook. Run dir(obj) for a previously defined object obj to see the available methods/properties of obj.
Exercise
What is unknown about the above sampling distribution?
Solution
The density is

\[p_{\R{X},\R{Y}}(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2-2\rho x y+y^2}{2(1-\rho^2)}\right),\]

but \(\rho\) is unknown: the code draws it uniformly at random from \((0.81, 1]\), since rho = 1 - 0.19 * XY_rng.random() with XY_rng.random() uniform over \([0,1)\).
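For this bivariate gaussian, the ground-truth MI also has the closed form \(I(\R{X}\wedge\R{Y}) = -\frac{1}{2}\log(1-\rho^2)\), which is useful as a sanity check for any estimator. A minimal sketch (the helper name gaussian_mi is chosen here for illustration, not part of the notebook):

```python
import numpy as np


def gaussian_mi(rho):
    """Ground-truth MI (in nats) of a bivariate gaussian with unit
    variances and correlation coefficient rho."""
    return -0.5 * np.log(1 - rho**2)


# rho is drawn from (0.81, 1], so the MI to be estimated exceeds
print(gaussian_mi(0.81))
```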
To show the data samples using pandas:
XY_df = pd.DataFrame(XY, columns=["X", "Y"])
XY_df
To plot the data using seaborn:
def plot_samples_with_kde(df, **kwargs):
    p = sns.PairGrid(df, **kwargs)
    p.map_lower(sns.scatterplot, s=2)  # scatter plot of samples
    p.map_upper(sns.kdeplot)  # kernel density estimate for pXY
    p.map_diag(sns.kdeplot)  # kde for pX and pY
    return p
plot_samples_with_kde(XY_df)
plt.show()
Exercise
Complete the following code by replacing the blanks ___ so that XY_ref stores i.i.d. samples of \((\R{X}',\R{Y}')\), where \(\R{X}'\) and \(\R{Y}'\) are zero-mean independent gaussian random variables with unit variance.
...
cov_ref, n_ = ___, n
XY_ref = XY_ref_rng.___(mean, ___, n_)
...
XY_ref_rng = np.random.default_rng(SEED)
### BEGIN SOLUTION
cov_ref, n_ = [[1, 0], [0, 1]], n
XY_ref = XY_ref_rng.multivariate_normal(mean, cov_ref, n_)
### END SOLUTION
XY_ref_df = pd.DataFrame(XY_ref, columns=["X'", "Y'"])
plot_samples_with_kde(XY_ref_df)
plt.show()
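Because we happen to know the samples come from a bivariate gaussian, a parametric plug-in estimate is already possible: fit \(\rho\) by the sample correlation and apply the closed form \(-\frac{1}{2}\log(1-\rho^2)\). This sketch exploits the assumed gaussian family (it is not a general-purpose estimator) and regenerates the samples so the snippet is self-contained:

```python
import numpy as np

SEED = 0
rng = np.random.default_rng(SEED)
rho = 1 - 0.19 * rng.random()  # same sampling scheme as above
XY = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], 1000)

# Parametric plug-in: estimate rho by the sample correlation coefficient,
# then evaluate the closed-form MI of the fitted bivariate gaussian.
rho_hat = np.corrcoef(XY[:, 0], XY[:, 1])[0, 1]
mi_hat = -0.5 * np.log(1 - rho_hat**2)
mi_true = -0.5 * np.log(1 - rho**2)
print(rho_hat, mi_hat, mi_true)
```

The later parts of the problem drop the gaussian assumption, which is what makes the estimation genuinely hard.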
Divergence estimation¶
Can we generalize the problem further?
Estimating MI may be viewed as a special case of the following problem:
(Divergence estimation)
Estimate the KL divergence

\[D(P_{\R{Z}}\|P_{\R{Z}'}) := \E\left[\log \frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}}(\R{Z})\right]\]

using
a sequence \(\R{Z}^n:=(\R{Z}_1,\dots, \R{Z}_n)\sim P_{\R{Z}}^n\) of i.i.d. samples from \(P_{\R{Z}}\) if \(P_{\R{Z}}\) is unknown, and
another sequence \({\R{Z}'}^{n'}\sim P_{\R{Z}'}^{n'}\) of i.i.d. samples from \(P_{\R{Z}'}\) if \(P_{\R{Z}'}\), the reference measure of \(P_{\R{Z}}\), is also unknown.
Exercise
Although \(\R{X}^n\) and \(\R{Y}^n\) for MI estimation must have the same length, \(\R{Z}^n\) and \({\R{Z}'}^{n'}\) can have different lengths, i.e., possibly \(n \neq n'\). Why?
Solution
The divergence depends only on the marginal distributions \(P_{\R{Z}}\) and \(P_{\R{Z}'}\), not on any joint dependency between \(\R{Z}\) and \(\R{Z}'\), so the two sample sequences need not be paired and in particular need not have the same length.
Regarding the mutual information as a divergence from the joint distribution to the product of the marginals, the problem can be further generalized to estimating other divergences such as the \(f\)-divergence: for a strictly convex function \(f\) with \(f(1)=0\),

\[D_f(P_{\R{Z}}\|P_{\R{Z}'}) := \E\left[f\left(\frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}}(\R{Z}')\right)\right].\]

The \(f\)-divergence reduces to the KL divergence when \(f(u)=u \log u\).
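To see the reduction, substitute \(f(u)=u\log u\) and change the measure of the expectation from \(P_{\R{Z}'}\) to \(P_{\R{Z}}\):

```latex
D_f(P_{\R{Z}}\|P_{\R{Z}'})
= \E\left[\frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}}(\R{Z}')
          \log \frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}}(\R{Z}')\right]
= \E\left[\log \frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}}(\R{Z})\right]
= D(P_{\R{Z}}\|P_{\R{Z}'}).
```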
Exercise
Show that \(D_f(P_{\R{Z}}\|P_{\R{Z}'})\geq 0\) with equality iff \(P_{\R{Z}}=P_{\R{Z}'}\) using Jensen’s inequality and the properties of \(f\).
Solution
It is a valid divergence because, by Jensen's inequality and the convexity of \(f\),

\[D_f(P_{\R{Z}}\|P_{\R{Z}'}) \geq f\left(\E\left[\frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}}(\R{Z}')\right]\right) = f(1) = 0,\]

with equality iff the density ratio is almost surely constant (by strict convexity), i.e., \(P_{\R{Z}}=P_{\R{Z}'}\).
Regarding the divergence as an expectation, one may approximate it by the sample average

\[D(P_{\R{Z}}\|P_{\R{Z}'}) \approx \frac{1}{n} \sum_{i=1}^n \log \frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}}(\R{Z}_i).\]

However, this is not a valid estimate because it involves the unknown measures \(P_{\R{Z}}\) and \(P_{\R{Z}'}\).
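To illustrate what the sample average would give if the log density ratio were known (which it is not in the actual estimation problem), here is a sketch for the gaussian example, where \(\log \frac{\d P_{\R{X},\R{Y}}}{\d(P_{\R{X}}\times P_{\R{Y}})}\) has a closed form; \(\rho\) is fixed to 0.9 purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.9, 10000  # pretend rho (hence the density ratio) is known
Z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], n)
x, y = Z[:, 0], Z[:, 1]

# Closed-form log density ratio log(p_XY / (p_X p_Y)) for the gaussian pair
log_ratio = (
    -0.5 * np.log(1 - rho**2)
    - (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2))
    + (x**2 + y**2) / 2
)

mi_sample_avg = log_ratio.mean()      # sample average of the log ratio
mi_true = -0.5 * np.log(1 - rho**2)   # ground truth for comparison
print(mi_sample_avg, mi_true)
```

The sample average concentrates around the true divergence; the difficulty is obtaining the log density ratio without knowing the measures.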
One may further estimate the density ratio

\[\frac{\d P_{\R{Z}}}{\d P_{\R{Z}'}},\]

or estimate the densities defined with respect to some common reference measure \(\mu\):

\[p_{\R{Z}} := \frac{\d P_{\R{Z}}}{\d \mu} \quad \text{and} \quad p_{\R{Z}'} := \frac{\d P_{\R{Z}'}}{\d \mu}.\]
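As a naive sketch of the density-based route, one can estimate both densities (taking the Lebesgue measure as \(\mu\)) by kernel density estimation and plug them into the sample average. This assumes scipy is available, and such plug-in estimates can be badly biased, especially in higher dimensions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
rho, n = 0.5, 2000
# Z ~ correlated gaussian pair, Z' ~ product of the marginals
Z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], n).T  # shape (2, n)
Z_ref = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], n).T  # shape (2, n)

# Plug-in: estimate both densities w.r.t. the Lebesgue measure by KDE,
# then average the estimated log density ratio over the samples of Z.
p_Z = gaussian_kde(Z)
p_Z_ref = gaussian_kde(Z_ref)
kl_plugin = np.mean(np.log(p_Z(Z) / p_Z_ref(Z)))

kl_true = -0.5 * np.log(1 - rho**2)  # ground truth for this gaussian pair
print(kl_plugin, kl_true)
```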