{ "cells": [ { "cell_type": "markdown", "id": "82b4ca39", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Problem Formulation" ] }, { "cell_type": "markdown", "id": "21fce64f", "metadata": {}, "source": [ "$\\def\\abs#1{\\left\\lvert #1 \\right\\rvert}\n", "\\def\\Set#1{\\left\\{ #1 \\right\\}}\n", "\\def\\mc#1{\\mathcal{#1}}\n", "\\def\\M#1{\\boldsymbol{#1}}\n", "\\def\\R#1{\\mathsf{#1}}\n", "\\def\\RM#1{\\boldsymbol{\\mathsf{#1}}}\n", "\\def\\op#1{\\operatorname{#1}}\n", "\\def\\E{\\op{E}}\n", "\\def\\d{\\mathrm{\\mathstrut d}}$" ] }, { "cell_type": "code", "execution_count": null, "id": "e3a8f07c", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "%matplotlib inline\n", "SEED = 0" ] }, { "cell_type": "markdown", "id": "f4bd8b99", "metadata": {}, "source": [ "## Mutual information estimation" ] }, { "cell_type": "markdown", "id": "5515c02a", "metadata": {}, "source": [ "**How to formulate the problem of mutual information estimation?**" ] }, { "cell_type": "markdown", "id": "f524313a", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The problem of estimating the mutual information is:" ] }, { "cell_type": "markdown", "id": "edf98aa6", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "````{prf:definition} MI Estimation \n", ":label: mi-estimation\n", "\n", "Given $n$ samples\n", "\n", "$$(\\R{X}_1,\\R{Y}_1),\\dots, (\\R{X}_n,\\R{Y}_n) \\stackrel{iid}{\\sim} P_{\\R{X},\\R{Y}}\\in \\mc{P}(\\mc{X},\\mc{Y})$$ \n", "\n", "i.i.d. drawn from an *unknown* probability measure $P_{\\R{X},\\R{Y}}$ from the space $\\mc{X}\\times \\mc{Y}$, estimate the *mutual information (MI)*\n", "\n", "$$\n", "\\begin{align}\n", "I(\\R{X}\\wedge\\R{Y}) &:= E\\left[\\log \\frac{d P_{\\R{X},\\R{Y}}(\\R{X},\\R{Y})}{d (P_{\\R{X}} \\times P_{\\R{Y}})(\\R{X},\\R{Y})} \\right].\n", "\\end{align}\n", "$$ (MI)\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "32b9e401", "metadata": {}, "source": [ "Run the following code, which uses `numpy` to \n", "- generate i.i.d. samples from a multivariate gaussian distribution, and\n", "- store the samples as numpy arrays assigned to `XY`." ] }, { "cell_type": "code", "execution_count": null, "id": "08d38624", "metadata": {}, "outputs": [], "source": [ "# Seeded random number generator for reproducibility\n", "XY_rng = np.random.default_rng(SEED)\n", "\n", "# Sampling from an unknown probability measure\n", "rho = 1 - 0.19 * XY_rng.random()\n", "mean, cov, n = [0, 0], [[1, rho], [rho, 1]], 1000\n", "XY = XY_rng.multivariate_normal(mean, cov, n)\n", "plt.scatter(XY[:, 0], XY[:, 1], s=2)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "5898ec3b", "metadata": {}, "source": [ "See [`multivariate_normal`][mn] and [`scatter`][sc].\n", "\n", "[mn]: https://numpy.org/doc/stable/reference/random/generated/numpy.random.multivariate_normal.html\n", "[sc]: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html" ] }, { "cell_type": "markdown", "id": "bdd96ce5", "metadata": {}, "source": [ "You can also get help directly in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html):" ] }, { "cell_type": "markdown", "id": "cb6eebde", "metadata": {}, "source": [ "- **Docstring**: \n", " - Move the cursor to the object and \n", " - click `Help->Show Contextual Help` or\n", " - click `Shift-Tab` if you have limited screen space." ] }, { "cell_type": "markdown", "id": "7d1ac0fc", "metadata": {}, "source": [ "- **Directory**:\n", " - Right click on a notebook and choose `New Console for Notebook`. \n", " - Run `dir(obj)` for a previously defined object `obj` to see the available methods/properties of `obj`." ] }, { "cell_type": "markdown", "id": "8a574ce2", "metadata": {}, "source": [ "**Exercise** \n", "\n", "What is unknown about the above sampling distribution?" ] }, { "cell_type": "markdown", "id": "a8002062", "metadata": { "nbgrader": { "grade": true, "grade_id": "unknown-distribution", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false }, "tags": [ "hide-cell" ] }, "source": [ "````{toggle}\n", "**Solution**\n", "\n", "The density is\n", "\n", "$$\n", "\\frac{d P_{\\R{X},\\R{Y}}}{dxdy} = \\mc{N}_{\\M{0},\\left[\\begin{smallmatrix}1 & \\rho \\\\ \\rho & 1\\end{smallmatrix}\\right]}(x,y)\n", "$$\n", "\n", "but $\\rho$ is unknown (uniformly random over $[0.8,0.99)$).\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "ea8cd1cd", "metadata": {}, "source": [ "To show the data samples using `pandas`:" ] }, { "cell_type": "code", "execution_count": null, "id": "12bd6f11", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "XY_df = pd.DataFrame(XY, columns=[\"X\", \"Y\"])\n", "XY_df" ] }, { "cell_type": "markdown", "id": "5a497538", "metadata": {}, "source": [ "To plot the data using `seaborn`:" ] }, { "cell_type": "code", "execution_count": null, "id": "34a04ebf", "metadata": {}, "outputs": [], "source": [ "def plot_samples_with_kde(df, **kwargs):\n", " p = sns.PairGrid(df, **kwargs)\n", " p.map_lower(sns.scatterplot, s=2) # scatter plot of samples\n", " p.map_upper(sns.kdeplot) # kernel density estimate for pXY\n", " p.map_diag(sns.kdeplot) # kde for pX and pY\n", " return p\n", "\n", "\n", "plot_samples_with_kde(XY_df)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "6fc79481", "metadata": {}, "source": [ "**Exercise** \n", "\n", "Complete the following code by replacing the blanks `___` so that `XY_ref` stores the i.i.d. samples of $(\\R{X}',\\R{Y}')$ where $\\R{X}'$ and $\\R{Y}'$ are zero-mean independent gaussian random variables with unit variance.\n", "\n", "```Python\n", "...\n", "cov_ref, n_ = ___, n\n", "XY_ref = XY_ref_rng.___(mean, ___, n_)\n", "...\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "09d75ce7", "metadata": { "nbgrader": { "grade": false, "grade_id": "sampling", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "XY_ref_rng = np.random.default_rng(SEED)\n", "### BEGIN SOLUTION\n", "cov_ref, n_ = [[1, 0], [0, 1]], n\n", "XY_ref = XY_ref_rng.multivariate_normal(mean, cov_ref, n_)\n", "### END SOLUTION\n", "XY_ref_df = pd.DataFrame(XY_ref, columns=[\"X'\", \"Y'\"])\n", "plot_samples_with_kde(XY_ref_df)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "ebf74bf6", "metadata": { "tags": [] }, "source": [ "## Divergence estimation" ] }, { "cell_type": "markdown", "id": "fe24a6fb", "metadata": {}, "source": [ "**Can we generalize the problem further?**" ] }, { "cell_type": "markdown", "id": "ccc129bd", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "Estimating MI may be viewed as a special case of the following problem:" ] }, { "cell_type": "markdown", "id": "ef60218e", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "````{prf:definition} Divergence estimation \n", ":label: D-estimation\n", "\n", "Estimate the KL *divergence*\n", "\n", "$$\n", "\\begin{align}\n", "D(P_{\\R{Z}}\\|P_{\\R{Z}'}) &:= E\\left[\\log \\frac{d P_{\\R{Z}}(\\R{Z})}{d P_{\\R{Z}'}(\\R{Z})} \\right]\n", "\\end{align}\n", "$$ (D)\n", "\n", "using \n", "- a sequence $\\R{Z}^n:=(\\R{Z}_1,\\dots, \\R{Z}_n)\\sim P_{\\R{Z}}^n$ of i.i.d. samples from $P_{\\R{Z}}$ if $P_{\\R{Z}}$ is unknown, and\n", "- another sequence ${\\R{Z}'}^{n'}\\sim P_{\\R{Z}'}^{n'}$ of i.i.d. samples from $P_{\\R{Z}'}$ if $P_{\\R{Z}'}$, the *reference measure* of $P_{\\R{Z}}$, is also unknown.\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "33580309", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "**Exercise** \n", "\n", "Although $\\R{X}^n$ and $\\R{Y}^n$ for MI estimation should have the same length, $\\R{Z}^n$ and ${\\R{Z}'}^{n'}$ can have different lengths, i.e., $n \\not\\equiv n'$. Why?" ] }, { "cell_type": "markdown", "id": "7de3f91c", "metadata": { "nbgrader": { "grade": true, "grade_id": "Z-Z_ref", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false }, "slideshow": { "slide_type": "fragment" }, "tags": [ "hide-cell" ] }, "source": [ "````{toggle}\n", "**Solution** \n", "\n", "The dependency between $\\R{Z}$ and $\\R{Z}'$ does not affect the divergence.\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "0c544e10", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "Regarding the mutual information as a divergence from joint to product distributions, the problem can be further generalized to estimtate other divergences such as the $f$-divergence:" ] }, { "cell_type": "markdown", "id": "77c2a9a5", "metadata": {}, "source": [ "For a strictly convex function $f$ with $f(1)=0$,\n", "\n", "$$\n", "\\begin{align}\n", "D_f(P_{\\R{Z}}\\|P_{\\R{Z}'}) &:= E\\left[ f\\left(\\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')}\\right) \\right].\n", "\\end{align}\n", "$$ (f-D)" ] }, { "cell_type": "markdown", "id": "4b4f5fef", "metadata": {}, "source": [ "$f$-divergence in {eq}`f-D` reduces to KL divergence when $f=u \\log u$:\n", "\n", "$$\n", "\\begin{align}\n", "E\\left[ \\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')} \\log \\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')} \\right] &= \\int_{\\mc{Z}} \\color{gray}{d P_{\\R{Z}'}(z)} \\cdot \\frac{d P_{\\R{Z}}(z)}{\\color{gray}{d P_{\\R{Z}'}(z)}} \\log \\frac{d P_{\\R{Z}}(z)}{d P_{\\R{Z}'}(z)}. \n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "id": "bbdedf6e", "metadata": {}, "source": [ "**Exercise**\n", "\n", "Show that $D_f(P_{\\R{Z}}\\|P_{\\R{Z}'})\\geq 0$ with equality iff $P_{\\R{Z}}=P_{\\R{Z}'}$ using Jensen's inequality and the properties of $f$." ] }, { "cell_type": "markdown", "id": "6ea8b79f", "metadata": { "nbgrader": { "grade": true, "grade_id": "divergence", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "````{toggle}\n", "**Solution**\n", "\n", "It is a valid divergence because, by Jensen's inequality,\n", "\n", "$$\n", "D_f(P_{\\R{Z}}\\|P_{\\R{Z}'}) \\geq f\\bigg( \\underbrace{E\\left[ \\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')} \\right]}_{=1}\\bigg) = 0\n", "$$\n", "\n", "with equality iff $P_{\\R{Z}}=P_{\\R{Z}'}$.\n", "\n", "````" ] }, { "cell_type": "markdown", "id": "7490eddf", "metadata": {}, "source": [ "Regarding the divergence as an expectation, it is approximated by the sample average:\n", "\n", "$$\n", "\\begin{align}\n", "D_f(P_{\\R{Z}}\\|P_{\\R{Z}'}) &\\approx \n", "\\frac1n \\sum_{i\\in [n]} f\\left(\\frac{d P_{\\R{Z}}(\\R{Z}'_i)}{d P_{\\R{Z}'}(\\R{Z}'_i)}\\right).\n", "\\end{align}\n", "$$ (avg-f-D)" ] }, { "cell_type": "markdown", "id": "2c955350", "metadata": {}, "source": [ "However, this is not a valid estimate because it involves the unknown measures $P_{\\R{Z}}$ and $P_{\\R{Z}'}$." ] }, { "cell_type": "markdown", "id": "d4d9e5b3", "metadata": {}, "source": [ "One may further estimate the *density ratio*\n", "\n", "$$\n", "\\begin{align}\n", "z \\mapsto \\frac{d P_{\\R{Z}}(z)}{d P_{\\R{Z}'}(z)}\n", "\\end{align}\n", "$$ (dP-ratio)" ] }, { "cell_type": "markdown", "id": "5740733e", "metadata": {}, "source": [ "or estimate the density defined with respective to some reference measure $\\mu$:\n", "\n", "$$\n", "\\begin{align}\n", "p_{\\R{Z}}&:=\\frac{dP_{\\R{Z}}}{d\\mu} \\in \\mc{P}_{\\mu}(\\mc{Z}).\n", "\\end{align}\n", "$$ (density)" ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.10.3" } }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "source_map": [ 14, 18, 30, 38, 42, 46, 50, 69, 75, 85, 92, 96, 103, 109, 115, 130, 134, 142, 146, 157, 170, 191, 195, 199, 203, 222, 228, 237, 241, 251, 261, 267, 282, 293, 297, 307 ] }, "nbformat": 4, "nbformat_minor": 5 }