{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "82b4ca39",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Problem Formulation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21fce64f",
   "metadata": {},
   "source": [
    "$\\def\\abs#1{\\left\\lvert #1 \\right\\rvert}\n",
    "\\def\\Set#1{\\left\\{ #1 \\right\\}}\n",
    "\\def\\mc#1{\\mathcal{#1}}\n",
    "\\def\\M#1{\\boldsymbol{#1}}\n",
    "\\def\\R#1{\\mathsf{#1}}\n",
    "\\def\\RM#1{\\boldsymbol{\\mathsf{#1}}}\n",
    "\\def\\op#1{\\operatorname{#1}}\n",
    "\\def\\E{\\op{E}}\n",
    "\\def\\d{\\mathrm{\\mathstrut d}}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3a8f07c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "%matplotlib inline\n",
    "SEED = 0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4bd8b99",
   "metadata": {},
   "source": [
    "## Mutual information estimation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5515c02a",
   "metadata": {},
   "source": [
    "**How to formulate the problem of mutual information estimation?**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f524313a",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The problem of estimating the mutual information is:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edf98aa6",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "````{prf:definition} MI Estimation  \n",
    ":label: mi-estimation\n",
    "\n",
    "Given $n$ samples\n",
    "\n",
    "$$(\\R{X}_1,\\R{Y}_1),\\dots, (\\R{X}_n,\\R{Y}_n) \\stackrel{iid}{\\sim} P_{\\R{X},\\R{Y}}\\in \\mc{P}(\\mc{X},\\mc{Y})$$ \n",
    "\n",
    "i.i.d. drawn from an *unknown* probability measure $P_{\\R{X},\\R{Y}}$ from the space $\\mc{X}\\times \\mc{Y}$, estimate the *mutual information (MI)*\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "I(\\R{X}\\wedge\\R{Y}) &:= E\\left[\\log \\frac{d P_{\\R{X},\\R{Y}}(\\R{X},\\R{Y})}{d (P_{\\R{X}} \\times P_{\\R{Y}})(\\R{X},\\R{Y})} \\right].\n",
    "\\end{align}\n",
    "$$ (MI)\n",
    "\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32b9e401",
   "metadata": {},
   "source": [
    "Run the following code, which uses `numpy` to \n",
    "- generate i.i.d. samples from a multivariate gaussian distribution, and\n",
    "- store the samples as numpy arrays assigned to `XY`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08d38624",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Seeded random number generator for reproducibility\n",
    "XY_rng = np.random.default_rng(SEED)\n",
    "\n",
    "# Sampling from an unknown probability measure\n",
    "rho = 1 - 0.19 * XY_rng.random()\n",
    "mean, cov, n = [0, 0], [[1, rho], [rho, 1]], 1000\n",
    "XY = XY_rng.multivariate_normal(mean, cov, n)\n",
    "plt.scatter(XY[:, 0], XY[:, 1], s=2)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5898ec3b",
   "metadata": {},
   "source": [
    "See [`multivariate_normal`][mn] and [`scatter`][sc].\n",
    "\n",
    "[mn]: https://numpy.org/doc/stable/reference/random/generated/numpy.random.multivariate_normal.html\n",
    "[sc]: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdd96ce5",
   "metadata": {},
   "source": [
    "You can also get help directly in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html):"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb6eebde",
   "metadata": {},
   "source": [
    "- **Docstring**: \n",
    "  - Move the cursor to the object and \n",
    "    - click `Help->Show Contextual Help` or\n",
    "    - click `Shift-Tab` if you have limited screen space."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d1ac0fc",
   "metadata": {},
   "source": [
    "- **Directory**:\n",
    "  - Right click on a notebook and choose `New Console for Notebook`. \n",
    "  - Run `dir(obj)` for a previously defined object `obj` to see the available methods/properties of `obj`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a574ce2",
   "metadata": {},
   "source": [
    "**Exercise** \n",
    "\n",
    "What is unknown about the above sampling distribution?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8002062",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "unknown-distribution",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": [
     "hide-cell"
    ]
   },
   "source": [
    "````{toggle}\n",
    "**Solution**\n",
    "\n",
    "The density is\n",
    "\n",
    "$$\n",
    "\\frac{d P_{\\R{X},\\R{Y}}}{dxdy} = \\mc{N}_{\\M{0},\\left[\\begin{smallmatrix}1 & \\rho \\\\ \\rho & 1\\end{smallmatrix}\\right]}(x,y)\n",
    "$$\n",
    "\n",
    "but $\\rho$ is unknown (uniformly random over $[0.8,0.99)$).\n",
    "\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea8cd1cd",
   "metadata": {},
   "source": [
    "To show the data samples using `pandas`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12bd6f11",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "XY_df = pd.DataFrame(XY, columns=[\"X\", \"Y\"])\n",
    "XY_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a497538",
   "metadata": {},
   "source": [
    "To plot the data using `seaborn`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34a04ebf",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_samples_with_kde(df, **kwargs):\n",
    "    p = sns.PairGrid(df, **kwargs)\n",
    "    p.map_lower(sns.scatterplot, s=2)  # scatter plot of samples\n",
    "    p.map_upper(sns.kdeplot)  # kernel density estimate for pXY\n",
    "    p.map_diag(sns.kdeplot)  # kde for pX and pY\n",
    "    return p\n",
    "\n",
    "\n",
    "plot_samples_with_kde(XY_df)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fc79481",
   "metadata": {},
   "source": [
    "**Exercise** \n",
    "\n",
    "Complete the following code by replacing the blanks `___` so that `XY_ref` stores the i.i.d. samples of $(\\R{X}',\\R{Y}')$ where $\\R{X}'$ and $\\R{Y}'$ are zero-mean independent gaussian random variables with unit variance.\n",
    "\n",
    "```Python\n",
    "...\n",
    "cov_ref, n_ = ___, n\n",
    "XY_ref = XY_ref_rng.___(mean, ___, n_)\n",
    "...\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09d75ce7",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "sampling",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "XY_ref_rng = np.random.default_rng(SEED)\n",
    "### BEGIN SOLUTION\n",
    "cov_ref, n_ = [[1, 0], [0, 1]], n\n",
    "XY_ref = XY_ref_rng.multivariate_normal(mean, cov_ref, n_)\n",
    "### END SOLUTION\n",
    "XY_ref_df = pd.DataFrame(XY_ref, columns=[\"X'\", \"Y'\"])\n",
    "plot_samples_with_kde(XY_ref_df)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ebf74bf6",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Divergence estimation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe24a6fb",
   "metadata": {},
   "source": [
    "**Can we generalize the problem further?**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccc129bd",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "Estimating MI may be viewed as a special case of the following problem:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef60218e",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "````{prf:definition} Divergence estimation  \n",
    ":label: D-estimation\n",
    "\n",
    "Estimate the KL *divergence*\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "D(P_{\\R{Z}}\\|P_{\\R{Z}'}) &:= E\\left[\\log \\frac{d P_{\\R{Z}}(\\R{Z})}{d P_{\\R{Z}'}(\\R{Z})} \\right]\n",
    "\\end{align}\n",
    "$$ (D)\n",
    "\n",
    "using \n",
    "- a sequence $\\R{Z}^n:=(\\R{Z}_1,\\dots, \\R{Z}_n)\\sim P_{\\R{Z}}^n$ of i.i.d. samples from $P_{\\R{Z}}$ if $P_{\\R{Z}}$ is unknown, and\n",
    "- another sequence ${\\R{Z}'}^{n'}\\sim P_{\\R{Z}'}^{n'}$ of i.i.d. samples from $P_{\\R{Z}'}$  if $P_{\\R{Z}'}$, the *reference measure* of $P_{\\R{Z}}$, is also unknown.\n",
    "\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33580309",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "**Exercise** \n",
    "\n",
    "Although $\\R{X}^n$ and $\\R{Y}^n$ for MI estimation should have the same length, $\\R{Z}^n$ and ${\\R{Z}'}^{n'}$ can have different lengths, i.e., $n \\not\\equiv n'$. Why?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7de3f91c",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "Z-Z_ref",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": [
     "hide-cell"
    ]
   },
   "source": [
    "````{toggle}\n",
    "**Solution** \n",
    "\n",
    "The dependency between $\\R{Z}$ and $\\R{Z}'$ does not affect the divergence.\n",
    "\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c544e10",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "Regarding the mutual information as a divergence from joint to product distributions, the problem can be further generalized to estimtate other divergences such as the $f$-divergence:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77c2a9a5",
   "metadata": {},
   "source": [
    "For a strictly convex function $f$ with $f(1)=0$,\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "D_f(P_{\\R{Z}}\\|P_{\\R{Z}'}) &:= E\\left[ f\\left(\\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')}\\right) \\right].\n",
    "\\end{align}\n",
    "$$ (f-D)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b4f5fef",
   "metadata": {},
   "source": [
    "$f$-divergence in {eq}`f-D` reduces to KL divergence when $f=u \\log u$:\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "E\\left[ \\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')} \\log \\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')}  \\right] &= \\int_{\\mc{Z}} \\color{gray}{d P_{\\R{Z}'}(z)} \\cdot \\frac{d P_{\\R{Z}}(z)}{\\color{gray}{d P_{\\R{Z}'}(z)}} \\log \\frac{d P_{\\R{Z}}(z)}{d P_{\\R{Z}'}(z)}. \n",
    "\\end{align}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbdedf6e",
   "metadata": {},
   "source": [
    "**Exercise**\n",
    "\n",
    "Show that $D_f(P_{\\R{Z}}\\|P_{\\R{Z}'})\\geq 0$ with equality iff $P_{\\R{Z}}=P_{\\R{Z}'}$ using Jensen's inequality and the properties of $f$."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ea8b79f",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "divergence",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "source": [
    "````{toggle}\n",
    "**Solution**\n",
    "\n",
    "It is a valid divergence because, by Jensen's inequality,\n",
    "\n",
    "$$\n",
    "D_f(P_{\\R{Z}}\\|P_{\\R{Z}'}) \\geq  f\\bigg( \\underbrace{E\\left[ \\frac{d P_{\\R{Z}}(\\R{Z}')}{d P_{\\R{Z}'}(\\R{Z}')} \\right]}_{=1}\\bigg) = 0\n",
    "$$\n",
    "\n",
    "with equality iff $P_{\\R{Z}}=P_{\\R{Z}'}$.\n",
    "\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7490eddf",
   "metadata": {},
   "source": [
    "Regarding the divergence as an expectation, it is approximated by the sample average:\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "D_f(P_{\\R{Z}}\\|P_{\\R{Z}'}) &\\approx \n",
    "\\frac1n \\sum_{i\\in [n]} f\\left(\\frac{d P_{\\R{Z}}(\\R{Z}'_i)}{d P_{\\R{Z}'}(\\R{Z}'_i)}\\right).\n",
    "\\end{align}\n",
    "$$ (avg-f-D)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c955350",
   "metadata": {},
   "source": [
    "However, this is not a valid estimate because it involves the unknown measures $P_{\\R{Z}}$ and $P_{\\R{Z}'}$."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4d9e5b3",
   "metadata": {},
   "source": [
    "One may further estimate the *density ratio*\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "z \\mapsto \\frac{d P_{\\R{Z}}(z)}{d P_{\\R{Z}'}(z)}\n",
    "\\end{align}\n",
    "$$ (dP-ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5740733e",
   "metadata": {},
   "source": [
    "or estimate the density defined with respective to some reference measure $\\mu$:\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "p_{\\R{Z}}&:=\\frac{dP_{\\R{Z}}}{d\\mu} \\in \\mc{P}_{\\mu}(\\mc{Z}).\n",
    "\\end{align}\n",
    "$$ (density)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "text_representation": {
    "extension": ".md",
    "format_name": "myst",
    "format_version": 0.13,
    "jupytext_version": "1.10.3"
   }
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "source_map": [
   14,
   18,
   30,
   38,
   42,
   46,
   50,
   69,
   75,
   85,
   92,
   96,
   103,
   109,
   115,
   130,
   134,
   142,
   146,
   157,
   170,
   191,
   195,
   199,
   203,
   222,
   228,
   237,
   241,
   251,
   261,
   267,
   282,
   293,
   297,
   307
  ]
 },
 "nbformat": 4,
 "nbformat_minor": 5
}