
Mathematical Preliminaries

City University of Hong Kong

The following is a lecture series that introduces the basic theory of deep learning.

What to know about vector calculus?

Vectors are represented in lowercase boldface font as in

\begin{align}
\M{x} &:= \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = [x_i] \in \mathbb{R}^n &\text{such as}\quad \M{x} &:= \begin{bmatrix} 1 \\ 2 \\ \vdots \\ 9 \end{bmatrix}\text{, and} \tag{column vector}\\
\M{x}^\intercal &= \begin{bmatrix}x_1 & x_2 & \cdots & x_n \end{bmatrix} &\text{such as}\quad \M{x}^\intercal &= \begin{bmatrix} 1 & 2 & \cdots & 9 \end{bmatrix}. \tag{row vector}
\end{align}
  • The above example defines a Euclidean vector, which is a 1-D array of $n$ real numbers (from $\mathbb{R}$) organized into a column or a row.
  • A column vector can be transposed ($(\cdot)^\intercal$) into a row vector.
import numpy as np

seq = np.arange(1, 10)  # 1D array
x = seq.reshape(-1, 1)  # column vector
x_transposed = x.transpose()  # row vector

print("Column vector:", x, "Row vector:", x_transposed, sep="\n")

Matrices are represented in uppercase boldface font as in

\begin{align}
\M{W} &:= \begin{bmatrix}w_{11} & w_{12} & \cdots & w_{1n}\\ w_{21} & \ddots & & \vdots\\ \vdots & & \ddots & \vdots \\ w_{m1} & \cdots & \cdots & w_{mn} \end{bmatrix} = [w_{ij}] \in \mathbb{R}^{m\times n} &\text{such as}\quad \M{W} &:= \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \text{and}\\
\M{W}^\intercal &= \begin{bmatrix}w_{11} & w_{21} & \cdots & w_{m1}\\ w_{12} & \ddots & & \vdots\\ \vdots & & \ddots & \vdots \\ w_{1n} & \cdots & \cdots & w_{mn} \end{bmatrix} &\text{such as}\quad \M{W}^\intercal &= \begin{bmatrix}1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}.
\end{align}
  • The above defines a Euclidean matrix, which is a 2-D array of real numbers organized into a table with $m$ rows and $n$ columns.
  • Transposing a matrix turns its rows (columns) into columns (rows).
W = np.arange(1, 10).reshape(3, -1)  # 3-by-3 matrix

print("W:", W, "W^T:", W.transpose(), sep="\n")
A matrix can be multiplied with a vector: each entry of the product $\M{W}\M{x}$ is the dot product of the corresponding row of $\M{W}$ with $\M{x}$:

\begin{align} \M{W}\M{x} = \begin{bmatrix}w_{11} & w_{12} & \cdots \\ w_{21} & \ddots & \\ \vdots & & \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \\ \vdots \end{bmatrix} = \begin{bmatrix}w_{11}x_1 + w_{12}x_2 + \cdots \\ w_{21}x_1 + \cdots \\ \vdots \end{bmatrix} \end{align}
W = np.arange(1, 10).reshape(3, -1)
x = np.arange(1, 4).reshape(-1, 1)
Wx = W @ x

print("W:", W, "x:", x, "Wx:", Wx, sep="\n")

What to know about probability theory?

Joint distribution:

p_{\RM{x},\R{y}}(\M{x},y) = \underbrace{p_{\R{y}|\RM{x}}(y|\M{x})}_{\underbrace{\Pr}_{\text{probability measure}}\Set{\R{y}=y|\RM{x}=\M{x}}} \cdot \underbrace{p_{\RM{x}}(\M{x})}_{(\underbrace{\partial_{x_1}}_{\text{partial derivative w.r.t. } x_1}\partial_{x_2}\cdots)\Pr\Set{\RM{x} \leq \M{x}}} \quad\text{where}
  • $p_{\R{y}|\RM{x}}(y|\M{x})$ is the probability mass function (pmf) of $\R{y}=y\in \mc{Y}$ conditioned on $\RM{x}=\M{x}\in \mc{X}$, and
  • $p_{\RM{x}}(\M{x})$ is the (multivariate) probability density function (pdf) of $\RM{x}=\M{x}\in \mc{X}$.
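
As a concrete numerical illustration (a minimal sketch, not from the lecture: it uses a small discrete $\RM{x}$ so the joint distribution fits in a table, whereas $\RM{x}$ above is continuous with a pdf), the factorization $p_{\RM{x},\R{y}} = p_{\R{y}|\RM{x}} \cdot p_{\RM{x}}$ can be tabulated with numpy:

# Hypothetical discrete example: x takes 3 values, y takes 2 values.
p_x = np.array([0.5, 0.3, 0.2])      # marginal pmf of x
p_y_given_x = np.array([[0.9, 0.1],  # conditional pmf of y given each value
                        [0.5, 0.5],  # of x; each row sums to 1
                        [0.2, 0.8]])

p_xy = p_y_given_x * p_x[:, None]    # joint pmf: p(x, y) = p(y|x) * p(x)

print("Joint pmf:", p_xy, sep="\n")
print("Total probability:", p_xy.sum())  # should be 1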

For any function $g$ of $(\RM{x},\R{y})$, the expectations are:

\begin{align}
E[g(\RM{x},\R{y})|\RM{x}] &= \sum_{y\in \mc{Y}} g(\RM{x},y)\cdot p_{\R{y}|\RM{x}}(y|\RM{x}) \tag{conditional exp.}\\
E[g(\RM{x},\R{y})] &= \int_{\mc{X}} \underbrace{\sum_{y\in \mc{Y}} g(\M{x},y)\cdot \underbrace{p_{\RM{x},\R{y}}(\M{x},y)}_{p_{\R{y}|\RM{x}}(y|\M{x})\, p_{\RM{x}}(\M{x})}}_{E[g(\RM{x},\R{y})|\RM{x}=\M{x}]}\,d\M{x} \tag{exp.}\\
&= E[E[g(\RM{x},\R{y})|\RM{x}]] \tag{iterated exp.}
\end{align}
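
Continuing the discrete sketch above (again illustrative only: with $\RM{x}$ discrete, the integral over $\mc{X}$ becomes a sum), the law of iterated expectations can be checked numerically for, say, $g(x,y)=xy$:

# Hypothetical values taken by x and y in the discrete sketch above.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 1.0])
g = xs[:, None] * ys[None, :]  # g(x, y) = x * y evaluated on the grid

E_g_given_x = (g * p_y_given_x).sum(axis=1)  # conditional exp. for each value of x
E_g = (g * p_xy).sum()                       # exp. over the joint pmf
E_iterated = (E_g_given_x * p_x).sum()       # iterated exp. E[E[g|x]]

print("E[g|x]:", E_g_given_x)
print("E[g] =", E_g, "and E[E[g|x]] =", E_iterated)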