
Mathematical Preliminaries

City University of Hong Kong

The following is a lecture series that introduces the basic theory of deep learning.

What to know about vector calculus?

Vectors are represented in lowercase boldface font as in

\begin{align}
\M{x} &:= \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = [x_i] \in \mathbb{R}^n &\text{such as}\quad \M{x} &:= \begin{bmatrix} 1 \\ 2 \\ \vdots \\ 9 \end{bmatrix}\text{, and} \tag{column vector}\\
\M{x}^\intercal &= \begin{bmatrix}x_1 & x_2 & \cdots & x_n \end{bmatrix} &\text{such as}\quad \M{x}^\intercal &= \begin{bmatrix} 1 & 2 & \cdots & 9 \end{bmatrix}. \tag{row vector}
\end{align}
  • The above example defines a Euclidean vector, which is a 1-D array of $n$ real numbers (from $\mathbb{R}$) organized into a column or a row.
  • A column vector can be transposed ($(\cdot)^\intercal$) into a row vector.
import numpy as np

seq = np.arange(1, 10)  # 1D array
x = seq.reshape(-1, 1)  # column vector
x_transposed = x.transpose()  # row vector

print("Column vector:", x, "Row vector:", x_transposed, sep="\n")

Matrices are represented in uppercase boldface font as in

\begin{align}
\M{W} &:= \begin{bmatrix}w_{11} & w_{12} & \cdots & w_{1n}\\ w_{21} & \ddots & & \vdots\\ \vdots & & \ddots & \vdots \\ w_{m1} & \cdots & \cdots & w_{mn} \end{bmatrix} = [w_{ij}] \in \mathbb{R}^{m\times n} &\text{such as}\quad \M{W} &:= \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \text{and}\\
\M{W}^\intercal &= \begin{bmatrix}w_{11} & w_{21} & \cdots & w_{m1}\\ w_{12} & \ddots & & \vdots\\ \vdots & & \ddots & \vdots \\ w_{1n} & \cdots & \cdots & w_{mn} \end{bmatrix} &\text{such as}\quad \M{W}^\intercal &= \begin{bmatrix}1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}.
\end{align}
  • The above defines a Euclidean matrix, which is a 2-D array of real numbers organized into a table with $m$ rows and $n$ columns.
  • Transposing a matrix turns its rows (columns) into columns (rows).
W = np.arange(1, 10).reshape(3, -1)  # 3-by-3 matrix

print("W:", W, "W^T:", W.transpose(), sep="\n")
A matrix can be multiplied with a vector: each entry of the product $\M{W}\M{x}$ is the dot product of the corresponding row of $\M{W}$ with $\M{x}$:

\begin{align} \M{W}\M{x} = \begin{bmatrix}w_{11} & w_{12} & \cdots \\ w_{21} & \ddots & \\ \vdots & & \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \\ \vdots \end{bmatrix} = \begin{bmatrix}w_{11}x_1 + w_{12}x_2 + \cdots \\ w_{21}x_1 + \cdots \\ \vdots \end{bmatrix} \end{align}
W = np.arange(1, 10).reshape(3, -1)
x = np.arange(1, 4).reshape(-1, 1)
Wx = W @ x

print("W:", W, "x:", x, "Wx:", Wx, sep="\n")

What to know about probability theory?

Joint distribution:

p_{\RM{x},\R{y}}(\M{x},y) = \underbrace{p_{\R{y}|\RM{x}}(y|\M{x})}_{\underbrace{\Pr}_{\text{probability measure}}\Set{\R{y}=y|\RM{x}=\M{x}}} \cdot \underbrace{p_{\RM{x}}(\M{x})}_{(\underbrace{\partial_{x_1}}_{\text{partial derivative w.r.t. } x_1}\partial_{x_2}\cdots)\Pr\Set{\RM{x} \leq \M{x}}} \quad\text{where}
  • $p_{\R{y}|\RM{x}}(y|\M{x})$ is the probability mass function (pmf) of $\R{y}=y\in \mc{Y}$ conditioned on $\RM{x}=\M{x}\in \mc{X}$, and
  • $p_{\RM{x}}(\M{x})$ is the (multivariate) probability density function (pdf) of $\RM{x}=\M{x}\in \mc{X}$.
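
As a concrete numerical illustration (a minimal sketch, not from the lecture: it uses a small discrete $\RM{x}$ so the joint distribution fits in a table, whereas $\RM{x}$ above is continuous with a pdf), the factorization $p_{\RM{x},\R{y}} = p_{\R{y}|\RM{x}} \cdot p_{\RM{x}}$ can be tabulated with numpy:

# Hypothetical discrete example: x takes 3 values, y takes 2 values.
p_x = np.array([0.5, 0.3, 0.2])      # marginal pmf of x
p_y_given_x = np.array([[0.9, 0.1],  # conditional pmf of y given each value
                        [0.5, 0.5],  # of x; each row sums to 1
                        [0.2, 0.8]])

p_xy = p_y_given_x * p_x[:, None]    # joint pmf: p(x, y) = p(y|x) * p(x)

print("Joint pmf:", p_xy, sep="\n")
print("Total probability:", p_xy.sum())  # should be 1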

For any function $g$ of $(\RM{x},\R{y})$, the expectations are:

\begin{align}
E[g(\RM{x},\R{y})|\RM{x}] &= \sum_{y\in \mc{Y}} g(\RM{x},y)\cdot p_{\R{y}|\RM{x}}(y|\RM{x}) \tag{conditional exp.}\\
E[g(\RM{x},\R{y})] &= \int_{\mc{X}} \underbrace{\sum_{y\in \mc{Y}} g(\M{x},y)\cdot \underbrace{p_{\RM{x},\R{y}}(\M{x},y)}_{p_{\R{y}|\RM{x}}(y|\M{x})\, p_{\RM{x}}(\M{x})}}_{E[g(\RM{x},\R{y})|\RM{x}=\M{x}]}\,d\M{x} \tag{exp.}\\
&= E[E[g(\RM{x},\R{y})|\RM{x}]] \tag{iterated exp.}
\end{align}
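
Continuing the discrete sketch above (again illustrative only: with $\RM{x}$ discrete, the integral over $\mc{X}$ becomes a sum), the law of iterated expectations can be checked numerically for, say, $g(x,y)=xy$:

# Hypothetical values taken by x and y in the discrete sketch above.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 1.0])
g = xs[:, None] * ys[None, :]  # g(x, y) = x * y evaluated on the grid

E_g_given_x = (g * p_y_given_x).sum(axis=1)  # conditional exp. for each value of x
E_g = (g * p_xy).sum()                       # exp. over the joint pmf
E_iterated = (E_g_given_x * p_x).sum()       # iterated exp. E[E[g|x]]

print("E[g|x]:", E_g_given_x)
print("E[g] =", E_g, "and E[E[g|x]] =", E_iterated)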