%matplotlib widget
from util import *

Mathematical Definition
As shown below, a neural network is organized into layers of computation units called neurons.
For an integer $\ell \geq 0$, let
- $\mathbf{a}^{(\ell)}$ be the output of the $\ell$-th layer of the neural network, and
- $a^{(\ell)}_i$ be the $i$-th element of $\mathbf{a}^{(\ell)}$. The element is computed from the output $\mathbf{a}^{(\ell-1)}$ of its previous layer except for $\ell = 0$.
The 0-th layer is called the input layer, i.e., $\mathbf{a}^{(0)}$ is the input feature $\mathbf{x}$.
The $L$-th layer, i.e., the last layer, is called the output layer. All other layers are called the hidden layers.
What should the neural network output be?
The goal is to train a classifier that predicts a label $y$ for an input feature $\mathbf{x}$:
- Instead of a hard-decision classifier, i.e., a function $f$ such that $f(\mathbf{x})$ predicts $y$,
- we train a probabilistic classifier that estimates the conditional distribution of $y$ given $\mathbf{x}$, i.e., the $i$-th output $a^{(L)}_i$ estimates $P(y = i \mid \mathbf{x})$.
For the MNIST dataset, a common goal is to classify the digit type of a handwritten digit. When given a handwritten digit,
- a hard-decision classifier returns a digit type, and
- a probabilistic classifier returns a distribution over the digit types.
Why train a probabilistic classifier?
- A probabilistic classifier is more general: it can give a hard decision as well by returning the estimated most likely digit type (see the sketch below).
- A neural network can model the distribution $P(y \mid \mathbf{x})$ better than a hard decision can because its output is continuous.
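For instance, given an estimated distribution over the 10 digit types, the most likely digit type can be recovered with np.argmax. A minimal sketch (the probability vector below is made up purely for illustration):

import numpy as np

# Hypothetical output of a probabilistic classifier over the digit types 0-9.
p_hat = np.array([0.01, 0.02, 0.05, 0.02, 0.60, 0.10, 0.05, 0.05, 0.05, 0.05])

hard_decision = np.argmax(p_hat)  # most likely digit type
print(hard_decision)              # 4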
How to ensure $\mathbf{a}^{(L)}$ is a valid probability vector?
The soft-max activation function is often used for the last layer: for a vector $\mathbf{z} \in \mathbb{R}^K$,
$$\operatorname{softmax}(\mathbf{z})_i := \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \qquad \text{for } i \in \{1, \dots, K\},$$
where $K$ is the number of distinct class labels.
It follows that
$$\operatorname{softmax}(\mathbf{z})_i \geq 0 \quad \text{and} \quad \sum_{i=1}^{K} \operatorname{softmax}(\mathbf{z})_i = 1,$$
so the output $\mathbf{a}^{(L)}$ is a valid probability vector.
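As a quick sanity check, the following sketch implements the soft-max in NumPy (the function and values here are illustrative, not part of the library code used later):

import numpy as np

def softmax(z):
    """Soft-max of a vector z: exponentiate each entry and normalize."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p, p.sum())  # non-negative entries that sum to 1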
How are the different layers related?
For $\ell \geq 1$, the $\ell$-th layer computes
$$\mathbf{a}^{(\ell)} = g^{(\ell)}\big(\mathbf{W}^{(\ell)} \mathbf{a}^{(\ell-1)} + \mathbf{b}^{(\ell)}\big),$$
where
- $\mathbf{W}^{(\ell)}$ is a matrix of weights;
- $\mathbf{b}^{(\ell)}$ is a vector called the bias; and
- $g^{(\ell)}$ is a real-valued function called the activation function.
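The following is a minimal NumPy sketch of this recursion for one layer (the shapes and values are arbitrary, chosen only for illustration):

import numpy as np

rng = np.random.default_rng(0)

a_prev = rng.random(4)   # a^(l-1): output of the previous layer (4 neurons)
W = rng.random((3, 4))   # W^(l): weight matrix of a layer with 3 neurons
b = rng.random(3)        # b^(l): bias vector

z = W @ a_prev + b       # pre-activation
a = np.maximum(0, z)     # a^(l), using ReLU (introduced below) as g^(l)
print(a)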
The activation function for the other layers is often the vectorized version of
- the sigmoid function: $\operatorname{sigmoid}(z) := \frac{1}{1+e^{-z}}$, or
- the rectified linear unit (ReLU): $\operatorname{ReLU}(z) := \max\{0, z\}$.
The following plots the ReLU activation function.
def ReLU(z):
    return np.max([np.zeros(z.shape), z], axis=0)
z = np.linspace(-5, 5, 100)
plt.figure(num=1)
plt.plot(z, ReLU(z))
plt.xlim(-5, 5)
plt.title(r"ReLU: $\max\{0,z\}$")
plt.xlabel(r"$z$")
plt.show()

Complete the following function to compute the sigmoid activation $\frac{1}{1+e^{-z}}$, and plot it.

def sigmoid(z):
    ### BEGIN SOLUTION
    return 1 / (1 + np.exp(-z))
    ### END SOLUTION
z = np.linspace(-5, 5, 100)
plt.figure(num=2)
plt.plot(z, sigmoid(z))
plt.xlim(-5, 5)
plt.title(r"Sigmoid function: $\frac{1}{1+e^{-z}}$")
plt.xlabel(r"$z$")
plt.show()

# tests
### BEGIN HIDDEN TESTS
z_test = np.linspace(-5, 5, 10)
assert np.isclose(sigmoid(z_test), (lambda z: 1 / (1 + np.exp(-z)))(z_test)).all()
### END HIDDEN TESTS

Implementation
The following uses the keras library to define the basic neural network architecture.
keras runs on top of tensorflow and offers a higher-level abstraction to simplify the construction and training of a neural network. (tflearn is another library that provides a higher-level API for tensorflow.)
def create_simple_model():
    tf.keras.backend.clear_session()  # clear keras cache.
    # See https://github.com/keras-team/keras/issues/7294
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Input(shape=(28, 28, 1)),
            tf.keras.layers.Flatten(),  # flatten the 28x28x1 input to a vector
            tf.keras.layers.Dense(16, activation=tf.keras.activations.relu),
            tf.keras.layers.Dense(16, activation=tf.keras.activations.relu),
            tf.keras.layers.Dense(10, activation=tf.keras.activations.softmax),  # output layer
        ],
        name="Simple_sequential",
    )
    return model
model = create_simple_model()
model.summary()

The above defines a linear stack of fully-connected layers after flattening the input. The method summary is useful for debugging in Keras.

Assign to n_hidden_layers the number of hidden layers in the model above.
### BEGIN SOLUTION
n_hidden_layers = len(model.layers) - 2
### END SOLUTION
n_hidden_layers

# tests
### BEGIN HIDDEN TESTS
assert n_hidden_layers == len(model.layers) - 2
### END HIDDEN TESTS
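The model above only specifies the architecture. The following is a minimal sketch of how it might be compiled and trained on MNIST with the standard Keras API; the optimizer, loss, and epoch count are illustrative choices, not prescribed by this notebook.

# Illustrative only: compile and train the model on MNIST.
import tensorflow as tf  # may already be available via util

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0  # add channel dimension, scale to [0, 1]

model = create_simple_model()
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integer digit types
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=1)  # a single epoch, just to demonstrate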