from util import *
Optimization Theory¶
Why can we learn from examples?
For the examples to be useful, the label $Y$ must be random, and its distribution must be unknown. Why?
- If $Y$ were deterministic, i.e., $Y = y$ all the time for some fixed label $y$, then the classifier could simply return $y$ without even looking at $\mathbf{X}$.
- If the distribution $p_{Y|\mathbf{X}}$ were known instead, then the optimal classifier would also be known and therefore would not need to be estimated.
More precisely, the examples are called i.i.d. (independent and identically distributed) samples of $(\mathbf{X}, Y)$, written as
$(\mathbf{X}_1, Y_1), \dots, (\mathbf{X}_n, Y_n) \overset{\text{i.i.d.}}{\sim} p_{\mathbf{X},Y},$
which means that their joint distribution is $\prod_{i=1}^n p_{\mathbf{X},Y}(\mathbf{x}_i, y_i)$.
Why?
- If all the examples were the same instead, they could not show the pattern of how $Y$ depends on $\mathbf{X}$.
- Noise in individual examples can be smoothed out by averaging over many examples.
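As a hypothetical illustration of both points (all names and the 10% noise level below are assumptions, not from these notes), the following sketch draws i.i.d. samples where the label usually follows the feature, and shows that averaging over many examples recovers the underlying dependence despite the noise:

```python
# Hypothetical sketch: i.i.d. samples (X_i, Y_i) with 10% label noise.
# Averaging over many examples smooths the noise out, revealing how Y
# depends on X.
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
X = rng.integers(0, 2, size=n)        # i.i.d. binary features X_i
noise = rng.random(n) < 0.1           # each label is flipped w.p. 0.1
Y = np.where(noise, 1 - X, X)         # Y usually equals X

# Empirical estimate of P(Y = 1 | X = 1); converges to the true value 0.9.
p_hat = Y[X == 1].mean()
```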
How to determine if a classifier is good?
Ultimately, we desire a classifier with the maximum accuracy in predicting $Y$ from $\mathbf{X}$, but achieving that maximum directly is computationally too difficult.
Instead, we regard a classification algorithm as reasonably good if it is consistent, i.e., if
- it can achieve the maximum possible accuracy
- as the number of training samples goes to ∞.
A consistent probabilistic classifier gives rise to an asymptotically optimal hard-decision classifier that achieves the maximum accuracy.
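A toy simulation can make this concrete. The sketch below is an assumed setup (not the classifier from these notes): a simple plug-in classifier estimates the most likely label for each input from the training data, and its test accuracy approaches the maximum possible (Bayes) accuracy of 0.8 as the number of training samples grows.

```python
# Assumed toy problem: X uniform over {0, 1, 2}; Y = X with probability 0.8,
# otherwise one of the other two labels. The Bayes rule predicts Y = X and
# achieves the maximum possible accuracy of 0.8.
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    X = rng.integers(0, 3, size=n)
    flip = rng.random(n) < 0.2
    Y = np.where(flip, (X + rng.integers(1, 3, size=n)) % 3, X)
    return X, Y

def accuracy(n_train, n_test=20_000):
    Xtr, Ytr = sample(n_train)
    # Plug-in rule: predict the most frequent label seen for each x.
    rule = np.array([np.bincount(Ytr[Xtr == x], minlength=3).argmax()
                     for x in range(3)])
    Xte, Yte = sample(n_test)
    return (rule[Xte] == Yte).mean()

accs = [accuracy(n) for n in (10, 100, 10_000)]  # approaches 0.8
```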
How can we obtain a consistent classifier?
We train a neural network to minimize a certain loss function. A common loss function for classification is the cross entropy from information theory.
The identity can be proved quite easily using the linearity of expectation
and the property of the logarithm that $\log \frac{a}{b} = \log a - \log b$ for all $a, b > 0$.
Hence, a neural network that minimizes the cross entropy satisfies $q_{Y|\mathbf{X}}(y|\mathbf{x}) = p_{Y|\mathbf{X}}(y|\mathbf{x})$ a.s. for all labels $y$ and any possible input image $\mathbf{x}$.
Solution to Exercise 1
Proof: Applying the non-negativity of divergence, $D(p_{Y|\mathbf{X}} \| q_{Y|\mathbf{X}}) \geq 0$, to the information identity, we have
$H(p_{Y|\mathbf{X}}, q_{Y|\mathbf{X}}) = H(Y|\mathbf{X}) + D(p_{Y|\mathbf{X}} \| q_{Y|\mathbf{X}}) \geq H(Y|\mathbf{X}),$
with equality if and only if $q_{Y|\mathbf{X}} = p_{Y|\mathbf{X}}$ a.s. Hence, the cross entropy is minimized to the conditional entropy $H(Y|\mathbf{X})$ by having $q_{Y|\mathbf{X}} = p_{Y|\mathbf{X}}$ a.s.
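The identity and its consequence can also be checked numerically. The sketch below uses assumed toy distributions p and q for a single input (base-e logarithms):

```python
# Numeric check (assumed toy distributions) of the information identity
# H(p, q) = H(p) + D(p || q), which implies H(p, q) >= H(p) with equality
# iff q = p, since the divergence D(p || q) is non-negative.
import numpy as np

p = np.array([0.7, 0.2, 0.1])    # true conditional distribution p(y|x)
q = np.array([0.5, 0.3, 0.2])    # model's estimate q(y|x)

H_pq = -(p * np.log(q)).sum()    # cross entropy H(p, q)
H_p = -(p * np.log(p)).sum()     # entropy H(p)
D = (p * np.log(p / q)).sum()    # divergence D(p || q) >= 0
```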
The cross entropy cannot be computed exactly without knowing the joint distribution $p_{\mathbf{X},Y}$. Nevertheless, it can be estimated from a batch of i.i.d. samples $(\mathbf{X}_i, Y_i)$ for $i \in B$:
$\hat{H}(\theta) := -\frac{1}{|B|} \sum_{i \in B} \log q_{Y|\mathbf{X}}(Y_i|\mathbf{X}_i; \theta),$
where $\theta$ is the vector of parameters of the neural network defined in (net).
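Concretely, the estimate is the batch average of the negative log-probability the network assigns to each true label, which is what Keras' SparseCategoricalCrossentropy computes from predicted probabilities. A small sketch with assumed toy predictions:

```python
# Assumed toy batch: predicted distributions q(.|x_i) over 10 digit types
# for 3 examples. Each row sums to 1 (9 * 0.05 + 0.55 = 1).
import numpy as np

q = np.full((3, 10), 0.05)
q[0, 7] = q[1, 2] = q[2, 1] = 0.55
labels = np.array([7, 2, 1])               # true labels y_i

# Empirical cross entropy: average of -log q(y_i | x_i) over the batch.
loss = -np.log(q[np.arange(len(labels)), labels]).mean()
```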
A mini-batch gradient descent algorithm is often used to reduce the loss. It iteratively updates/trains the neural network parameters
$\theta \leftarrow \theta - \alpha \nabla_\theta \hat{H}(\theta)$
by computing the gradient $\nabla_\theta \hat{H}(\theta)$ on a randomly selected mini-batch $B$ of examples and choosing an appropriate learning rate $\alpha$.
What is gradient descent?
How to choose the step size?
- The gradient can be computed systematically using a technique called backpropagation, thanks to the layered structure of the neural network in (net).
- The learning rate $\alpha$ can affect the rate at which the loss converges to a local minimum:
  - the parameters $\theta$ may overshoot their optimal values if $\alpha$ is too large, and
  - the convergence can be very slow if $\alpha$ is too small.
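Both failure modes can be seen on an assumed toy objective (not the network's loss). Running gradient descent on $L(t) = t^2$, whose gradient is $2t$ and whose minimizer is 0:

```python
# Assumed toy objective L(t) = t^2 with gradient 2t, minimized at t = 0.
def descend(lr, steps=50, t=1.0):
    for _ in range(steps):
        t = t - lr * 2 * t      # gradient descent update
    return t

slow = descend(0.01)    # too small: still far from 0 after 50 steps
good = descend(0.1)     # converges quickly towards 0
bad = descend(1.1)      # too large: overshoots and diverges
```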
A more advanced method called Adam (Adaptive Moment Estimation) can adaptively choose the step size to speed up the convergence.
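As a rough sketch of how Adam works (the update formulas below are the standard ones; the toy loss and all variable names are assumptions), it keeps moving averages of the gradient and its square, and scales each step accordingly:

```python
# Minimal sketch of the Adam update rule, applied to an assumed toy
# loss theta^2 with gradient 2 * theta.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # moving average of gradients
    v = b2 * v + (1 - b2) * grad ** 2    # moving average of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias correction for the zero init
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * theta                     # gradient of theta^2
    theta, m, v = adam_step(theta, grad, m, v, t)
# theta ends up near the minimizer 0
```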
Training¶
The loss function, gradient descent algorithm, and the performance metrics can be specified using the compile method.
def compile_model(model):
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  optimizer=tf.keras.optimizers.Adam(0.001),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model
compile_model(model)
model.loss, model.optimizer
We can train the neural network using the method fit of the compiled model:
if input('Train? [Y/n]').lower() != 'n':
    model.fit(ds_b["train"])
Solution to Exercise 2
The accuracy increases at a diminishing rate as we rerun the training: each rerun continues the gradient descent, which iteratively reduces the loss, but the improvement shrinks as the parameters approach a local minimum.
We can set the parameter epochs to train the neural network for multiple epochs, since it is quite unlikely that a single epoch trains a neural network well.
To determine whether the neural network is well-trained (and when to stop training), we should also use a separate validation set to evaluate the performance of the neural network. The validation set can be specified using the parameter validation_data as follows:
if input('Train? [Y/n]').lower() != 'n':
    model.fit(ds_b["train"], epochs=6, validation_data=ds_b["test"])
Solution to Exercise 3
It is biased since the selection of the model depends on the validation accuracy, and therefore, the validation set. To avoid such bias, we should use a separate test set to evaluate the performance of the well-trained neural network at the end.
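A minimal sketch of such a three-way split (the sizes and names are assumptions, not from these notes): the validation set guides training and model selection, while the test set is touched only once at the very end.

```python
# Assumed example: split 100 example indices into train/validation/test.
import numpy as np

rng = np.random.default_rng(2)
idx = rng.permutation(100)
train, val, test = idx[:70], idx[70:85], idx[85:]   # 70% / 15% / 15%
```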
Deployment¶
Once you are satisfied with the result, you can deploy the model as a web application.
The mnist folder contains the webpage index.html that
- presents an HTML5 canvas for users to input a handwritten digit,
- loads a trained model using tensorflow.js,
- passes the handwritten digit to the model to predict the distribution of the digit types, and
- displays the most likely digit type.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<div style="text-align: center;">
<canvas id="sketchpad" style="border-style:solid;"></canvas>
<br>
<button onclick="sketchpad.undo()">undo</button>
<button onclick="sketchpad.redo()">redo</button>
<button onclick="sketchpad.clear()">clear</button>
<button onclick="predict()">predict</button><br>
<input oninput="sketchpad.penSize=Number(this.value)" id="size-picker" type="range" min="1" max="50">
<br>
<div style="display: flex; justify-content: center;">
<canvas id="input" width="28" height="28"></canvas>
<p id="result"></p>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/jquery@1.11.1/dist/jquery.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/sketchpad@0.1.0/scripts/sketchpad.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
<script>
const context = document.querySelector("#input").getContext('2d');
const sketchpad = new Sketchpad({
element: '#sketchpad',
width: 280,
height: 280
});
sketchpad.penSize = 25;
$('#size-picker').val(sketchpad.penSize);
$('#size-picker').change(function (event) {sketchpad.penSize = $(event.target).val()});
let model;
addEventListener('DOMContentLoaded', (async function () {
model = await tf.loadLayersModel('model/model.json');
}));
function predict() {
var img = new Image();
img.onload = async function() {
context.clearRect(0, 0, 28, 28);
context.drawImage(img, 0, 0, 28, 28);
const data = context.getImageData(0, 0, 28, 28).data;
var input = [];
for(var i = 0; i < data.length; i += 4) {
input.push(data[i + 3] / 255);
}
let scores = await model.predict(tf.tensor(input).reshape([1, 28, 28, 1])).array();
scores = scores[0];
$('#result').text('is classified as ' + scores.indexOf(Math.max(...scores)) + '.');
};
img.src = sketchpad.canvas.toDataURL('image/png');
}
</script>
</body>
</html>
Then, convert the model to files that can be loaded by tensorflow.js:
import tensorflowjs as tfjs
tfjs.converters.save_keras_model(model, "mnist/model")
To host the web application, run the following command:
if input('Execute? [Y/n]').lower() != 'n':
    !mkdir -p ~/www/ && cp -r mnist ~/www/
View your web app here:
display.IFrame(src=JUPYTER_SERVICE_PREFIX + 'www/mnist/', width=500, height=400)