from util import *
Logging the results¶
To call additional functions during training, we can pass them to the callbacks parameter of the model's fit method. For instance:
from tqdm.keras import TqdmCallback

if input("Train? [Y/n]").lower() != "n":
    model.fit(
        ds_b["train"],
        epochs=6,
        validation_data=ds_b["test"],
        verbose=0,
        callbacks=[TqdmCallback(verbose=2)],
    )
The above code uses TqdmCallback() to return a callback that displays a graphical progress bar:
- Setting verbose=0 for the method fit disables the default text-based progress bar.
- Setting verbose=2 for the class TqdmCallback shows and keeps the progress bars for training each batch. Try changing verbose to other values to see different effects.
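A callback is simply an object whose hook methods Keras calls at specific points of training. The following is a minimal sketch (a hypothetical example, not part of this notebook's util module) of a callback that prints the loss at the end of every epoch:

import tensorflow as tf

class PrintLossCallback(tf.keras.callbacks.Callback):
    """Hypothetical callback that prints the loss after every epoch."""

    def on_epoch_end(self, epoch, logs=None):
        # logs is a dictionary of metrics such as "loss" and "val_loss"
        loss = (logs or {}).get("loss")
        print(f"epoch {epoch}: loss={loss}")

# It could be passed alongside TqdmCallback, e.g.,
# callbacks=[TqdmCallback(verbose=2), PrintLossCallback()].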
An important use of callback functions is to save the models and results during training for further analysis. We define the following function train_model for this purpose:
- Take a look at the docstring to learn its basic usage, and then
- study its implementation in the source code.
import datetime
import os

import pytz


def train_model(
    model,
    fit_params={},
    log_root=".",
    save_log_params=None,
    save_model_params=None,
):
    """Train and test the model, and return the log directory path name.

    Parameters
    ----------
    fit_params: dict
        Dictionary of parameters to pass to model.fit.
    log_root: str
        Root directory for creating the log directory.
    save_log_params: dict
        Dictionary of parameters to pass to
        tf.keras.callbacks.TensorBoard to save the results for TensorBoard.
        The default value None means no logging of the results.
    save_model_params: dict
        Dictionary of parameters to pass to
        tf.keras.callbacks.ModelCheckpoint to save the model to checkpoint
        files.
        The default value None means no saving of the models.

    Returns
    -------
    str: log directory path that points to a subfolder of log_root named
        using the current time.
    """
    # use a subfolder named by the current time to distinguish repeated runs
    log_dir = os.path.join(
        log_root,
        datetime.datetime.now(tz=pytz.timezone("Asia/Hong_Kong")).strftime(
            "%Y%m%d-%H%M%S"
        ),
    )
    fit_params = dict(fit_params)  # copy to avoid mutating the caller's dict
    callbacks = list(fit_params.pop("callbacks", []))
    if save_log_params is not None:
        # add callback to save the training log for further analysis by TensorBoard
        callbacks.append(tf.keras.callbacks.TensorBoard(log_dir, **save_log_params))
    if save_model_params is not None:
        # save the model as checkpoint files after each training epoch
        callbacks.append(
            tf.keras.callbacks.ModelCheckpoint(
                os.path.join(log_dir, "{epoch}.ckpt"), **save_model_params
            )
        )
    # training + testing (validation)
    model.fit(
        ds_b["train"], validation_data=ds_b["test"], callbacks=callbacks, **fit_params
    )
    return log_dir
For example:
fit_params = {"epochs": 6, "callbacks": [TqdmCallback()], "verbose": 0}
log_root = os.path.join(user_home, "log")  # log folder
save_log_params = {"update_freq": 100, "histogram_freq": 1}
save_model_params = {"save_weights_only": True, "verbose": 1}

if input("Train? [Y/n]").lower() != "n":
    model = compile_model(create_simple_model())
    log_dir = train_model(
        model,
        fit_params=fit_params,
        log_root=log_root,
        save_log_params=save_log_params,
        save_model_params=save_model_params,
    )
By providing save_model_params to the callback tf.keras.callbacks.ModelCheckpoint, the model is saved to log_dir at the end of each epoch.
!ls {log_dir}
Saving the model is useful because it often takes a long time to train a neural network. To reload the model from the latest checkpoint and continue training it:
if input("Continue to train? [Y/n]").lower() != "n":
# load the weights of the previously trained model
restored_model = compile_model(create_simple_model())
restored_model.load_weights(tf.train.latest_checkpoint(log_dir))
# continue to train
train_model(restored_model, log_root=log_root, save_log_params=save_log_params)By providing tf.keras.callbacks.TensorBoard as a callback function to the fit method earlier, the training logs can be analyzed using TensorBoard.
if input('Execute? [Y/n]').lower() != 'n':
    %load_ext tensorboard
    %tensorboard --logdir {log_root}
import tensorboard as tb

tb.notebook.list()
The SCALARS tab shows the curves of training and validation losses/accuracies after different batches/epochs. The curves often have jitters because the gradient descent is stochastic (random). To see the typical performance, a smoothing factor can be applied on the left panel. The smoothed curve $\hat{\ell}_t$ of the original curve $\ell_t$ with smoothing factor $\beta \in [0, 1)$ is defined as

$$\hat{\ell}_t = \beta \hat{\ell}_{t-1} + (1 - \beta)\, \ell_t,$$

which is called the moving average. Try changing the smoothing factor on the left panel to see the effect.
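As a quick illustration of the formula (a toy example, not from the original notebook), the following sketch applies the moving average with smoothing factor 0.9 to a short sequence of losses:

import numpy as np

def moving_average(losses, beta=0.9):
    """Exponentially smooth a sequence of losses with smoothing factor beta."""
    smoothed = []
    prev = losses[0]  # initialize the average with the first value
    for loss in losses:
        prev = beta * prev + (1 - beta) * loss
        smoothed.append(prev)
    return np.array(smoothed)

# a noisy but decreasing toy loss curve
print(moving_average([1.0, 0.8, 0.9, 0.6, 0.7, 0.5]))

With a smoothing factor close to 1, the smoothed value lags behind the raw losses, which is the source of the bias discussed in the exercise below.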
Solution to Exercise 1
This leads to a large bias when using the empirical loss or performance to estimate the actual loss or performance. The smoothed performance is likely overly pessimistic since the moving average places a large weight on the loss or performance of the network from earlier epochs, when it had been trained with fewer epochs.
We can also visualize the input images in TensorBoard:
- Run the following cell to write the images to the log directory.
- Click the refresh button at the top of the previous TensorBoard panel.
- Click the IMAGES tab to show the images.
if input("Execute? [Y/n]").lower() != "n":
file_writer = tf.summary.create_file_writer(log_dir)
with file_writer.as_default():
# Don't forget to reshape.
images = np.reshape(
[image for (image, label) in ds["train"].take(25)], (-1, 28, 28, 1)
)
tf.summary.image("25 training data examples", images, max_outputs=25, step=0)In addition to presenting the results, TensorBoard is useful for debugging deep learning. In particular, learn
- to check the model graph under the GRAPHS tab,
- to debug using the DEBUGGER V2 tab, and
- to publish your results.
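For instance, the DEBUGGER V2 tab only shows data if debug dumping is enabled before training starts. The following is a minimal sketch, where the dump sub-directory name is an assumption:

# enable TensorFlow Debugger V2 dumps before calling model.fit;
# the dumps can then be inspected under the DEBUGGER V2 tab
tf.debugging.experimental.enable_dump_debug_info(
    os.path.join(log_root, "tfdbg2"),  # hypothetical dump sub-directory
    tensor_debug_mode="FULL_HEALTH",   # record the numerical health of each tensor
    circular_buffer_size=-1,           # keep all events (may use a lot of storage)
)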
TensorBoard can also simultaneously show the logs of different runs stored in different subfolders of the log directory:
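The TensorBoard instance launched above already points at log_root, so each timestamped subfolder appears as a separate run. If a fresh launch is needed, it could look like the following (the same command as before):

if input('Execute? [Y/n]').lower() != 'n':
    %tensorboard --logdir {log_root}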
You can select different runs on the left panel to compare their performance.
Note that loading the logs into TensorBoard may consume a lot of memory. You can list the TensorBoard notebook instances and kill the ones you no longer need by running !kill {pid}.
while (pid := input('pid to kill? (press enter to exit)')):
    !kill {pid}
Enhancements¶
def create_dropout_model():
    tf.keras.backend.clear_session()
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Input(shape=(28, 28, 1)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.keras.activations.relu),
            tf.keras.layers.Dropout(0.2),  # dropout
            tf.keras.layers.Dense(10, activation=tf.keras.activations.softmax),
        ],
        name="Dropout",
    )
    return model


model = compile_model(create_dropout_model())
print(model.summary())
if input("Train? [Y/n]").lower() != "n":
### BEGIN SOLUTION
fit_params = {"epochs": 6, "callbacks": [TqdmCallback()], "verbose": 0}
save_log_params = {}
save_model_params = None
log_dir = train_model(
model,
fit_params=fit_params,
log_root=log_root,
save_log_params=save_log_params,
save_model_params=save_model_params
)
### END SOLUTIONdef create_cnn_model():
    tf.keras.backend.clear_session()
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Input(shape=(28, 28, 1)),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ],
        name="CNN",
    )
    return model


model = compile_model(create_cnn_model())
print(model.summary())
if input("Train? [Y/n]").lower() != "n":
### BEGIN SOLUTION
fit_params = {"epochs": 6, "callbacks": [TqdmCallback()], "verbose": 0}
save_log_params = {}
save_model_params
log_dir = train_model(
model,
fit_params=fit_params,
log_root=log_root,
save_log_params=save_log_params,
save_model_params=save_model_params
)
### END SOLUTIONCleanup¶
If you run out of storage, you should remove some of the log files:
if input('Remove all logs? [Y/n]').lower() != 'n':
    !rm -rf {log_root}