{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Chapter 11 – Training Deep Neural Networks**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_This notebook contains all the sample code and solutions to the exercises in chapter 11._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", "
\n", " Run in Google Colab\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Python ≥3.5 is required\n", "import sys\n", "assert sys.version_info >= (3, 5)\n", "\n", "# Scikit-Learn ≥0.20 is required\n", "import sklearn\n", "assert sklearn.__version__ >= \"0.20\"\n", "\n", "try:\n", " # %tensorflow_version only exists in Colab.\n", " %tensorflow_version 2.x\n", "except Exception:\n", " pass\n", "\n", "# TensorFlow ≥2.0 is required\n", "import tensorflow as tf\n", "from tensorflow import keras\n", "assert tf.__version__ >= \"2.0\"\n", "\n", "# Common imports\n", "import numpy as np\n", "import os\n", "\n", "# to make this notebook's output stable across runs\n", "np.random.seed(42)\n", "\n", "# To plot pretty figures\n", "%matplotlib inline\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "mpl.rc('axes', labelsize=14)\n", "mpl.rc('xtick', labelsize=12)\n", "mpl.rc('ytick', labelsize=12)\n", "\n", "# Where to save the figures\n", "PROJECT_ROOT_DIR = \".\"\n", "CHAPTER_ID = \"deep\"\n", "IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n", "os.makedirs(IMAGES_PATH, exist_ok=True)\n", "\n", "def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n", " path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n", " print(\"Saving figure\", fig_id)\n", " if tight_layout:\n", " plt.tight_layout()\n", " plt.savefig(path, format=fig_extension, dpi=resolution)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Vanishing/Exploding Gradients Problem" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def logit(z):\n", " return 1 / (1 + np.exp(-z))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure sigmoid_saturation_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "z = np.linspace(-5, 5, 200)\n", "\n", "plt.plot([-5, 5], [0, 0], 'k-')\n", "plt.plot([-5, 5], [1, 1], 'k--')\n", "plt.plot([0, 0], [-0.2, 1.2], 'k-')\n", "plt.plot([-5, 5], [-3/4, 7/4], 'g--')\n", "plt.plot(z, logit(z), \"b-\", linewidth=2)\n", "props = dict(facecolor='black', shrink=0.1)\n", "plt.annotate('Saturating', xytext=(3.5, 0.7), xy=(5, 1), arrowprops=props, fontsize=14, ha=\"center\")\n", "plt.annotate('Saturating', xytext=(-3.5, 0.3), xy=(-5, 0), arrowprops=props, fontsize=14, ha=\"center\")\n", "plt.annotate('Linear', xytext=(2, 0.2), xy=(0, 0.5), arrowprops=props, fontsize=14, ha=\"center\")\n", "plt.grid(True)\n", "plt.title(\"Sigmoid activation function\", fontsize=14)\n", "plt.axis([-5, 5, -0.2, 1.2])\n", "\n", "save_fig(\"sigmoid_saturation_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Xavier and He Initialization" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Constant',\n", " 'GlorotNormal',\n", " 'GlorotUniform',\n", " 'Identity',\n", " 'Initializer',\n", " 'Ones',\n", " 'Orthogonal',\n", " 'RandomNormal',\n", " 'RandomUniform',\n", " 'TruncatedNormal',\n", " 'VarianceScaling',\n", " 'Zeros',\n", " 'constant',\n", " 'deserialize',\n", " 'get',\n", " 'glorot_normal',\n", " 'glorot_uniform',\n", " 'he_normal',\n", " 'he_uniform',\n", " 'identity',\n", " 'lecun_normal',\n", " 'lecun_uniform',\n", " 'ones',\n", " 'orthogonal',\n", " 'serialize',\n", " 'zeros']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[name for name in dir(keras.initializers) if not name.startswith(\"_\")]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "keras.layers.Dense(10, activation=\"relu\", kernel_initializer=\"he_normal\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "init = keras.initializers.VarianceScaling(scale=2., mode='fan_avg',\n", " distribution='uniform')\n", "keras.layers.Dense(10, activation=\"relu\", kernel_initializer=init)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nonsaturating Activation Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Leaky ReLU" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def leaky_relu(z, alpha=0.01):\n", " return np.maximum(alpha*z, z)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure leaky_relu_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(z, leaky_relu(z, 0.05), \"b-\", linewidth=2)\n", "plt.plot([-5, 5], [0, 0], 'k-')\n", "plt.plot([0, 0], [-0.5, 4.2], 'k-')\n", "plt.grid(True)\n", "props = dict(facecolor='black', shrink=0.1)\n", "plt.annotate('Leak', xytext=(-3.5, 0.5), xy=(-5, -0.2), arrowprops=props, fontsize=14, ha=\"center\")\n", "plt.title(\"Leaky ReLU activation function\", fontsize=14)\n", "plt.axis([-5, 5, -0.5, 4.2])\n", "\n", "save_fig(\"leaky_relu_plot\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['deserialize',\n", " 'elu',\n", " 'exponential',\n", " 'get',\n", " 'hard_sigmoid',\n", " 'linear',\n", " 'relu',\n", " 'selu',\n", " 'serialize',\n", " 'sigmoid',\n", " 'softmax',\n", " 'softplus',\n", " 'softsign',\n", " 'tanh']" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[m for m in dir(keras.activations) if not m.startswith(\"_\")]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['LeakyReLU', 'PReLU', 'ReLU', 'ThresholdedReLU']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[m for m in dir(keras.layers) if \"relu\" in m.lower()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's train a neural network on Fashion MNIST using the Leaky ReLU:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n", "X_train_full = X_train_full / 255.0\n", "X_test = X_test / 255.0\n", "X_valid, X_train = X_train_full[:5000], X_train_full[5000:]\n", "y_valid, y_train = y_train_full[:5000], y_train_full[5000:]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, kernel_initializer=\"he_normal\"),\n", " keras.layers.LeakyReLU(),\n", " keras.layers.Dense(100, kernel_initializer=\"he_normal\"),\n", " keras.layers.LeakyReLU(),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/10\n", "55000/55000 [==============================] - 3s 50us/sample - loss: 1.2806 - accuracy: 0.6250 - val_loss: 0.8883 - val_accuracy: 0.7152\n", "Epoch 2/10\n", "55000/55000 [==============================] - 2s 40us/sample - loss: 0.7954 - accuracy: 0.7373 - val_loss: 0.7135 - val_accuracy: 0.7648\n", "Epoch 3/10\n", "55000/55000 [==============================] - 2s 42us/sample - loss: 0.6816 - accuracy: 0.7727 - val_loss: 0.6356 - val_accuracy: 0.7882\n", "Epoch 4/10\n", "55000/55000 [==============================] - 2s 42us/sample - loss: 0.6215 - accuracy: 0.7935 - val_loss: 0.5922 - val_accuracy: 0.8012\n", "Epoch 5/10\n", "55000/55000 [==============================] - 2s 42us/sample - loss: 0.5830 - accuracy: 0.8081 - val_loss: 0.5596 - val_accuracy: 0.8172\n", "Epoch 6/10\n", "55000/55000 [==============================] - 2s 42us/sample - loss: 0.5553 - accuracy: 0.8155 - val_loss: 0.5338 - val_accuracy: 0.8240\n", "Epoch 7/10\n", "55000/55000 [==============================] - 2s 40us/sample - loss: 0.5340 - accuracy: 0.8221 - val_loss: 0.5157 - val_accuracy: 0.8310\n", "Epoch 8/10\n", "55000/55000 [==============================] - 2s 41us/sample - loss: 0.5172 - accuracy: 0.8265 - val_loss: 0.5035 - val_accuracy: 0.8336\n", "Epoch 9/10\n", "55000/55000 [==============================] - 2s 42us/sample - loss: 0.5036 - accuracy: 0.8299 - val_loss: 0.4950 - val_accuracy: 0.8354\n", "Epoch 10/10\n", "55000/55000 [==============================] - 2s 42us/sample - loss: 0.4922 - accuracy: 0.8324 - val_loss: 0.4797 - val_accuracy: 0.8430\n" ] } ], "source": [ "history = model.fit(X_train, y_train, epochs=10,\n", " validation_data=(X_valid, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's try PReLU:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, kernel_initializer=\"he_normal\"),\n", " keras.layers.PReLU(),\n", " keras.layers.Dense(100, kernel_initializer=\"he_normal\"),\n", " keras.layers.PReLU(),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/10\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 1.3460 - accuracy: 0.6233 - val_loss: 0.9251 - val_accuracy: 0.7208\n", "Epoch 2/10\n", "55000/55000 [==============================] - 3s 56us/sample - loss: 0.8208 - accuracy: 0.7359 - val_loss: 0.7318 - val_accuracy: 0.7626\n", "Epoch 3/10\n", "55000/55000 [==============================] - 3s 55us/sample - loss: 0.6974 - accuracy: 0.7695 - val_loss: 0.6500 - val_accuracy: 0.7886\n", "Epoch 4/10\n", "55000/55000 [==============================] - 3s 55us/sample - loss: 0.6338 - accuracy: 0.7904 - val_loss: 0.6000 - val_accuracy: 0.8070\n", "Epoch 5/10\n", "55000/55000 [==============================] - 3s 57us/sample - loss: 0.5920 - accuracy: 0.8045 - val_loss: 0.5662 - val_accuracy: 0.8172\n", "Epoch 6/10\n", "55000/55000 [==============================] - 3s 55us/sample - loss: 0.5620 - accuracy: 0.8138 - val_loss: 0.5416 - val_accuracy: 0.8230\n", "Epoch 7/10\n", "55000/55000 [==============================] - 3s 55us/sample - loss: 0.5393 - accuracy: 0.8203 - val_loss: 0.5218 - val_accuracy: 0.8302\n", "Epoch 8/10\n", "55000/55000 [==============================] - 3s 57us/sample - loss: 0.5216 - accuracy: 0.8248 - val_loss: 0.5051 - val_accuracy: 0.8340\n", "Epoch 9/10\n", "55000/55000 [==============================] - 3s 59us/sample - loss: 0.5069 - accuracy: 0.8289 - val_loss: 0.4923 - val_accuracy: 0.8384\n", "Epoch 10/10\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.4948 - accuracy: 0.8322 - val_loss: 0.4847 - val_accuracy: 0.8372\n" ] } ], "source": [ "history = model.fit(X_train, y_train, epochs=10,\n", " validation_data=(X_valid, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ELU" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def elu(z, alpha=1):\n", " return np.where(z < 0, alpha * (np.exp(z) - 1), z)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure elu_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(z, elu(z), \"b-\", linewidth=2)\n", "plt.plot([-5, 5], [0, 0], 'k-')\n", "plt.plot([-5, 5], [-1, -1], 'k--')\n", "plt.plot([0, 0], [-2.2, 3.2], 'k-')\n", "plt.grid(True)\n", "plt.title(r\"ELU activation function ($\\alpha=1$)\", fontsize=14)\n", "plt.axis([-5, 5, -2.2, 3.2])\n", "\n", "save_fig(\"elu_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Implementing ELU in TensorFlow is trivial, just specify the activation function when building each layer:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "keras.layers.Dense(10, activation=\"elu\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SELU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017. During training, a neural network composed exclusively of a stack of dense layers using the SELU activation function and LeCun initialization will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out. Unfortunately, the self-normalizing property of the SELU activation function is easily broken: you cannot use ℓ1 or ℓ2 regularization, regular dropout, max-norm, skip connections or other non-sequential topologies (so recurrent neural networks won't self-normalize). However, in practice it works quite well with sequential CNNs. If you break self-normalization, SELU will not necessarily outperform other activation functions." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "from scipy.special import erfc\n", "\n", "# alpha and scale to self normalize with mean 0 and standard deviation 1\n", "# (see equation 14 in the paper):\n", "alpha_0_1 = -np.sqrt(2 / np.pi) / (erfc(1/np.sqrt(2)) * np.exp(1/2) - 1)\n", "scale_0_1 = (1 - erfc(1 / np.sqrt(2)) * np.sqrt(np.e)) * np.sqrt(2 * np.pi) * (2 * erfc(np.sqrt(2))*np.e**2 + np.pi*erfc(1/np.sqrt(2))**2*np.e - 2*(2+np.pi)*erfc(1/np.sqrt(2))*np.sqrt(np.e)+np.pi+2)**(-1/2)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def selu(z, scale=scale_0_1, alpha=alpha_0_1):\n", " return scale * elu(z, alpha)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving figure selu_plot\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(z, selu(z), \"b-\", linewidth=2)\n", "plt.plot([-5, 5], [0, 0], 'k-')\n", "plt.plot([-5, 5], [-1.758, -1.758], 'k--')\n", "plt.plot([0, 0], [-2.2, 3.2], 'k-')\n", "plt.grid(True)\n", "plt.title(\"SELU activation function\", fontsize=14)\n", "plt.axis([-5, 5, -2.2, 3.2])\n", "\n", "save_fig(\"selu_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the SELU hyperparameters (`scale` and `alpha`) are tuned in such a way that the mean output of each neuron remains close to 0, and the standard deviation remains close to 1 (assuming the inputs are standardized with mean 0 and standard deviation 1 too). Using this activation function, even a 1,000 layer deep neural network preserves roughly mean 0 and standard deviation 1 across all layers, avoiding the exploding/vanishing gradients problem:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layer 0: mean -0.00, std deviation 1.00\n", "Layer 100: mean 0.02, std deviation 0.96\n", "Layer 200: mean 0.01, std deviation 0.90\n", "Layer 300: mean -0.02, std deviation 0.92\n", "Layer 400: mean 0.05, std deviation 0.89\n", "Layer 500: mean 0.01, std deviation 0.93\n", "Layer 600: mean 0.02, std deviation 0.92\n", "Layer 700: mean -0.02, std deviation 0.90\n", "Layer 800: mean 0.05, std deviation 0.83\n", "Layer 900: mean 0.02, std deviation 1.00\n" ] } ], "source": [ "np.random.seed(42)\n", "Z = np.random.normal(size=(500, 100)) # standardized inputs\n", "for layer in range(1000):\n", " W = np.random.normal(size=(100, 100), scale=np.sqrt(1 / 100)) # LeCun initialization\n", " Z = selu(np.dot(Z, W))\n", " means = np.mean(Z, axis=0).mean()\n", " stds = np.std(Z, axis=0).mean()\n", " if layer % 100 == 0:\n", " print(\"Layer {}: mean {:.2f}, std deviation {:.2f}\".format(layer, means, stds))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using SELU is easy:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "keras.layers.Dense(10, activation=\"selu\",\n", " kernel_initializer=\"lecun_normal\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a neural net for Fashion MNIST with 100 hidden layers, using the SELU activation function:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "tf.random.set_seed(42)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "model = keras.models.Sequential()\n", "model.add(keras.layers.Flatten(input_shape=[28, 28]))\n", "model.add(keras.layers.Dense(300, activation=\"selu\",\n", " kernel_initializer=\"lecun_normal\"))\n", "for layer in range(99):\n", " model.add(keras.layers.Dense(100, activation=\"selu\",\n", " kernel_initializer=\"lecun_normal\"))\n", "model.add(keras.layers.Dense(10, activation=\"softmax\"))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's train it. Do not forget to scale the inputs to mean 0 and standard deviation 1:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "pixel_means = X_train.mean(axis=0, keepdims=True)\n", "pixel_stds = X_train.std(axis=0, keepdims=True)\n", "X_train_scaled = (X_train - pixel_means) / pixel_stds\n", "X_valid_scaled = (X_valid - pixel_means) / pixel_stds\n", "X_test_scaled = (X_test - pixel_means) / pixel_stds" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/5\n", "55000/55000 [==============================] - 35s 644us/sample - loss: 1.0197 - accuracy: 0.6154 - val_loss: 0.7386 - val_accuracy: 0.7348\n", "Epoch 2/5\n", "55000/55000 [==============================] - 33s 607us/sample - loss: 0.7149 - accuracy: 0.7401 - val_loss: 0.6187 - val_accuracy: 0.7774\n", "Epoch 3/5\n", "55000/55000 [==============================] - 32s 583us/sample - loss: 0.6193 - accuracy: 0.7803 - val_loss: 0.5926 - val_accuracy: 0.8036\n", "Epoch 4/5\n", "55000/55000 [==============================] - 32s 586us/sample - loss: 0.5555 - accuracy: 0.8043 - val_loss: 0.5208 - val_accuracy: 0.8262\n", "Epoch 5/5\n", "55000/55000 [==============================] - 32s 573us/sample - loss: 0.5159 - accuracy: 0.8238 - val_loss: 0.4790 - val_accuracy: 0.8358\n" ] } ], "source": [ "history = model.fit(X_train_scaled, y_train, epochs=5,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now look at what happens if we try to use the ReLU activation function instead:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "tf.random.set_seed(42)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "model = keras.models.Sequential()\n", "model.add(keras.layers.Flatten(input_shape=[28, 28]))\n", "model.add(keras.layers.Dense(300, activation=\"relu\", kernel_initializer=\"he_normal\"))\n", "for layer in range(99):\n", " model.add(keras.layers.Dense(100, activation=\"relu\", kernel_initializer=\"he_normal\"))\n", "model.add(keras.layers.Dense(10, activation=\"softmax\"))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/5\n", "55000/55000 [==============================] - 18s 319us/sample - loss: 1.9174 - accuracy: 0.2242 - val_loss: 1.3856 - val_accuracy: 0.3846\n", "Epoch 2/5\n", "55000/55000 [==============================] - 15s 279us/sample - loss: 1.2147 - accuracy: 0.4750 - val_loss: 1.0691 - val_accuracy: 0.5510\n", "Epoch 3/5\n", "55000/55000 [==============================] - 15s 281us/sample - loss: 0.9576 - accuracy: 0.6025 - val_loss: 0.7688 - val_accuracy: 0.7036\n", "Epoch 4/5\n", "55000/55000 [==============================] - 15s 281us/sample - loss: 0.8116 - accuracy: 0.6762 - val_loss: 0.7276 - val_accuracy: 0.7288\n", "Epoch 5/5\n", "55000/55000 [==============================] - 15s 278us/sample - loss: 0.8167 - accuracy: 0.6862 - val_loss: 0.7697 - val_accuracy: 0.7032\n" ] } ], "source": [ "history = model.fit(X_train_scaled, y_train, epochs=5,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not great at all, we suffered from the vanishing/exploding gradients problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Batch Normalization" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.BatchNormalization(),\n", " keras.layers.Dense(300, activation=\"relu\"),\n", " keras.layers.BatchNormalization(),\n", " keras.layers.Dense(100, activation=\"relu\"),\n", " keras.layers.BatchNormalization(),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential_3\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "flatten_3 (Flatten) (None, 784) 0 \n", "_________________________________________________________________\n", "batch_normalization_v2 (Batc (None, 784) 3136 \n", "_________________________________________________________________\n", "dense_210 (Dense) (None, 300) 235500 \n", "_________________________________________________________________\n", "batch_normalization_v2_1 (Ba (None, 300) 1200 \n", "_________________________________________________________________\n", "dense_211 (Dense) (None, 100) 30100 \n", "_________________________________________________________________\n", "batch_normalization_v2_2 (Ba (None, 100) 400 \n", "_________________________________________________________________\n", "dense_212 (Dense) (None, 10) 1010 \n", "=================================================================\n", "Total params: 271,346\n", "Trainable params: 268,978\n", "Non-trainable params: 2,368\n", "_________________________________________________________________\n" ] } ], "source": [ "model.summary()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('batch_normalization_v2/gamma:0', True),\n", " ('batch_normalization_v2/beta:0', True),\n", " ('batch_normalization_v2/moving_mean:0', False),\n", " ('batch_normalization_v2/moving_variance:0', False)]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bn1 = model.layers[1]\n", "[(var.name, var.trainable) for var in bn1.variables]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ListWrapper([, ])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bn1.updates" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/10\n", "55000/55000 [==============================] - 5s 85us/sample - loss: 0.8756 - accuracy: 0.7140 - val_loss: 0.5514 - val_accuracy: 0.8212\n", "Epoch 2/10\n", "55000/55000 [==============================] - 4s 74us/sample - loss: 0.5765 - accuracy: 0.8033 - val_loss: 0.4742 - val_accuracy: 0.8436\n", "Epoch 3/10\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.5146 - accuracy: 0.8216 - val_loss: 0.4382 - val_accuracy: 0.8530\n", "Epoch 4/10\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.4821 - accuracy: 0.8322 - val_loss: 0.4170 - val_accuracy: 0.8604\n", "Epoch 5/10\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.4589 - accuracy: 0.8402 - val_loss: 0.4003 - val_accuracy: 0.8658\n", "Epoch 6/10\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.4428 - accuracy: 0.8459 - val_loss: 0.3883 - val_accuracy: 0.8698\n", "Epoch 7/10\n", "55000/55000 [==============================] - 4s 78us/sample - loss: 0.4220 - accuracy: 0.8521 - val_loss: 0.3792 - val_accuracy: 0.8720\n", "Epoch 8/10\n", "55000/55000 [==============================] - 4s 77us/sample - loss: 0.4150 - accuracy: 0.8546 - val_loss: 0.3696 - val_accuracy: 0.8754\n", "Epoch 9/10\n", "55000/55000 [==============================] - 4s 77us/sample - loss: 0.4013 - accuracy: 0.8589 - val_loss: 0.3629 - val_accuracy: 0.8746\n", "Epoch 10/10\n", "55000/55000 [==============================] - 4s 74us/sample - loss: 0.3931 - accuracy: 0.8615 - val_loss: 0.3581 - val_accuracy: 0.8766\n" ] } ], "source": [ "history = model.fit(X_train, y_train, epochs=10,\n", " validation_data=(X_valid, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes applying BN before the activation function works better (there's a debate on this topic). Moreover, the layer before a `BatchNormalization` layer does not need to have bias terms, since the `BatchNormalization` layer some as well, it would be a waste of parameters, so you can set `use_bias=False` when creating those layers:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.BatchNormalization(),\n", " keras.layers.Dense(300, use_bias=False),\n", " keras.layers.BatchNormalization(),\n", " keras.layers.Activation(\"relu\"),\n", " keras.layers.Dense(100, use_bias=False),\n", " keras.layers.Activation(\"relu\"),\n", " keras.layers.BatchNormalization(),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/10\n", "55000/55000 [==============================] - 5s 89us/sample - loss: 0.8617 - accuracy: 0.7095 - val_loss: 0.5649 - val_accuracy: 0.8102\n", "Epoch 2/10\n", "55000/55000 [==============================] - 4s 76us/sample - loss: 0.5803 - accuracy: 0.8015 - val_loss: 0.4833 - val_accuracy: 0.8344\n", "Epoch 3/10\n", "55000/55000 [==============================] - 4s 79us/sample - loss: 0.5153 - accuracy: 0.8208 - val_loss: 0.4463 - val_accuracy: 0.8462\n", "Epoch 4/10\n", "55000/55000 [==============================] - 4s 76us/sample - loss: 0.4846 - accuracy: 0.8307 - val_loss: 0.4256 - val_accuracy: 0.8530\n", "Epoch 5/10\n", "55000/55000 [==============================] - 4s 79us/sample - loss: 0.4576 - accuracy: 0.8402 - val_loss: 0.4106 - val_accuracy: 0.8590\n", "Epoch 6/10\n", "55000/55000 [==============================] - 4s 77us/sample - loss: 0.4401 - accuracy: 0.8467 - val_loss: 0.3973 - val_accuracy: 0.8610\n", "Epoch 7/10\n", "55000/55000 [==============================] - 4s 78us/sample - loss: 0.4296 - accuracy: 0.8482 - val_loss: 0.3899 - val_accuracy: 0.8650\n", "Epoch 8/10\n", "55000/55000 [==============================] - 4s 76us/sample - loss: 0.4127 - accuracy: 0.8559 - val_loss: 0.3818 - val_accuracy: 0.8658\n", "Epoch 9/10\n", "55000/55000 [==============================] - 4s 78us/sample - loss: 0.4007 - accuracy: 0.8588 - val_loss: 0.3741 - val_accuracy: 0.8682\n", "Epoch 10/10\n", "55000/55000 [==============================] - 4s 79us/sample - loss: 0.3929 - accuracy: 0.8621 - val_loss: 0.3694 - val_accuracy: 0.8734\n" ] } ], "source": [ "history = model.fit(X_train, y_train, epochs=10,\n", " validation_data=(X_valid, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient Clipping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All Keras optimizers accept `clipnorm` or `clipvalue` arguments:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.SGD(clipvalue=1.0)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.SGD(clipnorm=1.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reusing Pretrained Layers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reusing a Keras model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's split the fashion MNIST training set in two:\n", "* `X_train_A`: all images of all items except for sandals and shirts (classes 5 and 6).\n", "* `X_train_B`: a much smaller training set of just the first 200 images of sandals or shirts.\n", "\n", "The validation set and the test set are also split this way, but without restricting the number of images.\n", "\n", "We will train a model on set A (classification task with 8 classes), and try to reuse it to tackle set B (binary classification). We hope to transfer a little bit of knowledge from task A to task B, since classes in set A (sneakers, ankle boots, coats, t-shirts, etc.) are somewhat similar to classes in set B (sandals and shirts). However, since we are using `Dense` layers, only patterns that occur at the same location can be reused (in contrast, convolutional layers will transfer much better, since learned patterns can be detected anywhere on the image, as we will see in the CNN chapter)." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "def split_dataset(X, y):\n", " y_5_or_6 = (y == 5) | (y == 6) # sandals or shirts\n", " y_A = y[~y_5_or_6]\n", " y_A[y_A > 6] -= 2 # class indices 7, 8, 9 should be moved to 5, 6, 7\n", " y_B = (y[y_5_or_6] == 6).astype(np.float32) # binary classification task: is it a shirt (class 6)?\n", " return ((X[~y_5_or_6], y_A),\n", " (X[y_5_or_6], y_B))\n", "\n", "(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)\n", "(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)\n", "(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)\n", "X_train_B = X_train_B[:200]\n", "y_train_B = y_train_B[:200]" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(43986, 28, 28)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train_A.shape" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(200, 28, 28)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train_B.shape" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4, 0, 5, 7, 7, 7, 4, 4, 3, 4, 0, 1, 6, 3, 4, 3, 2, 6, 5, 3, 4, 5,\n", " 1, 3, 4, 2, 0, 6, 7, 1], dtype=uint8)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_train_A[:30]" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0.,\n", " 0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1.], dtype=float32)" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_train_B[:30]" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "model_A = keras.models.Sequential()\n", "model_A.add(keras.layers.Flatten(input_shape=[28, 28]))\n", "for n_hidden in (300, 100, 50, 50, 50):\n", " model_A.add(keras.layers.Dense(n_hidden, activation=\"selu\"))\n", "model_A.add(keras.layers.Dense(8, activation=\"softmax\"))" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "model_A.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 43986 samples, validate on 4014 samples\n", "Epoch 1/20\n", "43986/43986 [==============================] - 3s 78us/sample - loss: 0.5887 - accuracy: 0.8123 - val_loss: 0.3749 - val_accuracy: 0.8734\n", "Epoch 2/20\n", "43986/43986 [==============================] - 3s 69us/sample - loss: 0.3516 - accuracy: 0.8793 - val_loss: 0.3223 - val_accuracy: 0.8874\n", "Epoch 3/20\n", "43986/43986 [==============================] - 3s 68us/sample - loss: 0.3160 - accuracy: 0.8894 - val_loss: 0.3009 - val_accuracy: 0.8956\n", "Epoch 4/20\n", "43986/43986 [==============================] - 3s 70us/sample - loss: 0.2963 - accuracy: 0.8979 - val_loss: 0.2850 - val_accuracy: 0.9036\n", "Epoch 5/20\n", "43986/43986 [==============================] - 3s 68us/sample - loss: 0.2825 - accuracy: 0.9035 - val_loss: 0.2767 - val_accuracy: 0.9076\n", "Epoch 6/20\n", "43986/43986 [==============================] - 3s 69us/sample - loss: 0.2720 - accuracy: 0.9068 - val_loss: 0.2672 - val_accuracy: 0.9093\n", "Epoch 7/20\n", "43986/43986 [==============================] - 3s 72us/sample - loss: 0.2638 - accuracy: 0.9093 - val_loss: 0.2658 - val_accuracy: 0.9103\n", "Epoch 8/20\n", "43986/43986 [==============================] - 3s 70us/sample - loss: 0.2570 - accuracy: 0.9120 - val_loss: 0.2592 - val_accuracy: 0.9106\n", "Epoch 9/20\n", "43986/43986 [==============================] - 3s 71us/sample - loss: 0.2514 - accuracy: 0.9139 - val_loss: 0.2570 - val_accuracy: 0.9128\n", "Epoch 10/20\n", "43986/43986 [==============================] - 3s 72us/sample - loss: 0.2465 - accuracy: 0.9166 - val_loss: 0.2557 - val_accuracy: 0.9108\n", "Epoch 11/20\n", "43986/43986 [==============================] - 3s 69us/sample - loss: 0.2418 - accuracy: 0.9178 - val_loss: 0.2484 - val_accuracy: 0.9178\n", "Epoch 12/20\n", "43986/43986 [==============================] - 3s 70us/sample - loss: 0.2379 - accuracy: 0.9192 - val_loss: 0.2461 - val_accuracy: 0.9178\n", "Epoch 13/20\n", "43986/43986 [==============================] - 3s 71us/sample - loss: 0.2342 - accuracy: 0.9199 - val_loss: 0.2425 - val_accuracy: 0.9188\n", "Epoch 14/20\n", "43986/43986 [==============================] - 3s 68us/sample - loss: 0.2313 - accuracy: 0.9215 - val_loss: 0.2412 - val_accuracy: 0.9185\n", "Epoch 15/20\n", "43986/43986 [==============================] - 3s 68us/sample - loss: 0.2280 - accuracy: 0.9222 - val_loss: 0.2382 - val_accuracy: 0.9173\n", "Epoch 16/20\n", "43986/43986 [==============================] - 3s 71us/sample - loss: 0.2252 - accuracy: 0.9224 - val_loss: 0.2360 - val_accuracy: 0.9205\n", "Epoch 17/20\n", "43986/43986 [==============================] - 3s 71us/sample - loss: 0.2229 - accuracy: 0.9232 - val_loss: 0.2419 - val_accuracy: 0.9158\n", "Epoch 18/20\n", "43986/43986 [==============================] - 3s 71us/sample - loss: 0.2195 - accuracy: 0.9249 - val_loss: 0.2357 - val_accuracy: 0.9170\n", "Epoch 19/20\n", "43986/43986 [==============================] - 3s 68us/sample - loss: 0.2177 - accuracy: 0.9254 - val_loss: 0.2331 - val_accuracy: 0.9200\n", "Epoch 20/20\n", "43986/43986 [==============================] - 3s 70us/sample - loss: 0.2154 - accuracy: 0.9260 - val_loss: 0.2372 - val_accuracy: 0.9158\n" ] } ], "source": [ "history = model_A.fit(X_train_A, y_train_A, epochs=20,\n", " validation_data=(X_valid_A, y_valid_A))" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "model_A.save(\"my_model_A.h5\")" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "model_B = keras.models.Sequential()\n", "model_B.add(keras.layers.Flatten(input_shape=[28, 28]))\n", "for n_hidden in (300, 100, 50, 50, 50):\n", " model_B.add(keras.layers.Dense(n_hidden, activation=\"selu\"))\n", "model_B.add(keras.layers.Dense(1, activation=\"sigmoid\"))" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "model_B.compile(loss=\"binary_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 200 samples, validate on 986 samples\n", "Epoch 1/20\n", "200/200 [==============================] - 0s 2ms/sample - loss: 0.9537 - accuracy: 0.4800 - val_loss: 0.6472 - val_accuracy: 0.5710\n", "Epoch 2/20\n", "200/200 [==============================] - 0s 318us/sample - loss: 0.5805 - accuracy: 0.6850 - val_loss: 0.4863 - val_accuracy: 0.8428\n", "Epoch 3/20\n", "200/200 [==============================] - 0s 318us/sample - loss: 0.4561 - accuracy: 0.8750 - val_loss: 0.4116 - val_accuracy: 0.8905\n", "Epoch 4/20\n", "200/200 [==============================] - 0s 308us/sample - loss: 0.3885 - accuracy: 0.9100 - val_loss: 0.3650 - val_accuracy: 0.9148\n", "Epoch 5/20\n", "200/200 [==============================] - 0s 311us/sample - loss: 0.3426 - accuracy: 0.9250 - val_loss: 0.3308 - val_accuracy: 0.9270\n", "Epoch 6/20\n", "200/200 [==============================] - 0s 317us/sample - loss: 0.3084 - accuracy: 0.9300 - val_loss: 0.3044 - val_accuracy: 0.9371\n", "Epoch 7/20\n", "200/200 [==============================] - 0s 309us/sample - loss: 0.2810 - accuracy: 0.9400 - val_loss: 0.2806 - val_accuracy: 0.9432\n", "Epoch 8/20\n", "200/200 [==============================] - 0s 313us/sample - loss: 0.2572 - accuracy: 0.9500 - val_loss: 0.2607 - val_accuracy: 0.9462\n", "Epoch 9/20\n", "200/200 [==============================] - 0s 312us/sample - loss: 0.2372 - accuracy: 0.9600 - val_loss: 0.2439 - val_accuracy: 0.9513\n", "Epoch 10/20\n", "200/200 [==============================] - 0s 319us/sample - loss: 0.2202 - accuracy: 0.9600 - val_loss: 0.2290 - val_accuracy: 0.9523\n", "Epoch 11/20\n", "200/200 [==============================] - 0s 315us/sample - loss: 0.2047 - accuracy: 0.9650 - val_loss: 0.2161 - val_accuracy: 0.9564\n", "Epoch 12/20\n", "200/200 [==============================] - 0s 325us/sample - loss: 0.1917 - accuracy: 0.9700 - val_loss: 0.2046 - val_accuracy: 0.9584\n", "Epoch 13/20\n", "200/200 [==============================] - 0s 335us/sample - loss: 0.1798 - accuracy: 0.9750 - val_loss: 0.1944 - val_accuracy: 0.9604\n", "Epoch 14/20\n", "200/200 [==============================] - 0s 319us/sample - loss: 0.1690 - accuracy: 0.9750 - val_loss: 0.1860 - val_accuracy: 0.9604\n", "Epoch 15/20\n", "200/200 [==============================] - 0s 319us/sample - loss: 0.1594 - accuracy: 0.9850 - val_loss: 0.1774 - val_accuracy: 0.9635\n", "Epoch 16/20\n", "200/200 [==============================] - 0s 343us/sample - loss: 0.1508 - accuracy: 0.9850 - val_loss: 0.1691 - val_accuracy: 0.9675\n", "Epoch 17/20\n", "200/200 [==============================] - 0s 328us/sample - loss: 0.1426 - accuracy: 0.9900 - val_loss: 0.1621 - val_accuracy: 0.9686\n", "Epoch 18/20\n", "200/200 [==============================] - 0s 340us/sample - loss: 0.1355 - accuracy: 0.9900 - val_loss: 0.1558 - val_accuracy: 0.9706\n", "Epoch 19/20\n", "200/200 [==============================] - 0s 306us/sample - loss: 0.1288 - accuracy: 0.9900 - val_loss: 0.1505 - val_accuracy: 0.9706\n", "Epoch 20/20\n", "200/200 [==============================] - 0s 312us/sample - loss: 0.1230 - accuracy: 0.9900 - val_loss: 0.1454 - val_accuracy: 0.9716\n" ] } ], "source": [ "history = model_B.fit(X_train_B, y_train_B, epochs=20,\n", " validation_data=(X_valid_B, y_valid_B))" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential_4\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "flatten_4 (Flatten) (None, 784) 0 \n", "_________________________________________________________________\n", "batch_normalization_v2_3 (Ba (None, 784) 3136 \n", "_________________________________________________________________\n", "dense_213 (Dense) (None, 300) 235500 \n", "_________________________________________________________________\n", "batch_normalization_v2_4 (Ba (None, 300) 1200 \n", "_________________________________________________________________\n", "activation (Activation) (None, 300) 0 \n", "_________________________________________________________________\n", "dense_214 (Dense) (None, 100) 30100 \n", "_________________________________________________________________\n", "activation_1 (Activation) (None, 100) 0 \n", "_________________________________________________________________\n", "batch_normalization_v2_5 (Ba (None, 100) 400 \n", "_________________________________________________________________\n", "dense_215 (Dense) (None, 10) 1010 \n", "=================================================================\n", "Total params: 271,346\n", "Trainable params: 268,978\n", "Non-trainable params: 2,368\n", "_________________________________________________________________\n" ] } ], "source": [ "model.summary()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "model_A = keras.models.load_model(\"my_model_A.h5\")\n", "model_B_on_A = keras.models.Sequential(model_A.layers[:-1])\n", "model_B_on_A.add(keras.layers.Dense(1, activation=\"sigmoid\"))" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "model_A_clone = keras.models.clone_model(model_A)\n", "model_A_clone.set_weights(model_A.get_weights())" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "for layer in model_B_on_A.layers[:-1]:\n", " layer.trainable = False\n", "\n", "model_B_on_A.compile(loss=\"binary_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 200 samples, validate on 986 samples\n", "Epoch 1/4\n", "200/200 [==============================] - 0s 2ms/sample - loss: 0.5851 - accuracy: 0.6600 - val_loss: 0.5855 - val_accuracy: 0.6318\n", "Epoch 2/4\n", "200/200 [==============================] - 0s 303us/sample - loss: 0.5484 - accuracy: 0.6850 - val_loss: 0.5484 - val_accuracy: 0.6775\n", "Epoch 3/4\n", "200/200 [==============================] - 0s 294us/sample - loss: 0.5116 - accuracy: 0.7250 - val_loss: 0.5141 - val_accuracy: 0.7160\n", "Epoch 4/4\n", "200/200 [==============================] - 0s 316us/sample - loss: 0.4779 - accuracy: 0.7450 - val_loss: 0.4859 - val_accuracy: 0.7363\n", "Train on 200 samples, validate on 986 samples\n", "Epoch 1/16\n", "200/200 [==============================] - 0s 2ms/sample - loss: 0.3989 - accuracy: 0.8050 - val_loss: 0.3419 - val_accuracy: 0.8702\n", "Epoch 2/16\n", "200/200 [==============================] - 0s 328us/sample - loss: 0.2795 - accuracy: 0.9300 - val_loss: 0.2624 - val_accuracy: 0.9280\n", "Epoch 3/16\n", "200/200 [==============================] - 0s 319us/sample - loss: 0.2128 - accuracy: 0.9650 - val_loss: 0.2150 - val_accuracy: 0.9544\n", "Epoch 4/16\n", "200/200 [==============================] - 0s 318us/sample - loss: 0.1720 - accuracy: 0.9800 - val_loss: 0.1826 - val_accuracy: 0.9635\n", "Epoch 5/16\n", "200/200 [==============================] - 0s 317us/sample - loss: 0.1436 - accuracy: 0.9800 - val_loss: 0.1586 - val_accuracy: 0.9736\n", "Epoch 6/16\n", "200/200 [==============================] - 0s 317us/sample - loss: 0.1231 - accuracy: 0.9850 - val_loss: 0.1407 - val_accuracy: 0.9807\n", "Epoch 7/16\n", "200/200 [==============================] - 0s 325us/sample - loss: 0.1074 - accuracy: 0.9900 - val_loss: 0.1270 - val_accuracy: 0.9828\n", "Epoch 8/16\n", "200/200 [==============================] - 0s 326us/sample - loss: 0.0953 - accuracy: 0.9950 - val_loss: 0.1158 - val_accuracy: 0.9848\n", "Epoch 9/16\n", "200/200 [==============================] - 0s 319us/sample - loss: 0.0854 - accuracy: 1.0000 - val_loss: 0.1076 - val_accuracy: 0.9878\n", "Epoch 10/16\n", "200/200 [==============================] - 0s 322us/sample - loss: 0.0781 - accuracy: 1.0000 - val_loss: 0.1007 - val_accuracy: 0.9888\n", "Epoch 11/16\n", "200/200 [==============================] - 0s 316us/sample - loss: 0.0718 - accuracy: 1.0000 - val_loss: 0.0944 - val_accuracy: 0.9888\n", "Epoch 12/16\n", "200/200 [==============================] - 0s 319us/sample - loss: 0.0662 - accuracy: 1.0000 - val_loss: 0.0891 - val_accuracy: 0.9899\n", "Epoch 13/16\n", "200/200 [==============================] - 0s 318us/sample - loss: 0.0613 - accuracy: 1.0000 - val_loss: 0.0846 - val_accuracy: 0.9899\n", "Epoch 14/16\n", "200/200 [==============================] - 0s 332us/sample - loss: 0.0574 - accuracy: 1.0000 - val_loss: 0.0806 - val_accuracy: 0.9899\n", "Epoch 15/16\n", "200/200 [==============================] - 0s 320us/sample - loss: 0.0538 - accuracy: 1.0000 - val_loss: 0.0770 - val_accuracy: 0.9899\n", "Epoch 16/16\n", "200/200 [==============================] - 0s 320us/sample - loss: 0.0505 - accuracy: 1.0000 - val_loss: 0.0740 - val_accuracy: 0.9899\n" ] } ], "source": [ "history = model_B_on_A.fit(X_train_B, y_train_B, epochs=4,\n", " validation_data=(X_valid_B, y_valid_B))\n", "\n", "for layer in model_B_on_A.layers[:-1]:\n", " layer.trainable = True\n", "\n", "model_B_on_A.compile(loss=\"binary_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])\n", "history = model_B_on_A.fit(X_train_B, y_train_B, epochs=16,\n", " validation_data=(X_valid_B, y_valid_B))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, what's the final verdict?" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2000/2000 [==============================] - 0s 41us/sample - loss: 0.1431 - accuracy: 0.9705\n" ] }, { "data": { "text/plain": [ "[0.1430660070180893, 0.9705]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_B.evaluate(X_test_B, y_test_B)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2000/2000 [==============================] - 0s 38us/sample - loss: 0.0689 - accuracy: 0.9925\n" ] }, { "data": { "text/plain": [ "[0.06887910133600235, 0.9925]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_B_on_A.evaluate(X_test_B, y_test_B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! We got quite a bit of transfer: the error rate dropped by a factor of almost 4!" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.933333333333337" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(100 - 97.05) / (100 - 99.25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Faster Optimizers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Momentum optimization" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.SGD(lr=0.001, momentum=0.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nesterov Accelerated Gradient" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.SGD(lr=0.001, momentum=0.9, nesterov=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AdaGrad" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.Adagrad(lr=0.001)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## RMSProp" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.RMSprop(lr=0.001, rho=0.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adam Optimization" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adamax Optimization" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.Adamax(lr=0.001, beta_1=0.9, beta_2=0.999)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nadam Optimization" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.Nadam(lr=0.001, beta_1=0.9, beta_2=0.999)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Rate Scheduling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Power Scheduling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```lr = lr0 / (1 + steps / s)**c```\n", "* Keras uses `c=1` and `s = 1 / decay`" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.SGD(lr=0.01, decay=1e-4)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/25\n", "55000/55000 [==============================] - 4s 66us/sample - loss: 0.4840 - accuracy: 0.8296 - val_loss: 0.4038 - val_accuracy: 0.8630\n", "Epoch 2/25\n", "55000/55000 [==============================] - 3s 63us/sample - loss: 0.3787 - accuracy: 0.8653 - val_loss: 0.3846 - val_accuracy: 0.8706\n", "Epoch 3/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.3461 - accuracy: 0.8770 - val_loss: 0.3606 - val_accuracy: 0.8776\n", "Epoch 4/25\n", "55000/55000 [==============================] - 3s 63us/sample - loss: 0.3248 - accuracy: 0.8844 - val_loss: 0.3661 - val_accuracy: 0.8738\n", "Epoch 5/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.3092 - accuracy: 0.8902 - val_loss: 0.3516 - val_accuracy: 0.8792\n", "Epoch 6/25\n", "55000/55000 [==============================] - 3s 63us/sample - loss: 0.2967 - accuracy: 0.8938 - val_loss: 0.3467 - val_accuracy: 0.8810\n", "Epoch 7/25\n", "55000/55000 [==============================] - 3s 63us/sample - loss: 0.2862 - accuracy: 0.8967 - val_loss: 0.3398 - val_accuracy: 0.8844\n", "Epoch 8/25\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 0.2771 - accuracy: 0.8997 - val_loss: 0.3384 - val_accuracy: 0.8832\n", "Epoch 9/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2696 - accuracy: 0.9035 - val_loss: 0.3345 - val_accuracy: 0.8860\n", "Epoch 10/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2628 - accuracy: 0.9057 - val_loss: 0.3343 - val_accuracy: 0.8830\n", "Epoch 11/25\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 0.2568 - accuracy: 0.9083 - val_loss: 0.3290 - val_accuracy: 0.8882\n", "Epoch 12/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2510 - accuracy: 0.9099 - val_loss: 0.3243 - val_accuracy: 0.8904\n", "Epoch 13/25\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 0.2459 - accuracy: 0.9118 - val_loss: 0.3271 - val_accuracy: 0.8874\n", "Epoch 14/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2415 - accuracy: 0.9130 - val_loss: 0.3259 - val_accuracy: 0.8886\n", "Epoch 15/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2370 - accuracy: 0.9157 - val_loss: 0.3249 - val_accuracy: 0.8896\n", "Epoch 16/25\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 0.2332 - accuracy: 0.9177 - val_loss: 0.3267 - val_accuracy: 0.8892\n", "Epoch 17/25\n", "55000/55000 [==============================] - 3s 63us/sample - loss: 0.2296 - accuracy: 0.9177 - val_loss: 0.3251 - val_accuracy: 0.8880\n", "Epoch 18/25\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 0.2257 - accuracy: 0.9194 - val_loss: 0.3221 - val_accuracy: 0.8900\n", "Epoch 19/25\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 0.2228 - accuracy: 0.9212 - val_loss: 0.3237 - val_accuracy: 0.8910\n", "Epoch 20/25\n", "55000/55000 [==============================] - 3s 60us/sample - loss: 0.2198 - accuracy: 0.9223 - val_loss: 0.3217 - val_accuracy: 0.8904\n", "Epoch 21/25\n", "55000/55000 [==============================] - 3s 63us/sample - loss: 0.2166 - accuracy: 0.9238 - val_loss: 0.3185 - val_accuracy: 0.8938\n", "Epoch 22/25\n", "55000/55000 [==============================] - 3s 61us/sample - loss: 0.2140 - accuracy: 0.9252 - val_loss: 0.3212 - val_accuracy: 0.8902\n", "Epoch 23/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2113 - accuracy: 0.9256 - val_loss: 0.3235 - val_accuracy: 0.8898\n", "Epoch 24/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2088 - accuracy: 0.9262 - val_loss: 0.3216 - val_accuracy: 0.8930\n", "Epoch 25/25\n", "55000/55000 [==============================] - 3s 62us/sample - loss: 0.2061 - accuracy: 0.9273 - val_loss: 0.3199 - val_accuracy: 0.8922\n" ] } ], "source": [ "n_epochs = 25\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "learning_rate = 0.01\n", "decay = 1e-4\n", "batch_size = 32\n", "n_steps_per_epoch = len(X_train) // batch_size\n", "epochs = np.arange(n_epochs)\n", "lrs = learning_rate / (1 + decay * epochs * n_steps_per_epoch)\n", "\n", "plt.plot(epochs, lrs, \"o-\")\n", "plt.axis([0, n_epochs - 1, 0, 0.01])\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Learning Rate\")\n", "plt.title(\"Power Scheduling\", fontsize=14)\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exponential Scheduling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```lr = lr0 * 0.1**(epoch / s)```" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "def exponential_decay_fn(epoch):\n", " return 0.01 * 0.1**(epoch / 20)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "def exponential_decay(lr0, s):\n", " def exponential_decay_fn(epoch):\n", " return lr0 * 0.1**(epoch / s)\n", " return exponential_decay_fn\n", "\n", "exponential_decay_fn = exponential_decay(lr0=0.01, s=20)" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n", "n_epochs = 25" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/25\n", "55000/55000 [==============================] - 6s 107us/sample - loss: 0.8245 - accuracy: 0.7595 - val_loss: 1.0870 - val_accuracy: 0.7106\n", "Epoch 2/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.6391 - accuracy: 0.8064 - val_loss: 0.6125 - val_accuracy: 0.8160\n", "Epoch 3/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.5962 - accuracy: 0.8174 - val_loss: 0.6526 - val_accuracy: 0.8086\n", "Epoch 4/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.5420 - accuracy: 0.8306 - val_loss: 0.7521 - val_accuracy: 0.7766\n", "Epoch 5/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.4853 - accuracy: 0.8460 - val_loss: 0.5616 - val_accuracy: 0.8314\n", "Epoch 6/25\n", "55000/55000 [==============================] - 5s 98us/sample - loss: 0.4443 - accuracy: 0.8571 - val_loss: 0.5430 - val_accuracy: 0.8664\n", "Epoch 7/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.4128 - accuracy: 0.8687 - val_loss: 0.4954 - val_accuracy: 0.8610\n", "Epoch 8/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.3763 - accuracy: 0.8773 - val_loss: 0.5770 - val_accuracy: 0.8578\n", "Epoch 9/25\n", "55000/55000 [==============================] - 6s 102us/sample - loss: 0.3459 - accuracy: 0.8847 - val_loss: 0.5267 - val_accuracy: 0.8688\n", "Epoch 10/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.3250 - accuracy: 0.8931 - val_loss: 0.4606 - val_accuracy: 0.8644\n", "Epoch 11/25\n", "55000/55000 [==============================] - 5s 97us/sample - loss: 0.2984 - accuracy: 0.9010 - val_loss: 0.5083 - val_accuracy: 0.8610\n", "Epoch 12/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.2736 - accuracy: 0.9080 - val_loss: 0.4497 - val_accuracy: 0.8826\n", "Epoch 13/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.2603 - accuracy: 0.9128 - val_loss: 0.4366 - val_accuracy: 0.8808\n", "Epoch 14/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.2382 - accuracy: 0.9197 - val_loss: 0.4692 - val_accuracy: 0.8828\n", "Epoch 15/25\n", "55000/55000 [==============================] - 6s 102us/sample - loss: 0.2240 - accuracy: 0.9252 - val_loss: 0.4609 - val_accuracy: 0.8774\n", "Epoch 16/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.2020 - accuracy: 0.9306 - val_loss: 0.4950 - val_accuracy: 0.8808\n", "Epoch 17/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.1950 - accuracy: 0.9340 - val_loss: 0.4985 - val_accuracy: 0.8856\n", "Epoch 18/25\n", "55000/55000 [==============================] - 6s 102us/sample - loss: 0.1785 - accuracy: 0.9388 - val_loss: 0.5071 - val_accuracy: 0.8854\n", "Epoch 19/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.1649 - accuracy: 0.9447 - val_loss: 0.4798 - val_accuracy: 0.8890\n", "Epoch 20/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.1561 - accuracy: 0.9471 - val_loss: 0.5023 - val_accuracy: 0.8896\n", "Epoch 21/25\n", "55000/55000 [==============================] - 5s 98us/sample - loss: 0.1442 - accuracy: 0.9520 - val_loss: 0.5253 - val_accuracy: 0.8952\n", "Epoch 22/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.1369 - accuracy: 0.9540 - val_loss: 0.5558 - val_accuracy: 0.8922\n", "Epoch 23/25\n", "55000/55000 [==============================] - 5s 98us/sample - loss: 0.1277 - accuracy: 0.9576 - val_loss: 0.5786 - val_accuracy: 0.8908\n", "Epoch 24/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.1204 - accuracy: 0.9611 - val_loss: 0.5991 - val_accuracy: 0.8902\n", "Epoch 25/25\n", "55000/55000 [==============================] - 6s 102us/sample - loss: 0.1130 - accuracy: 0.9638 - val_loss: 0.5984 - val_accuracy: 0.8894\n" ] } ], "source": [ "lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid),\n", " callbacks=[lr_scheduler])" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(history.epoch, history.history[\"lr\"], \"o-\")\n", "plt.axis([0, n_epochs - 1, 0, 0.011])\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Learning Rate\")\n", "plt.title(\"Exponential Scheduling\", fontsize=14)\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The schedule function can take the current learning rate as a second argument:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "def exponential_decay_fn(epoch, lr):\n", " return lr * 0.1**(1 / 20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to update the learning rate at each iteration rather than at each epoch, you must write your own callback class:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/25\n", "55000/55000 [==============================] - 7s 132us/sample - loss: 0.8067 - accuracy: 0.7678 - val_loss: 0.7942 - val_accuracy: 0.7780\n", "Epoch 2/25\n", "55000/55000 [==============================] - 7s 122us/sample - loss: 0.6784 - accuracy: 0.7937 - val_loss: 0.8375 - val_accuracy: 0.8120\n", "Epoch 3/25\n", "55000/55000 [==============================] - 6s 114us/sample - loss: 0.6060 - accuracy: 0.8148 - val_loss: 0.6303 - val_accuracy: 0.8304\n", "Epoch 4/25\n", "55000/55000 [==============================] - 6s 114us/sample - loss: 0.5279 - accuracy: 0.8341 - val_loss: 0.5724 - val_accuracy: 0.8196\n", "Epoch 5/25\n", "55000/55000 [==============================] - 6s 112us/sample - loss: 0.4803 - accuracy: 0.8486 - val_loss: 0.5488 - val_accuracy: 0.8486\n", "Epoch 6/25\n", "55000/55000 [==============================] - 6s 113us/sample - loss: 0.4305 - accuracy: 0.8611 - val_loss: 0.4778 - val_accuracy: 0.8470\n", "Epoch 7/25\n", "55000/55000 [==============================] - 6s 112us/sample - loss: 0.3969 - accuracy: 0.8699 - val_loss: 0.4922 - val_accuracy: 0.8584\n", "Epoch 8/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.3799 - accuracy: 0.8777 - val_loss: 0.5417 - val_accuracy: 0.8614\n", "Epoch 9/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.3475 - accuracy: 0.8851 - val_loss: 0.5032 - val_accuracy: 0.8734\n", "Epoch 10/25\n", "55000/55000 [==============================] - 6s 110us/sample - loss: 0.3256 - accuracy: 0.8937 - val_loss: 0.4433 - val_accuracy: 0.8802\n", "Epoch 11/25\n", "55000/55000 [==============================] - 6s 110us/sample - loss: 0.2944 - accuracy: 0.9017 - val_loss: 0.4888 - val_accuracy: 0.8742\n", "Epoch 12/25\n", "55000/55000 [==============================] - 6s 110us/sample - loss: 0.2767 - accuracy: 0.9077 - val_loss: 0.4626 - val_accuracy: 0.8706\n", "Epoch 13/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.2572 - accuracy: 0.9134 - val_loss: 0.4750 - val_accuracy: 0.8770\n", "Epoch 14/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.2391 - accuracy: 0.9185 - val_loss: 0.4633 - val_accuracy: 0.8900\n", "Epoch 15/25\n", "55000/55000 [==============================] - 6s 112us/sample - loss: 0.2180 - accuracy: 0.9251 - val_loss: 0.4573 - val_accuracy: 0.8768\n", "Epoch 16/25\n", "55000/55000 [==============================] - 6s 110us/sample - loss: 0.2029 - accuracy: 0.9311 - val_loss: 0.4748 - val_accuracy: 0.8840\n", "Epoch 17/25\n", "55000/55000 [==============================] - 6s 112us/sample - loss: 0.1884 - accuracy: 0.9357 - val_loss: 0.5171 - val_accuracy: 0.8840\n", "Epoch 18/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.1813 - accuracy: 0.9382 - val_loss: 0.5293 - val_accuracy: 0.8822\n", "Epoch 19/25\n", "55000/55000 [==============================] - 6s 112us/sample - loss: 0.1618 - accuracy: 0.9445 - val_loss: 0.5328 - val_accuracy: 0.8872\n", "Epoch 20/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.1570 - accuracy: 0.9483 - val_loss: 0.5453 - val_accuracy: 0.8870\n", "Epoch 21/25\n", "55000/55000 [==============================] - 6s 112us/sample - loss: 0.1422 - accuracy: 0.9523 - val_loss: 0.5596 - val_accuracy: 0.8892\n", "Epoch 22/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.1329 - accuracy: 0.9563 - val_loss: 0.5717 - val_accuracy: 0.8894\n", "Epoch 23/25\n", "55000/55000 [==============================] - 6s 110us/sample - loss: 0.1248 - accuracy: 0.9592 - val_loss: 0.5959 - val_accuracy: 0.8930\n", "Epoch 24/25\n", "55000/55000 [==============================] - 6s 112us/sample - loss: 0.1178 - accuracy: 0.9606 - val_loss: 0.5875 - val_accuracy: 0.8896\n", "Epoch 25/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.1103 - accuracy: 0.9646 - val_loss: 0.6103 - val_accuracy: 0.8904\n" ] } ], "source": [ "K = keras.backend\n", "\n", "class ExponentialDecay(keras.callbacks.Callback):\n", " def __init__(self, s=40000):\n", " super().__init__()\n", " self.s = s\n", "\n", " def on_batch_begin(self, batch, logs=None):\n", " # Note: the `batch` argument is reset at each epoch\n", " lr = K.get_value(self.model.optimizer.lr)\n", " K.set_value(self.model.optimizer.lr, lr * 0.1**(1 / s))\n", "\n", " def on_epoch_end(self, epoch, logs=None):\n", " logs = logs or {}\n", " logs['lr'] = K.get_value(self.model.optimizer.lr)\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "lr0 = 0.01\n", "optimizer = keras.optimizers.Nadam(lr=lr0)\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n", "n_epochs = 25\n", "\n", "s = 20 * len(X_train) // 32 # number of steps in 20 epochs (batch size = 32)\n", "exp_decay = ExponentialDecay(s)\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid),\n", " callbacks=[exp_decay])" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "n_steps = n_epochs * len(X_train) // 32\n", "steps = np.arange(n_steps)\n", "lrs = lr0 * 0.1**(steps / s)" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZsAAAEeCAYAAABc5biTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzs3Xd8FHX6wPHPk94TktA70ntTVEDAhnp6Yj/xVEQP6/kTRU89PbseKtZTxHJiPcWCKOqhHgQEFQGRXqSH0BNISEgIhOf3x0xwWTbJJmSzKc/79ZpXduf7nZlnJrvz7Mx85zuiqhhjjDGBFBLsAIwxxtR+lmyMMcYEnCUbY4wxAWfJxhhjTMBZsjHGGBNwlmyMMcYEnCUbU+OJyAgRyS3nNGki8q9AxeQuY4OIjAnAfC8WkXLds+C9jSqyzY6FiDwgIv+uquX5WL6KyMVBWG6Z21lEbhaRL6oqpmCxZFODichE90vkPfwU7NgCpYSdxodAmwAs6zoRWSgiuSKSLSKLReTRyl5OkARkm/kiIg2AO4Aave1E5EERWRqAWb8O9BGRgQGYd7URFuwAzDH7DrjSa1xhMAIJFlXNB/Irc54iMhJ4ARgN/A8IB7oCJ1XmcoIlENusFNcBP6vqukAvSETCVfVAoJdTmVR1v4i8D9wKfB/seALFjmxqvv2qus1ryAIQkUEickBEBhdXFpHrRSRHRNq479NE5BUReV5EdrvDUyIS4jFNPRF5yy3LF5HvRKSLR/kI99f/aSKyVETyRGSGiLT2DFREzhORBSJSICLrReQxEYnwKN8gIveJyAQ3xs0icqdnufvyI/cIZ4Pn8j3qHSciU0RkmxvLLyJybjm36x+BT1V1gqquUdUVqvqRqt7utU7niMhcd7tkisgXIhLlUSWqpPVxp08UkVdFZIeI7BWRmSLS16vOVSKyUUT2ichUoKFX+VG/uMs6feNjmz3o/u/+JCJr3Vg+E5FUjzphIvKsx+fkWREZLyJpZWzL4cARp4n8/NxFiMhYd7vtE5F5IjLUo3yw+zk4R0R+FpFCYCglayQiX7rz2igif/aK6Z8issr9X24QkSeL/5ciMgJ4AOgiv59BGOGWJbrbYav72V4hIpd5zbvU7wbwOfBHEYkpY1vWXKpqQw0dgInA1DLqPA6kA/WAjkAecLVHeRqwF3jRLb8UyAZu96gzBVgJnAJ0w/lipAPRbvkI4ADOUdYJQHdgITDNYx5DgRzgGuA4YAiwCnjao84GIBO4BWgL/BVQ4CS3vL77/jqgEVDfY/m5HvPpAdzgxtoW+DvO0V5Hr/X+Vynb7RVgNdCmlDpnAQdxTg91dtd7DBDj5/oIMBv40t1ubYFH3O3U2K3TDzjkrkN74Hp3nuoRx4PAUq/YvLdJWe8fBHKBye56nARsBCZ41Lkb2A1cBHQAnnc/K2mlbKNkN/7+XuPTKPtz9x7wE87nro27HQuBHm75YHd7LgHOdOvULyEOdbfb9e52/LsbV1+POvcD/YFWwDnAJuARtywaeBrne9DIHaLd/+EcYLn7eWgDnA1c4O93w60XAxQBpwV7vxKoIegB2HAM/zwn2Rx0dxKew1iPOuHAPOBT4BfgQ695pOHsVMVj3H3AZvd1O/eLeopHeaK7Y7jOfT/CrdPBo84VwP7i+QKzgPu9lj3Mjbe4zgbgP151fgPu83ivwMVedUbgseMsYVv95DWfNEpPNo2BH93l/Qa8C1wFhHvUmQN8UMo8Sl0f4FR3/aO96vwK3OW+fh/41qv8dQKTbAqARI9xfwfWeLzfCtzt8V5wfjCklbINerrbsHU5P3fH4SSDFl7TfQa87L4e7M77Ij++Kwq85jXuO+DdUqa5wWv9fW3nM9w4O5UwjxGU8d3wGJ8FXFvWutTUwU6j1XyzcL7QnsNTxYXqnL8eDpwLNMD5ZeftJ3U/7a4fgaYikgB0wvky/egxz2ycX5OdPabZr6qrPN5vASJwjqgA+gB/d0+35bqncN4HYnF+JRZb7BXbFjduv4lIrHsKZLl7eiYX6Au08HceqrpVVU/COTp6DmfHOgH42eNURy+c6zmlKW19+uD8ot3ptV264uxswdn+P3rNw/t9Zdno/m+PilVEEnH+Tz8XF7qfmZ8pXbT7t8BHWWmfu94423y517b5A79vm2Lzy4jBc/7e7w9/hsVp5TfbPf2aCzxL2Z+ZXsBWVV1RSp2yvhvF8vl9e9U61kCg5tunqmvKqHMizvW5JJxTUXsqadmeO4qDJZSFePx9CPjIx3x2erz2vrirlP/a4tM4pzTG4BxJ7APexvmCl4uqLgWWAi+JyACcC7iX4hxV+qO09QkBtgO+WiHllCPMQzg7Zk/h5Zi+WGVse2+73L/1cI6M/BXiLv94H3F5N2zIq1hovxORE4EPcD6jo3G+I3/E+Swdq7K+G8WSOfK7UKvYkU0t516I/BdwM/At8K6IeP/I6CcinjurE4EtqpoDrMD5nBxuheX+8uyGc57aX7/gXDNZ42Pw/jKW5gAQWkadAcDbqvqJqi4GNnP0r+GKKF7fOPfvQuC0Y5jfLzgX+w/52CY73DorcP4fnrzf7wQaev0Pex5DXEdxj3i24ez8AXCXd3yJEznW4iTOzj7KSvvcLcRJoI18bJuMCq6Gr+1YfETSH8hQ1UdUdZ6q/ga09KpfyNGfvYVAYxHpVMGYAKdRCxCF85molezIpuaLFJFGXuOKVHWniIQC7wAzVXWCiHyMc/rrAZyLocWaAM+JyMs4SeRO3HsiVPU3EZkCTBCRUTi/+B7D2YG8X444HwamishGYBLOr72uwAmqelc55rMBOE1EZuKcntjto85q4AI37gM46xvlo16JRGQ8zumO6TjJqjHONYV9wDdutceAL0RkDc62EJwL1RNUdZ8fi/kO57rPFBG5i98vPp8FfKeq3+M0v/5BRO4BPsa5TnGB13zScH4V3ysiH7h1AnED4/PAXSKyGifxXo+zXUo8YlHVQyLyHc4PgI+9ikv73K0WkfeAiSJyB85OOBln3dap6qcViP9CEZmHs70uxvmh0M8tW41zCu8KnNNrQ4HLvabfALQUkd44jQf24pxGnQt8IiKj3fm0BWJV9bNyxDbQXa/fKrBeNYId2dR8p+N82T2HhW7ZvTgf/GsBVDUTuBq42z0lVOw9nF9sc4HXgDdwzlcXuwbn3Pzn7t8Y4Cx17tXwi6pOwznfPsSdx884rZs2+b+qgHNz4BCc1nALS6hzO7AD55TX1ziNA8p7/8K3ODuiSTg7kMnu+DNUdTWAqn6Fs+M/241lphvbIX8W4F6vOAcnob2Gc7F9Ek5Lry1unZ9w/n834lz/uRDnQrXnfFa45aPcOmfgtEKsbE/j/Hh5E2ebgrNdfF2P8fQqcJn748eTP5+7N4EncRLxVJyWaRsrGP+DOC3pFuNsr2tUdR6Aqn6Bc63zOX7fhv/wmv4T4CucBLMTuFxVD+H8/+fgNCJZgZOUy3vK9nKcbVBrFbcCMnWUe4/EUlW9JdixmJpHRBYCs1X1r2XU+xGnFdk77vs07HMHgIh0xUlg7b0aaNQqdhrNGOMXEWmJc3ppJk4DhL/g3DfyFz8mvx6n5ZY5WhPgqtqcaMCSjTHGf4dw7jV6CucU/HLgbFUts+mx21DDuxm4AVT1m7Jr1Xx2Gs0YY0zAWQMBY4wxAWen0VxJSUnatm3bYIfhU15eHrGxscEOwyeLrWIstoqx2ComkLEtWLBgl6rWL7NisPvLqS5D+/bttbqaMWNGsEMokcVWMRZbxVhsFRPI2ID5an2jGWOMqQ4s2RhjjAk4SzbGGGMCzpKNMcaYgLNkY4wxJuAs2RhjjAk4SzbGGGMCzpKNMcaYgLNkY4wxJuAs2RhjjAk4SzbGGGMCzpKNMcaYgLNkY4wxJuAs2RhjjAk4SzbGGGMCrkqTjYgki8hkEckTkY0iMryEeiIiY0Uk0x3Gioh4lL8qIqtE5JCIjPAx/WgR2SYiOSLybxGJDOBqGWOMKUNVH9m8BBQCDYErgPEi0sVHvVHAMKAH0B04D7jeo3wRcBPwi/eEIjIUuBs4DWgJtAEeKiuwQ1qe1TDGGFMeVZZsRCQWuAi4X1VzVXU28DlwpY/qVwPjVHWzqmYA44ARxYWq+pKq/g8oKGHaN1R1maruBh7xnLYkm3MPsWdfYTnXyhhjjD/EeapnFSxIpBcwR1VjPMaNAQap6nledbOBM1V1rvu+LzBDVeO96s0GXlfViR7jFgGPq+qH7vtUYCeQqqqZXtOPwjmKIqJR2z7D7/sXV3epfmfccnNziYuLC3YYPllsFWOxVYzFVjGBjG3IkCELVLVvWfXCArJ03+KAHK9x2UB8CXWzverFiYho2dnR17S4yzki2ajqq8CrAJGN22na5oOMPr8f3ZsllbGIqpWWlsbgwYODHYZPFlvFWGwVY7FVTHWIrSqv2eQCCV7jEoC9ftRNAHL9SDQlTUsJy/m9UoSgCvd/tpRDdgHHGGMqVVUmm9VAmIi08xjXA1jmo+4yt6yser74mna79yk0b0mRQqOEKBZtzuaDeel+LsoYY4w/qizZqGoe8CnwsIjEikh/4HzgHR/V3wZuF5GmItIEuAOYWFwoIhEiEgUIEC4iUSIS4jHttSLSWUSSgPs8py1JiMB953YC4MlpK8nKs8YCxhhTWaq66fNNQDSwA/gPcKOqLhORgSKS61FvAvAFsARYCnzpjiv2DZAPnIxzzSUfOAVAVf8LPAnMADYBG4EH/AnuD90aM6BtKnv2HeCpaSsrvJLGGGOOVKXJRlWzVHWYqsaqagtVfd8d/72qxnnUU1W9S1WT3eEuz+s1qjpYVcVrSPMof0ZVG6pqgqpeo6r7/YlPRHjwj10IDxU+mJfOwk27K3HtjTGm7rLuary0bRDHdQPbOI0FpiylyBoLGGPMMbNk48NfT21Lk8Qolmbk8P7cjcEOxxhjajxLNj7ERITxj/M6A/Dkf1exPcdXRwXGGGP8ZcmmBEO7NOK0jg3Yu/8gD33hb6trY4wxvliyKYGI8PCwrsREhPLVkm18t3x7sEMyxpgay5JNKZomRTPmzA4A/GPKUnL3HwxyRMYYUzNZsinD1Se3onuzRLZkFzDum1XBDscYY2okSzZlCA0RnriwG6EhwsQfNvBr+p5gh2SMMTWOJRs/dGmSyHUDWqMK93y6hANFh4IdkjHG1CiWbPz0f6e3o3lyNCu25vDG7PXBDscYY2oUSzZ+iokI49Fh3QB47rvVbMzMC3JExhhTc1iyKYdB7eszrGcTCg4c4q6PF9tzb4wxxk+WbMrpH+d1ITUugrnrs3jXurIxxhi/WLIpp+TYCB4d1hWAf369kvSsfUGOyBhjqj9LNhVwVtfGnNu9MfsKi+x0mjHG+MGSTQU99McupMRG8OO6TN7/eVOwwzHGmGrNkk0FpcRF8vD5zum0J75awebddjrNGGNKYsnmGPyhe2PO6daIvMIi7v5kCR4PEzXGGOPBks0xevj8rtSLCWf2ml18MC892OEYY0y1ZMnmGKXGRfKQezrt0anLrXWaMcb4YMmmEpzncTrt9km/UmSt04wx5giWbCqBiPDYsG40iI9k3obdvDprXbBDMsaYasWSTSWpFxvBU5f0AOCZb1exbEt2kCMyxpjqw5JNJRrUvj5XndSSA0XK6A9/peBAUbBDMsaYasGSTSW75+xOtEmNZfX2XJ6aZk/2NMYYsGRT6aIjQnn2sp6EhghvzF7PnDW7gh2SMcYEnSWbAOjRPIlbT20HwJiPFpGdfyDIERljTHBZsgmQm4ccR4/mSWzNLuDeyda7gDGmbrNkEyBhoSE8f1lPYiNC+XLxVibNt94FjDF1lyWbAGqVGsujFzi9Czzw+TLW7Ngb5IiMMSY4qjTZiEiyiEwWkTwR2Sgiw0uoJyIyVkQy3WGsiIhHeU8RWSAi+9y/PT3KIkXkFRHZLiJZIvKFiDStivXz5YJezbiwV1MKDhzilvcXWnNoY0ydVNVHNi8BhUBD4ApgvIh08VFvFDAM6AF0B84DrgcQkQhgCvAuUA94C5jijgf4P+Akd7omwG7gxQCtj18eHtaVVikxrNy2l8e/WhHMUIwxJiiqLNmISCxwEXC/quaq6mzgc+BKH9WvBsap6mZVzQDGASPcssFAGPCcqu5X1RcAAU51y1sD01R1u6oWAB8CvhJalYmLDOPFy3sTHiq8/eNGpi3bFsxwjDGmyklVtZISkV7AHFWN8Rg3Bhikqud51c0GzlTVue77vsAMVY0XkdFu2dke9ae65ePcus8DlwB7gNeBHap6m4+YRuEcRVG/fv0+kyZNqtyV9jJtwwH+s7KQ2HB4+ORoUqL9y/W5ubnExcUFNLaKstgqxmKrGIutYgIZ25AhQxaoat+y6oUFZOm+xQE5XuOygfgS6mZ71Ytzr9t4l3nP5zcgHcgAioAlwC2+AlLVV4FXATp06KCDBw/2c1UqZpAq2ybOY8aqnXy4MZr3/9KPsNCyE05aWhqBjq2iLLaKsdgqxmKrmOoQm9+n0USkoYiMEZHxIpLqjusvIq39nEUukOA1LgHw1UTLu24CkKvOYVhZ83kJiARSgFjgU+BrP2MMKBHh6Ut60CA+kp83ZPHMt6uDHZIxxlQJv5KNiPQBVuFc1L+W33f2ZwCP+bms1UCYiLTzGNcDWOaj7jK3zFe9ZUB3z9ZpOI0Bist7AhNVNUtV9+M0DjihOEEGW0pcJC9c3osQgZfT1vLd8u3BDskYYwLO3yObp4HnVbUXsN9j/DSgvz8zUNU8nKOMh0UkVkT6A+cD7/io/jZwu4g0FZEmwB3ARLcsDef02K1uM+fiU2TT3b/zgKtEJFFEwoGbgC2qWm06KTuxTQp3Du0IwO2TfrWnexpjaj1/k00fnCbG3rbiNGP2101ANLAD+A9wo6ouE5GBIpLrUW8C8AXO9ZalwJfuOFS1EKdZ9FU4DQBGAsPc8QBjgAKcazc7gXOAC8oRY5W4/pQ2nN6pITkFB7nxvQV2/40xplbzt4FAPs49Ld464iQOv6hqFk6i8B7/Pc6F/+L3CtzlDr7msxAnAfoqy8Q53VethYQI4y7pwbn/+p6lGTk8PHU5j1/QLdhhGWNMQPh7ZDMFeEBEIt33KiKtgLHAJwGIq05IjAln/BV9iAgL4f25m/j0l83BDskYYwLC32QzBkjGOS0VA8wG1uCcxrovMKHVDV2bJvLQH517Tu+dvIRV26z/NGNM7eNXslHVHFUdgHMK7G84N02epaqD3Av/5hj86fjmXNjb6T/thncX2PNvjDG1jr9Nn68SkUhVna6qT6vqk6r6nYhEiMhVgQ6ythMRHhvWjY6N4lm/K4/bPlhI0SF7/o0xpvbw9zTam0Cij/Hxbpk5RtERobx2VV+SYsKZsWonz9oNn8aYWsTfZCOAr5/aLTi66xhTQc2TY3hpeG9CBP41Yw1fLdka7JCMMaZSlNr0WUSW4CQZBWaKyEGP4lCgJfBV4MKre/q3TeXeczrx6JcruGPSIlqnxgY7JGOMOWZl3Wfzsfu3K86NlZ43XhYCG7Cmz5Xu2gGtWbYlh8kLMxj1znz+1lPKnsgYY6qxUpONqj4EICIbgA/d58OYABMRnriwG2t25LIkI5vxi0I467RDfvUQbYwx1ZG/TZ/fskRTtaLCQ5lwZR9SYiNYlnmIx79aGeyQjDGmwvxt+hwhIg+JyGoRKRCRIs8h0EHWVU2Sohn/5z6ECvx7znrem7sx2CEZY0yF+Hte5hHcRzUDh4A7cZ4bk4nTuaYJkBNaJzOiSwQA/5iyjNm/VZvOq40xxm/+JptLgRtUdQJO9/5TVPVW4AGcZ9qYABrYLJwbBh1H0SHlxvcWsGaHdWljjKlZ/E02DYHl7utcIMl9/V/gzMoOyhztrqEdOKtLI/YWHGTkxPlk5RWWPZExxlQT/iabTUAT9/UaYKj7+iScxw+YAAsJEZ65rAfdmiayKWsf178zn/0H7XKZMaZm8DfZTAZOc18/DzwkIutxnp75egDiMj7ERITx+tV9aZQQxbwNu7n7kyU4j/4xxpjqza+Hp6nqPR6vPxaRdJzHQa9W1amBCs4crWFCFK9f3ZdLXvmRyQszaJkSw22ntw92WMYYU6oK3SWoqnNV9RlVnSoi1p9KFevaNJEXLu9FiMBz3/3Gh/M2BTskY4wpVYVvSReRKBG5E1hfifEYP53RuSEPn98VgHsnL2X6yu1BjsgYY0pWarJxb+Z8TETmicgPIjLMHX8VsA64DXi2CuI0Pvz5xJbcMqQtRYeUm99byK/pe4IdkjHG+FTWkc2DwC3ARqA18JGIvAz8HbgHaKWqTwQ0QlOqO85sz8V9mpF/oIiRE+exYZc9ONUYU/2UlWwuBUao6sXAWTiPFagHdHH7S7PnFwdZcaedg9rXJyuvkKv+/TM79+4PdljGGHOEspJNc2AegKouwnmswFhVPVjqVKZKhYeG8PIVvQ/fgzNy4jzy9tu/yBhTfZSVbMIBz5/JB7Anc1ZLsZFh/HvE8bRIjmFJRjaj3plPwQG76dMYUz34c5/NEyKyz30dATwoIkckHLefNBNk9eMjeXvkCVz8yo/MWZPJrf9ZyMtX9Lbn4Bhjgq6svdAs4Digmzv8ALTweN8N5ymepppolRrLu9edQGJ0ON8s385dHy/m0CHrZcAYE1xlPalzcBXFYSpRx0YJvHnN8fz59bl8ujCD+KgwHvxjF0Ts8dLGmOCw8yu1VO8W9Xjtqr5EhIbw1o8beebb1cEOyRhTh1myqcX6t03lxeG9CA0RXpy+hldnrQ12SMaYOqpKk42IJIvIZBHJE5GNIjK8hHoiImNFJNMdxorHOSAR6SkiC0Rkn/u3p9f0vUVklojkish2Efm/QK9bdTW0SyOeurg7AI9/tZJ3frJHSxtjql5VH9m8hHOvTkPgCmC8iHTxUW8UMAzoAXQHzgOuB6cLHWAK8C7ODaZvAVPc8YhIKs5D3SYAKUBb4JvArVL1d2HvZjx8vrOZ7/9sKe/PtY47jTFVq8qSjds79EXA/aqaq6qzgc+BK31UvxoYp6qbVTUDGAeMcMsG4zRseE5V96vqC4AAp7rltwPTVPU9t3yvqq4I2IrVEFed1Ir7z+0MwL2TlzBpXnqQIzLG1CXiz8O3RKRFCUUKFKjqTj/m0QuYo6oxHuPGAINU9TyvutnAmao6133fF5ihqvEiMtotO9uj/lS3fJyITAeWAMfjHNXMBW5W1aN+zovIKJyjKOrXr99n0qRJZa1GUOTm5hIXF1cp8/p6/QE+XFWIANd2i2BA0/BqE1tls9gqxmKrmLoa25AhQxaoat8yK6pqmQNwCCgqZdgNPAOElTKPgcA2r3F/AdJ81C0COnq8b4eT2AS4H/jAq/57wIPu69XAHpxkEwW8gJPkSl3H9u3ba3U1Y8aMSp3f+LQ12vJvU7XV3VP1kwXpxzSvyo6tMllsFWOxVUxdjQ2Yr37kEb+e1AlcDjwJvIJzpADQD+eo4EEgCbgP2As8UMI8coEEr3EJ7jRl1U0AclVVRaSs+eQDk1V1HoCIPATsEpFEVbWudoAbBh1H0SHlqWmrGPPRIkJDhPN7Ng12WMaYWszfZHMjMFpVP/UYN11EVgH/p6qDRGQH8BAlJ5vVQJiItFPV39xxPYBlPuouc8t+9lFvGXCHiIibVcFpRPCS+3oxzlFQMbt93oeb3efgPPPtakZ/+CuqMKyXJRxjTGD420CgH851EG9LcU5XAfwINCtpBqqaB3wKPCwisSLSHzgfeMdH9beB20WkqYg0Ae4AJrplaTin2W4VkUgRucUdP939+yZwgds8OhzntNtsO6o52q2nteO209txSGH0pF/t8dLGmIDxN9lsxL2Q7uUvQPEeqj6QVcZ8bgKigR3Af4AbVXWZiAx0T48VmwB8gZPglgJfuuNQ1UKcZtFX4VybGQkMc8ejqtOBe91pduA0EvB5P4+B205vz51DO6AKf/tkCRPn2FO+jTGVz9/TaHcAn4jIObjPtwH64nTSeZH7/nig1OZcqpqFkyi8x38PxHm8V+Aud/A1n4VAn1KWMx4YX1os5nc3D2lLdHgoD09dzoNfLKfg4CFuGHRcsMMyxtQifiUbVf1SRNrhHJl0cEd/DryibpNiVX05MCGaqjByQGuiwkP5+2dL+OfXK8kvLOK209tZ553GmErh75ENqpoO3BPAWEyQDe/XgqjwEMZ8tIjn//cbBQeKuPvsjpZwjDHHzO9kIyIxQE+gAV7XerxaqZka7MLezYgMC+X/PljIhFnryN1/kIfP70poiCUcY0zF+ZVsROR0nAv6KT6KFQitzKBMcP2he2Miw0K46f1feG/uJnbvK+TZy3oSGWb/ZmNMxfjbGu15nNZdzVQ1xGuwPVAtdHrnhrwz8gTiI8P4ask2rnlzHnsLDgQ7LGNMDeVvsmkFPKKqWwIYi6lm+rVJ4cPrT6J+fCQ/rM3k8td+Yufe/cEOyxhTA/mbbObweys0U4d0bpLAJzecTKuUGJZm5HDJKz+QnrUv2GEZY2oYf5PNK8DTInKdiPRzH052eAhkgCb4WqTE8NENJ9OlSQIbMvdx4fgfWL4lJ9hhGWNqEH+TzcdAR+BVnG5p5nsM80qZztQS9eMj+WDUiZzUJoWde/dz6YQfmbm6zCdLGGMM4H+yaV3K0CYwoZnqJj4qnDevOZ5zuzcmd/9BRk6cR1q6NRowxpTN3x4E7MH1BoCo8FBe+FMvWiTH8HLaWiYuKyTq65XcNbQDIXYvjjGmBCUmGxG5EPhCVQ+4r0tkN3XWLSEhwl1ndaRFcgz3Tl7CKzPXkp61j3GX9iAq3FrCG2OOVtqRzcdAI5yekz8upZ7d1FlH/emEFuza9BsTlhzkyyVb2Zqdz2tX9SUlLjLYoRljqpkSr9m4N2zu8Hhd0mCJpg7rmhrKRzeeRJPEKH7ZtIcLXv6B1dt9PXzVGFOX+dtAwJgSdWyUwOSb+9O1aQKbsvZxwUtz+Hb59mCHZYypRsrTEWcz4BR8d8T5TCXHZWqYhglRfHT9ydz58SKmLt7KqHfmM+bMDtw0+DjrNdoY43dHnFcA/wYOAjtxrtMUU8CSjSE6IpQXL+9Fp8YxhMz6AAAgAElEQVQJPP3NKp6atooVW3N46uIeREfY2VZj6jJ/T6M9DIwDElS1laq29hjsPhtzmIhw85C2vHZlX2IjQpm6eCuXTPiBLXvygx2aMSaI/E02DYHXVbUokMGY2uP0zg2ZfHN/Wrp9qv3xX7OZuy4z2GEZY4LE32TzFdAvkIGY2qd9w3im3Nyf/m1T2JVbyPDX5/LarHWoatkTG2NqFX8bCHwLjBWRLsAS4Ig+SuymTlOSpJgI3rrmBJ7+ZjWvzFzLY1+tYMHG3Tx1SXfio8KDHZ4xpor4m2wmuH/v9VFmN3WaUoWFhnD32R3p1SKJMZMW8d9l21i1fS+v/LkPHRrFBzs8Y0wV8Os0mt3UaSrD0C6N+PyvA+jYKJ71u/IY9tIcPluYEeywjDFVoMxkIyLhIjJXROzhaeaYtU6NZfJN/bmwd1PyDxRx24e/ct9nSyg4YG1PjKnNykw2qnoA51ECdlXXVIroiFDGXdKDxy/oRkRoCO/+tIlhL81hzQ7r5saY2srf1mhvAX8JZCCmbhERhvdrwac3nUzr1FhWbtvLeS/OYdK8dGutZkwt5G8DgVjgChE5A1gA5HkWquqtlR2YqRu6Nk3ki78O4B+fLeXThRnc9clivl+zi8cu6EqCtVYzptbw98imE/ALsBvnyZzdPIaugQnN1BVxkWE8c1lPnrm0BzERoXyxaAt/eOF7fk3fE+zQjDGVxN8ndQ4JdCDGXNi7Gb1a1OOv//mFpRk5XDz+B0af0Z4bBh1HqD0F1JgazR4xYKqV1qmxfHLjyYzs35qDh5Snpq3i0gk/sjEzr+yJjTHVlt/JRkSGiMirIvJfEZnuOZRjHskiMllE8kRko4gML6GeiMhYEcl0h7Hi0U+9iPQUkQUiss/929PHPCJEZIWIbPY3PlM9RIaF8o/zOvP2yBNomBDJgo27Ofv573l/7iZrPGBMDeVXshGREcDXQDwwGOcxA/WA3sDycizvJaAQp2PPK4Dxbhc43kYBw4AeQHfgPOB6N5YIYArwrhvDW8AUd7ynO904TQ11Svv6TLvtFM7r0YR9hUXcO3kJIyfOY0dOQbBDM8aUk79HNmOAW1T1cpx+0e5R1V44O/xcf2YgIrHARcD9qpqrqrOBz4ErfVS/GhinqptVNQPn8QYj3LLBONeanlPV/ar6AiDAqR7Lag38GXjCz/Uz1VRSTAQvXt6LFy7vRWJ0ODNW7WToc7P4asnWYIdmjCkH8ee0hIjsAzqr6gYR2QWcqqqLRaQjkKaqjfyYRy9gjqrGeIwbAwxS1fO86mYDZ6rqXPd9X2CGqsaLyGi37GyP+lPd8nEe79/AaT33rqo2KyGmUThHUdSvX7/PpEmTytwWwZCbm0tcXFyww/CpKmPbXXCIN5YUsjTT6W3g+Eah/LlTJImRvhsP2HarGIutYupqbEOGDFmgqn3LqufvfTaZOKfQADJwmjsvBlKAaD/nEQfkeI3L9pivd91sr3px7nUb77Ij5iMiFwChqjpZRAaXFpCqvgq8CtChQwcdPLjU6kGTlpaGxeYYNlR596eNPPH1SuZtK+K3nAM8cF5nhvVsetTjp227VYzFVjEWW+n8PY32PXCm+3oS8IKIvAn8B+fxA/7IBRK8xiUAvvoo8a6bAOSqcxhW4nzcU3VPAnaTaS0lIlx5Uium3XYKA9ulsmffAUZ/uIiRE+fZ00CNqcb8TTa34CQWcK6DPIVzVDMJuM7PeawGwkSknce4HsAyH3WXuWW+6i0DusuRP2O7u+PbAa2A70VkG/Ap0FhEtolIKz/jNDVA8+QY3h55Ak9e3J2EqDBmrNrJmc/O4r25Gzl0yFqsGVPd+HtTZ5bH60PA2PIuSFXzRORT4GERuQ7oCZwPnOyj+tvA7SLyFU4HoHcAL7plaUARcKuIvMLvfbZNBw4BzT3mczLwL5xWc9YyrZYRES7t25zB7etz32dL+Wb5dv4+eSmf/7qFxy7oFuzwjDEeynOfTUMRGSMi40Uk1R3X32355a+bcK7x7MA5UrpRVZeJyEAR8WzVNgH4AuepoEuBL91xqGohTrPoq4A9wEhgmKoWqupBVd1WPABZwCH3vfVhX0s1SIhiwpV9eGl4b1LjIpi7Pouzn5/FJ6sL7dEFxlQT/t5n0wdYhXNvzLX8fs3kDOAxfxemqlmqOkxVY1W1haq+747/XlXjPOqpqt6lqsnucJd6NJtT1YWq2kdVo1W1t6ouLGF5aSW1RDO1i4jwh+6N+Xb0IC4/oTkHipQv1h3gjGdnMmPljmCHZ0yd5++RzdPA8+69Nfs9xk8D+ld6VMZUUL3YCJ64sDuf3HgSzeNDSM/K55qJ87jhnQXWgMCYIPI32fTBuVPf21ac3gCMqVb6tEzmwZOiuO8PnYiJCOW/y7Zx+jMzeW3WOgoPHgp2eMbUOf4mm3ycrmG8dcS5/mJMtRMaIlw3sA3/u2MQZ3dtxL7CIh77agVnPT+LGavsY2tMVfI32UwBHhCRSPe9uk2JxwKfBCAuYypN48Roxv+5D29eczxtUmNZtzOPa96cxzVv/szanX71tmSMOUbl6RstGaf5cAwwG1iDc+f+fYEJzZjKNaRDA/572ync94dOxEc69+YMfXYWj0xdTnb+gWCHZ0yt5leyUdUcVR2A0+T4b8DzwFmqeoqq2oNGTI0RERbCdQPbMOPOwVx+QnOKVHlj9nqGPJ3Ge3M3crDIrucYEwjleniaqk5X1adV9UlV/U5EWopI9ey90phSpMZF8sSF3fnilgGc0DqZrLxC/j55KWc//z3fLd9uz80xppId65M6k3AeG2BMjdS1aSIfjjqRfw3vRfPkaH7bkct1b8/nsgk/sWDj7mCHZ0ytYY+FNnWeiHBu9yZ8d/sg/nFuZ+rFhPPzhiwuGv8DN7yzwBoRGFMJLNkY44oMC2XkgNbMvGsINw85jqjwEP67bBtnPjuLeycvsSeEGnMMLNkY4yUhKpw7h3YkbcwQ/nR8c1SV9+duYuCTM3h06nJ25e4veybGmCOU2uuziHxexvTez5UxptZolBjFPy/qznUDW/PUtFVMW7ad12ev5725m7j65FZcf0ob6sVGBDtMY2qEsh4xkOlH+fpKisWYaqltg3gmXNmXpRnZPPvtav63cgevzFzLOz9uYOSA1lw3oA2JMeHBDtOYaq3UZKOq11RVIMZUd12bJvLGiOP5NX0Pz367mpmrd/Li9DVM/GED1w5ozTUnt7akY0wJ7JqNMeXUs3kSb408gU9uPIkBbVPZW3CQ5777jf5jp/PE1yvYsdcaEhjjzZKNMRXUp2Uy717Xjw9HncjAdqnk7j/IhJnrGDB2Bvd/tpT0rH3BDtGYasOvx0IbY0rWr00K/dqksCh9Dy+nrWHasu2889NG3v95E+f3bMKNg46jXcP4YIdpTFDZkY0xlaRH8yQmXNmXb0afwoW9mgLw6S8ZnPHsLEa9PZ95G7KsGxxTZ9mRjTGVrH3DeJ65rCejz2jPq7PW8eH8dL5Zvp1vlm+nR7NErh3YhphDlnRM3WJHNsYESPPkGB4Z1pXZfxvCrae2pV5MOIs2Z3PrfxZy16x8Xp211h5tYOoMSzbGBFiD+ChuP7MDP9x9Go9f0I029WPJKlAe/2olJz/xPx76YhmbMq0xgandLNkYU0WiI0IZ3q8F340exOg+kfRvm0JeYRFvztnAoKdnMHLiPGas3EGRnWIztZBdszGmioWECD3qh/F/l5zI8i05vDF7PV8s3sL0lTuYvnIHzZOj+XO/llzStznJ1h2OqSXsyMaYIOrcJIFxl/bgp3tO4+6zO9KsXjTpWfk88fVKTnzif9w+6Vd+Td9jrdhMjWdHNsZUA8mxEdww6Dj+MrANM1fv4J0fN5K2eief/pLBp79k0LVpApcd34I/9mhCYrR1iWNqHks2xlQjoSHCqR0bcmrHhmzMzOP9uZv4cH46SzNyWJqxlEenLuecbo25tG9zTmyTjIgEO2Rj/GLJxphqqmVKLPec04nRZ7Rn2rJtfDgvnR/WZjJ5YQaTF2bQMiWGS/s256LezWiUGBXscI0plSUbY6q5qPBQzu/ZlPN7NmVT5j4+WpDOR/M3szFzH09NW8W4b1YxuEMDLurdjNM6NSAqPDTYIRtzFEs2xtQgLVJiuOPMDtx2entm/baTSfPS+W7F9sMt2eIjwzinW2OG9WpKv9bJhITYaTZTPVRpazQRSRaRySKSJyIbRWR4CfVERMaKSKY7jBWPk9Mi0lNEFojIPvdvT4+yO0VkqYjsFZH1InJnVaybMVUpNEQY0qEB4//ch5/uOY37z+1Mt6aJ7N1/kA/np3P5az/Rf+x0/vn1SlZt2xvscI2p8iObl4BCoCHQE/hSRBap6jKveqOAYUAPQIFvcZ4I+oqIRABTgOeAl4HrgSki0k5VCwEBrgIWA8cB34hIuqp+EPC1MyYIUuIiuXZAa64d0Jo1O/by2cItTF6YQcaefF6ZuZZXZq6lU+MEhvVswh+6N6ZZvZhgh2zqoCo7shGRWOAi4H5VzVXV2cDnwJU+ql8NjFPVzaqaAYwDRrhlg3GS5HOqul9VX8BJMKcCqOqTqvqLqh5U1VU4ial/AFfNmGqjbYN4xgztwPd3DeGjG05ieL8WJEaHs2JrDk98vZIBY2dw/ktzeHXWWjbvti5yTNWRqrpZTER6AXNUNcZj3BhgkKqe51U3GzhTVee67/sCM1Q1XkRGu2Vne9Sf6paP85qPAL8AE1T1FR8xjcI5iqJ+/fp9Jk2aVElrW7lyc3OJi4sLdhg+WWwVU5WxHTikLN5ZxNytB/l1ZxGFRb+XtUkM4fhGYfRtGEr9mJAqj628LLaKCWRsQ4YMWaCqfcuqV5Wn0eKAHK9x2YCvp0rFuWWe9eLc5OFdVtp8HsQ5envTV0Cq+irwKkCHDh108ODBpa5AsKSlpWGxlZ/F9rsz3L/5hUWkrdrB1CVbmb5iB+uyi1iXXciHq6BHs0TO6daYRNnIubbdys1iK11VJptcIMFrXALg6+qld90EIFdVVUT8mo+I3IJz7Wagqu4/lsCNqS2iI0I5u1tjzu7W+KjEs2hzNos2O7/jXl81kzM6N+T0Tg3p1TzJWrWZY1aVyWY1EOZeyP/NHdcD8G4cgDuuB/Czj3rLgDtERPT3c4DdcRofACAiI4G7gVNUdXPlroYxtYOvxPPfZdv4ZukW1uzIZc2OXManrSU1LpLTOzXgjM4N6d821e7jMRVSZclGVfNE5FPgYRG5Dqc12vnAyT6qvw3cLiJf4bRGuwN40S1LA4qAW0XkFeAv7vjpACJyBfA4MERV1wVodYypVTwTz3fT9xDdohvfLt/Ot8u3k7Ennw/mpfPBvHSiw0MZ2C6VUzs2YHCHBtZzgfFbVTd9vgn4N7ADyARuVNVlIjIQ+FpVi69gTQDaAEvc96+741DVQhEZ5o77J7ACGOY2ewZ4FEgB5nncmvOuqt4Q0DUzppYICxH6t02lf9tUHjivMyu27uW7FU7iWZKRffgR1wAdG8UzqEN9BrWvT9+WyUSEWUfyxrcqTTaqmoVz/4z3+O9xLvwXv1fgLnfwNZ+FQJ8SylpXSrDGGESEzk0S6NwkgVtPa8fW7Hy+W7GDmat28sPaXazctpeV2/YyYeY64iLDOPm4FAZ3aMCgDvVpmhQd7PBNNWLd1Rhj/NY4MZorT2zJlSe2ZP/BIuZv2M3M1TtJW7WD1dtzjzjqadsgjgFtUzn5uBT6tUmxRyPUcZZsjDEVEhkWevh0273ndCJjTz4zV+1k5uodzFmTebiRwcQfNhAi0K1pIie7yadvy2SiI6yhQV1iycYYUymaJkUzvF8LhvdrQeHBQ/yavoc5a3bx49pMFqbvPty0enzaWiJCQ+jdMomTj0ulf9sUujVNsus9tZwlG2NMpYsIC+GE1smc0DqZ0WfAvsKD/Lw+ix/WZvLD2l0s25LDT+uy+GldFs98C1HhIfRsnsQJrZI5vnUyvVvUIzbSdk+1if03jTEBFxMRxuAOTnNpgN15hfy0LpMf1mby4zrnlFtx8gGnV+suTRI4vlWyO9QjJS4ymKtgjpElG2NMlasXG3H4vh6AzNz9zN+4m3nrs5i3IYulW3JYvDmbxZuzeWP2egCOqx9L08j9bIvZRK8W9WjbII5Q69mgxrBkY4wJupS4SIZ2acTQLo0AyNt/kIWb9vDzhizmrc9iYfpu1u7MYy0wa7Nz+11sRCg9mifRq0USPZvXo2fzJOrH29FPdWXJxhhT7cRGhjGgXSoD2qUCUHjwEEu3ZPPR9PnsjUhh4aY9ZOzJd68BZR6ernlyND2b16NX8yR6NE+kc+NEa/VWTViyMcZUexFhIfRuUY+cVuEMHtwbgB05BSxM38Ov6XtYuGk3izdnk56VT3pWPl8s2gJAiMBx9ePo1jSRru7QpUmCNT4IAtvixpgaqUFC1BGn3g4WHeK3Hbks3OQknyUZ2fy2I/fw8OnCDABEoE1qLF2bJtKtaSJdmiTSpWkCCVF202kgWbIxxtQKYaEhdGqcQKfGCQzv1wKAggNFrNy2lyUZ2SzLyGZJRjart+91rv/szGPKr1sOT9+sXjQdG8XTsVECHRvH07FRPK1SYgkLtft/KoMlG2NMrRUVHkrP5kn0bJ50eNz+g0Ws3pbL0i3Zh5PQim172bw7n827nb7fikWEhdC+YRwdGibQqbGTiDo0ireGCBVgycYYU6dEhoXSrVki3Zolcrk77mDRITZk5jkdi27dy8ptOazYupeMPfkszchhacaRDxlOiY2gbYO4w0O7BvHsLjiEquLR27zxYMnGGFPnhYWG0LZBPG0bxHNu99/H5xQcYPW2vazYtpeVW3NY5fZynZlXSOb6LOauzzpiPv/48RvaNIijbf042jV0/rZtEEfz5Jg6f0+QJRtjjClBQlQ4fVsl07dV8uFxqkrGnvzDHY2u3en8XZGxm737D7IofQ+L0vccMZ+IsBBap8TSMiWG1qmxtEqNpVVKLK1SY2gYH1UnHrttycYYY8pBRGhWL4Zm9WIOd78DkJaWRre+J/Gbm4Q8E9HW7AJWbd/Lqu17j5pfVHgIrdxE1Co1ltYpvyejhgmRtea0nCUbY4ypJClxkaTERXJim5Qjxu8tOMCGXfvYkJnHhl15rHf/bszcR2Ze4eGH0HmLDg+lWb1omtWLpnlyDM3rxRzxOjGm5jTXtmRjjDEBFh8VfrhRgrfs/ANszMxj/a48Nuza57x2k9HufQcO3yfke75hNKsXQ3M3ATWrF03zejE0T46hcVJUtbp3yJKNMcYEUWJ0ON2bJdG9WdJRZdn5B9i8ex/pWfls3r2PzbvzSc/aR7o7bm/BQVZszWHF1hwfc4a4yDAaJ0YRWVTA17sW0zgpiiaJ0TROiqJxYjRNkqKIiaiaNGDJxhhjqqnE6HASo51eDrypKll5haTvzj+ckNLdhLQ5ax9bsvPJ3X/w8FHR0sx0n8tIiAqjSVI0jROjaJwUTZPEKBokRNEwIYqGCZE0iI+iXkz4MV87smRjjDE1kIgcvkbkedNqMVUlO/8AW/YU8M3sn0lp0Y6te/LZll3Alux8tmYXsDW7gJyCg+SUcM2oWERoCPXjI2mQEEnD+Cjnb0IUDcpxc6slG2OMqYVEhKSYCJJiItjRIIzBJ7Y8qo6qkplX6CSgPU4C2pKdz86c/WzfW8COnP1sz3ESUsaefDL25Fc4Hks2xhhTR4kIqXGRpMZF0rXp0afqihUcKHISz94CtucUHH69I2c/z/m5LEs2xhhjShUVHkqLlBhapMQcVfbcn/ybh3VnaowxJuAs2RhjjAk4SzbGGGMCzpKNMcaYgLNkY4wxJuAs2RhjjAm4Kk02IpIsIpNFJE9ENorI8BLqiYiMFZFMdxgrHn0liEhPEVkgIvvcvz39ndYYY0zVq+ojm5eAQqAhcAUwXkS6+Kg3ChgG9AC6A+cB1wOISAQwBXgXqAe8BUxxx5c6rTHGmOCosmQjIrHARcD9qpqrqrOBz4ErfVS/GhinqptVNQMYB4xwywbj3Iz6nKruV9UXAAFO9WNaY4wxQVCVPQi0Bw6q6mqPcYuAQT7qdnHLPOt18ShbrKrqUb7YHf/fMqY9goiMwjkSAtgvIkv9W5UqlwrsCnYQJbDYKsZiqxiLrWICGdvRna75UJXJJg7wfuhCNhBfQt1sr3px7rUX7zLv+ZQ4rVeCQlVfBV4FEJH5qtrX/9WpOhZbxVhsFWOxVYzFVrqqvGaTCyR4jUsAfPVr7V03Ach1k0VZ8yltWmOMMUFQlclmNRAmIu08xvUAlvmou8wt81VvGdDdq4VZd6/ykqY1xhgTBFWWbFQ1D/gUeFhEYkWkP3A+8I6P6m8Dt4tIUxFpAtwBTHTL0oAi4FYRiRSRW9zx0/2YtjSvln+tqozFVjEWW8VYbBVjsZVCqvLskogkA/8GzgAygbtV9X0RGQh8rapxbj0BxgLXuZO+Dvyt+FSYiPRyx3UGVgDXqupCf6Y1xhhT9ao02RhjjKmbrLsaY4wxAWfJxhhjTMDV+WTjb39tlbi8NBEpEJFcd1jlUTbcjSFPRD5zr3H5FWdp05YSyy0iMl9E9ovIRK+y00Rkpdv/3AwRaelRFiki/xaRHBHZJiK3V9a0ZcUmIq1ERD22X66I3F9Vsbl13nC39V4R+VVEzq4O26202IK93dx674rIVrfeahG5rjLmH8jYqsN286jfTpx9x7se4wKyzyhr2gpR1To9AP8BPsS5GXQAzk2gXQK4vDTgOh/ju+DcK3SKG8v7wAf+xFnWtKXEciFOP3LjgYke41Pd+V8CRAFPAT95lD8BfI/TN10nYBtw1rFO62dsrQAFwkpYp4DGBsQCD7pxhADnutu+VbC3WxmxBXW7eXxOI93XHd16fYK93cqILejbzaP+N279dwO9zyht2grv+wK1U60JA86XsxBo7zHuHeCfAVxmGr6TzePA+x7vj3Njiy8rztKm9TOmRzlyhz4K+MFrO+UDHd33W4AzPcofKf6gHsu0fsZW1pe/ymLzqLcYp9+/arPdfMRWrbYb0AHYClxa3babV2zVYrsBfwIm4fyYKE42AdlnlDVtRYe6fhqtpP7afPalVomeEJFdIjJHRAa7447o001V1+L+w/2Is7RpK8J7fnnAWqCLiNQDGlN633UVnbY8NorIZhF5U0RSAYIRm4g0xNnOy45x/oGOrVhQt5uIvCwi+4CVODv0r45x/oGOrVjQtpuIJAAPA96n2QK1zwjIfrGuJ5vy9NdWWf4GtAGa4txo9YWIHEfpfb6VFWdZ/cWVV1mxwNH9z/kTS1nT+mMXcDxO53993Gnf81h2lcUmIuHust9S1ZXHOP9Ax1Yttpuq3uSWDcS5yXv/Mc4/0LFVh+32CPCGqm72Gh+ofUZA9ot1PdmUp7+2SqGqc1V1rzqPR3gLmAOcU0Ys5e0Pzru8vMqKBY7uf86fWMqatkzqPJ5ivqoeVNXtwC3AmSISX5WxiUgIzqmFQjeGY51/QGOrLtvNjaVInUeMNANuPMb5BzS2YG83cR4MeTrwrI9wA7XPCMh+sa4nm/L01xYoivM8niP6dBORNkCkG2NZcZY2bUV4zy8W55zuMlXdjXOKobS+6yo6bUUU35UcUlWxiYgAb+A8BPAiVT1QCfMPdGzeqny7+RBWPJ9jmH+gY/NW1dttMM51o00isg0YA1wkIr/4mH9l7TMCs188lgs+tWEAPsBpeREL9CeArdGAJGAoTsuUMJynlebhnCPtgnPoOtCN5V2ObB1SYpxlTVtKPGFuLE/g/BIujqu+O/+L3HFjObIVzT+BmTitaDrifGmKW+BUeFo/Y+uHcwE3BEjBaTEzo4pjewX4CYjzGl8dtltJsQV1uwENcC5yxwGhON+DPOCPwd5uZcQW7O0WAzTyGJ4GPnbnHbB9RmnTVnj/F4idak0agGTgM/fDtQkYHsBl1Qfm4RyO7sHZKZzhUT7cjSEP59HXyf7GWdq0pcTzIM4vNc/hQbfsdJwLpfk4LehaeUwXidPHXQ6wHbjda74Vnras2IDLgfXuem7F6Xi1UVXFhnPuXoECnNMNxcMVwd5upcVWDbZbfZwd6x633hLgL5Ux/0DGFuztVsL34t1A7zPKmrYig/WNZowxJuDq+jUbY4wxVcCSjTHGmICzZGOMMSbgLNkYY4wJOEs2xhhjAs6SjTHGmICzZGNMLSMiI0Qkt+yaxlQdSzbGBIiITHQfvFU87BKRqSLSsRzzeFBElgYyTmOqgiUbYwLrO5yu5BsDZwLRwOSgRmRMEFiyMSaw9qvqNnf4Baf33o4iEg0gIv8UkVUiki8iG0TkSRGJcstGAA/gPP+k+OhohFuWKCLjxXmUcYGIrBCRyzwXLM4jiZe6j/adISKtq3LFjfEUFuwAjKkr3G7pLwOWqGq+OzoPGAlkAJ1xOtLcD9yP0+ljV5zHOw9262e7PTt/hdOB4zU4vfR2wOnssVgkcI877wLgLXfeQwOzdsaUzpKNMYF1lsfF+lggHef5RQCo6iMedTeIyOM43cjfr6r57rQHVXVbcSUROQM4CacX3hXu6HVeyw0DblbVVe40TwP/FhFR6xDRBIGdRjMmsGYBPd3hBOB/wDci0hxARC4Wkdkiss1NLM8CLcqYZy9gq0ei8WV/caJxbQEicI6GjKlylmyMCax9qrrGHeYB1+E89XCUiJyI89yQacB5OEnkPiC8EpZ78P/bu1uViKIoDMPvSgY12BTDFItVMFk1eAGGwavwBrwBi17IgDBZsIiYBIvVIgajzTDLsESGU0RxyYT3SefA3gdO+tg/8A3ev0q//uDb0o+5jSb9rwRmVCnWHvA8v5UWEaPB+Heq0GvePbAREdvfrG6khWHYSL2WImL983mN6rBfAabAKrAZEcfALXV4Px7MfwJGEbFDlVi9UVtxd8AkIk6oCwJbwHJmXsVYJPQAAABsSURBVPb+jvQ7LqmlXvtUw+MLFRC7wFFmXmfmFDgDzoEH4AA4HcyfUDfProBXYJyZM+AQuKHqfB+BC+pMRlpINnVKktq5spEktTNsJEntDBtJUjvDRpLUzrCRJLUzbCRJ7QwbSVI7w0aS1O4D96wB2VjPCZQAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(steps, lrs, \"-\", linewidth=2)\n", "plt.axis([0, n_steps - 1, 0, lr0 * 1.1])\n", "plt.xlabel(\"Batch\")\n", "plt.ylabel(\"Learning Rate\")\n", "plt.title(\"Exponential Scheduling (per batch)\", fontsize=14)\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Piecewise Constant Scheduling" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "def piecewise_constant_fn(epoch):\n", " if epoch < 5:\n", " return 0.01\n", " elif epoch < 15:\n", " return 0.005\n", " else:\n", " return 0.001" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [], "source": [ "def piecewise_constant(boundaries, values):\n", " boundaries = np.array([0] + boundaries)\n", " values = np.array(values)\n", " def piecewise_constant_fn(epoch):\n", " return values[np.argmax(boundaries > epoch) - 1]\n", " return piecewise_constant_fn\n", "\n", "piecewise_constant_fn = piecewise_constant([5, 15], [0.01, 0.005, 0.001])" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/25\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.8151 - accuracy: 0.7655 - val_loss: 0.6868 - val_accuracy: 0.7780\n", "Epoch 2/25\n", "55000/55000 [==============================] - 6s 102us/sample - loss: 0.8153 - accuracy: 0.7659 - val_loss: 1.0604 - val_accuracy: 0.7148\n", "Epoch 3/25\n", "55000/55000 [==============================] - 6s 104us/sample - loss: 0.9138 - accuracy: 0.7218 - val_loss: 1.3223 - val_accuracy: 0.6660\n", "Epoch 4/25\n", "55000/55000 [==============================] - 6s 103us/sample - loss: 0.8506 - accuracy: 0.7627 - val_loss: 0.6807 - val_accuracy: 0.8174\n", "Epoch 5/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.7213 - accuracy: 0.8068 - val_loss: 1.0441 - val_accuracy: 0.8030\n", "Epoch 6/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.4882 - accuracy: 0.8548 - val_loss: 0.5411 - val_accuracy: 0.8494\n", "Epoch 7/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.4721 - accuracy: 0.8568 - val_loss: 0.5808 - val_accuracy: 0.8448\n", "Epoch 8/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.4412 - accuracy: 0.8659 - val_loss: 0.5466 - val_accuracy: 0.8526\n", "Epoch 9/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.4234 - accuracy: 0.8718 - val_loss: 0.5611 - val_accuracy: 0.8528\n", "Epoch 10/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.4300 - accuracy: 0.8721 - val_loss: 0.5049 - val_accuracy: 0.8650\n", "Epoch 11/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.4162 - accuracy: 0.8768 - val_loss: 0.5957 - val_accuracy: 0.8534\n", "Epoch 12/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.4122 - accuracy: 0.8780 - val_loss: 0.5707 - val_accuracy: 0.8640\n", "Epoch 13/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.3951 - accuracy: 0.8833 - val_loss: 0.5523 - val_accuracy: 0.8690\n", "Epoch 14/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.3961 - accuracy: 0.8834 - val_loss: 0.7371 - val_accuracy: 0.8452\n", "Epoch 15/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.4201 - accuracy: 0.8839 - val_loss: 0.6546 - val_accuracy: 0.8558\n", "Epoch 16/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.2645 - accuracy: 0.9162 - val_loss: 0.4655 - val_accuracy: 0.8844\n", "Epoch 17/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.2440 - accuracy: 0.9222 - val_loss: 0.4758 - val_accuracy: 0.8830\n", "Epoch 18/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.2320 - accuracy: 0.9256 - val_loss: 0.4917 - val_accuracy: 0.8880\n", "Epoch 19/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.2248 - accuracy: 0.9279 - val_loss: 0.4644 - val_accuracy: 0.8878\n", "Epoch 20/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.2172 - accuracy: 0.9302 - val_loss: 0.5036 - val_accuracy: 0.8848\n", "Epoch 21/25\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.2139 - accuracy: 0.9327 - val_loss: 0.4921 - val_accuracy: 0.8914\n", "Epoch 22/25\n", "55000/55000 [==============================] - 6s 101us/sample - loss: 0.2030 - accuracy: 0.9360 - val_loss: 0.5197 - val_accuracy: 0.8860\n", "Epoch 23/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.2014 - accuracy: 0.9360 - val_loss: 0.5231 - val_accuracy: 0.8892\n", "Epoch 24/25\n", "55000/55000 [==============================] - 5s 100us/sample - loss: 0.1912 - accuracy: 0.9391 - val_loss: 0.5223 - val_accuracy: 0.8876\n", "Epoch 25/25\n", "55000/55000 [==============================] - 5s 99us/sample - loss: 0.1872 - accuracy: 0.9418 - val_loss: 0.5068 - val_accuracy: 0.8886\n" ] } ], "source": [ "lr_scheduler = keras.callbacks.LearningRateScheduler(piecewise_constant_fn)\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n", "n_epochs = 25\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid),\n", " callbacks=[lr_scheduler])" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(history.epoch, [piecewise_constant_fn(epoch) for epoch in history.epoch], \"o-\")\n", "plt.axis([0, n_epochs - 1, 0, 0.011])\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Learning Rate\")\n", "plt.title(\"Piecewise Constant Scheduling\", fontsize=14)\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Performance Scheduling" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/25\n", "55000/55000 [==============================] - 4s 79us/sample - loss: 0.5954 - accuracy: 0.8055 - val_loss: 0.5432 - val_accuracy: 0.8154\n", "Epoch 2/25\n", "55000/55000 [==============================] - 4s 74us/sample - loss: 0.5194 - accuracy: 0.8345 - val_loss: 0.5184 - val_accuracy: 0.8468\n", "Epoch 3/25\n", "55000/55000 [==============================] - 4s 73us/sample - loss: 0.5080 - accuracy: 0.8453 - val_loss: 0.5780 - val_accuracy: 0.8384\n", "Epoch 4/25\n", "55000/55000 [==============================] - 4s 73us/sample - loss: 0.5360 - accuracy: 0.8452 - val_loss: 0.7195 - val_accuracy: 0.8350\n", "Epoch 5/25\n", "55000/55000 [==============================] - 4s 74us/sample - loss: 0.5239 - accuracy: 0.8504 - val_loss: 0.5219 - val_accuracy: 0.8562\n", "Epoch 6/25\n", "55000/55000 [==============================] - 4s 74us/sample - loss: 0.5163 - accuracy: 0.8528 - val_loss: 0.5669 - val_accuracy: 0.8382\n", "Epoch 7/25\n", "55000/55000 [==============================] - 4s 74us/sample - loss: 0.5088 - accuracy: 0.8561 - val_loss: 0.6591 - val_accuracy: 0.8268\n", "Epoch 8/25\n", "55000/55000 [==============================] - 4s 77us/sample - loss: 0.3022 - accuracy: 0.8938 - val_loss: 0.3955 - val_accuracy: 0.8834\n", "Epoch 9/25\n", "55000/55000 [==============================] - 4s 76us/sample - loss: 0.2501 - accuracy: 0.9087 - val_loss: 0.4060 - val_accuracy: 0.8792\n", "Epoch 10/25\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.2304 - accuracy: 0.9158 - val_loss: 0.3998 - val_accuracy: 0.8846\n", "Epoch 11/25\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.2155 - accuracy: 0.9206 - val_loss: 0.3880 - val_accuracy: 0.8898\n", "Epoch 12/25\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.2034 - accuracy: 0.9253 - val_loss: 0.4049 - val_accuracy: 0.8838\n", "Epoch 13/25\n", "55000/55000 [==============================] - 4s 77us/sample - loss: 0.1878 - accuracy: 0.9285 - val_loss: 0.4440 - val_accuracy: 0.8838\n", "Epoch 14/25\n", "55000/55000 [==============================] - 4s 80us/sample - loss: 0.1839 - accuracy: 0.9325 - val_loss: 0.4478 - val_accuracy: 0.8838\n", "Epoch 15/25\n", "55000/55000 [==============================] - 4s 76us/sample - loss: 0.1747 - accuracy: 0.9348 - val_loss: 0.5072 - val_accuracy: 0.8806\n", "Epoch 16/25\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.1689 - accuracy: 0.9367 - val_loss: 0.4897 - val_accuracy: 0.8790\n", "Epoch 17/25\n", "55000/55000 [==============================] - 4s 78us/sample - loss: 0.1090 - accuracy: 0.9576 - val_loss: 0.4571 - val_accuracy: 0.8900\n", "Epoch 18/25\n", "55000/55000 [==============================] - 4s 74us/sample - loss: 0.0926 - accuracy: 0.9639 - val_loss: 0.4563 - val_accuracy: 0.8934\n", "Epoch 19/25\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.0861 - accuracy: 0.9671 - val_loss: 0.5103 - val_accuracy: 0.8898\n", "Epoch 20/25\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.0794 - accuracy: 0.9692 - val_loss: 0.5065 - val_accuracy: 0.8936\n", "Epoch 21/25\n", "55000/55000 [==============================] - 4s 75us/sample - loss: 0.0737 - accuracy: 0.9721 - val_loss: 0.5516 - val_accuracy: 0.8928\n", "Epoch 22/25\n", "55000/55000 [==============================] - 4s 76us/sample - loss: 0.0547 - accuracy: 0.9803 - val_loss: 0.5315 - val_accuracy: 0.8944\n", "Epoch 23/25\n", "55000/55000 [==============================] - 4s 78us/sample - loss: 0.0487 - accuracy: 0.9827 - val_loss: 0.5429 - val_accuracy: 0.8928\n", "Epoch 24/25\n", "55000/55000 [==============================] - 4s 80us/sample - loss: 0.0455 - accuracy: 0.9844 - val_loss: 0.5554 - val_accuracy: 0.8918\n", "Epoch 25/25\n", "55000/55000 [==============================] - 4s 79us/sample - loss: 0.0427 - accuracy: 0.9850 - val_loss: 0.5730 - val_accuracy: 0.8920\n" ] } ], "source": [ "lr_scheduler = keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "optimizer = keras.optimizers.SGD(lr=0.02, momentum=0.9)\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n", "n_epochs = 25\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid),\n", " callbacks=[lr_scheduler])" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(history.epoch, history.history[\"lr\"], \"bo-\")\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Learning Rate\", color='b')\n", "plt.tick_params('y', colors='b')\n", "plt.gca().set_xlim(0, n_epochs - 1)\n", "plt.grid(True)\n", "\n", "ax2 = plt.gca().twinx()\n", "ax2.plot(history.epoch, history.history[\"val_loss\"], \"r^-\")\n", "ax2.set_ylabel('Validation Loss', color='r')\n", "ax2.tick_params('y', colors='r')\n", "\n", "plt.title(\"Reduce LR on Plateau\", fontsize=14)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### tf.keras schedulers" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/25\n", "55000/55000 [==============================] - 4s 77us/sample - loss: 0.4887 - accuracy: 0.8282 - val_loss: 0.4245 - val_accuracy: 0.8526\n", "Epoch 2/25\n", "55000/55000 [==============================] - 4s 71us/sample - loss: 0.3830 - accuracy: 0.8641 - val_loss: 0.3798 - val_accuracy: 0.8688\n", "Epoch 3/25\n", "55000/55000 [==============================] - 4s 71us/sample - loss: 0.3491 - accuracy: 0.8758 - val_loss: 0.3650 - val_accuracy: 0.8730\n", "Epoch 4/25\n", "55000/55000 [==============================] - 4s 78us/sample - loss: 0.3267 - accuracy: 0.8839 - val_loss: 0.3564 - val_accuracy: 0.8746\n", "Epoch 5/25\n", "55000/55000 [==============================] - 4s 72us/sample - loss: 0.3102 - accuracy: 0.8893 - val_loss: 0.3493 - val_accuracy: 0.8770\n", "Epoch 6/25\n", "55000/55000 [==============================] - 4s 73us/sample - loss: 0.2969 - accuracy: 0.8939 - val_loss: 0.3400 - val_accuracy: 0.8818\n", "Epoch 7/25\n", "55000/55000 [==============================] - 4s 77us/sample - loss: 0.2855 - accuracy: 0.8983 - val_loss: 0.3385 - val_accuracy: 0.8830\n", "Epoch 8/25\n", "55000/55000 [==============================] - 4s 68us/sample - loss: 0.2764 - accuracy: 0.9025 - val_loss: 0.3372 - val_accuracy: 0.8824\n", "Epoch 9/25\n", "55000/55000 [==============================] - 4s 67us/sample - loss: 0.2684 - accuracy: 0.9039 - val_loss: 0.3337 - val_accuracy: 0.8848\n", "Epoch 10/25\n", "55000/55000 [==============================] - 4s 73us/sample - loss: 0.2613 - accuracy: 0.9072 - val_loss: 0.3277 - val_accuracy: 0.8862\n", "Epoch 11/25\n", "55000/55000 [==============================] - 4s 71us/sample - loss: 0.2555 - accuracy: 0.9086 - val_loss: 0.3273 - val_accuracy: 0.8860\n", "Epoch 12/25\n", "55000/55000 [==============================] - 4s 73us/sample - loss: 0.2500 - accuracy: 0.9111 - val_loss: 0.3244 - val_accuracy: 0.8840\n", "Epoch 13/25\n", "55000/55000 [==============================] - 4s 73us/sample - loss: 0.2454 - accuracy: 0.9124 - val_loss: 0.3194 - val_accuracy: 0.8904\n", "Epoch 14/25\n", "55000/55000 [==============================] - 4s 71us/sample - loss: 0.2414 - accuracy: 0.9141 - val_loss: 0.3226 - val_accuracy: 0.8884\n", "Epoch 15/25\n", "55000/55000 [==============================] - 4s 73us/sample - loss: 0.2378 - accuracy: 0.9160 - val_loss: 0.3233 - val_accuracy: 0.8860\n", "Epoch 16/25\n", "55000/55000 [==============================] - 4s 69us/sample - loss: 0.2347 - accuracy: 0.9174 - val_loss: 0.3207 - val_accuracy: 0.8904\n", "Epoch 17/25\n", "55000/55000 [==============================] - 4s 71us/sample - loss: 0.2318 - accuracy: 0.9179 - val_loss: 0.3195 - val_accuracy: 0.8892\n", "Epoch 18/25\n", "55000/55000 [==============================] - 4s 69us/sample - loss: 0.2293 - accuracy: 0.9193 - val_loss: 0.3184 - val_accuracy: 0.8916\n", "Epoch 19/25\n", "55000/55000 [==============================] - 4s 67us/sample - loss: 0.2272 - accuracy: 0.9201 - val_loss: 0.3196 - val_accuracy: 0.8886\n", "Epoch 20/25\n", "55000/55000 [==============================] - 4s 68us/sample - loss: 0.2253 - accuracy: 0.9206 - val_loss: 0.3190 - val_accuracy: 0.8918\n", "Epoch 21/25\n", "55000/55000 [==============================] - 4s 68us/sample - loss: 0.2235 - accuracy: 0.9214 - val_loss: 0.3176 - val_accuracy: 0.8912\n", "Epoch 22/25\n", "55000/55000 [==============================] - 4s 69us/sample - loss: 0.2220 - accuracy: 0.9220 - val_loss: 0.3181 - val_accuracy: 0.8900\n", "Epoch 23/25\n", "55000/55000 [==============================] - 4s 71us/sample - loss: 0.2206 - accuracy: 0.9226 - val_loss: 0.3187 - val_accuracy: 0.8894\n", "Epoch 24/25\n", "55000/55000 [==============================] - 4s 68us/sample - loss: 0.2193 - accuracy: 0.9231 - val_loss: 0.3168 - val_accuracy: 0.8908\n", "Epoch 25/25\n", "55000/55000 [==============================] - 4s 68us/sample - loss: 0.2181 - accuracy: 0.9234 - val_loss: 0.3171 - val_accuracy: 0.8898\n" ] } ], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "s = 20 * len(X_train) // 32 # number of steps in 20 epochs (batch size = 32)\n", "learning_rate = keras.optimizers.schedules.ExponentialDecay(0.01, s, 0.1)\n", "optimizer = keras.optimizers.SGD(learning_rate)\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n", "n_epochs = 25\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For piecewise constant scheduling, try this:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [], "source": [ "learning_rate = keras.optimizers.schedules.PiecewiseConstantDecay(\n", " boundaries=[5. * n_steps_per_epoch, 15. * n_steps_per_epoch],\n", " values=[0.01, 0.005, 0.001])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1Cycle scheduling" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [], "source": [ "K = keras.backend\n", "\n", "class ExponentialLearningRate(keras.callbacks.Callback):\n", " def __init__(self, factor):\n", " self.factor = factor\n", " self.rates = []\n", " self.losses = []\n", " def on_batch_end(self, batch, logs):\n", " self.rates.append(K.get_value(self.model.optimizer.lr))\n", " self.losses.append(logs[\"loss\"])\n", " K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)\n", "\n", "def find_learning_rate(model, X, y, epochs=1, batch_size=32, min_rate=10**-5, max_rate=10):\n", " init_weights = model.get_weights()\n", " iterations = len(X) // batch_size * epochs\n", " factor = np.exp(np.log(max_rate / min_rate) / iterations)\n", " init_lr = K.get_value(model.optimizer.lr)\n", " K.set_value(model.optimizer.lr, min_rate)\n", " exp_lr = ExponentialLearningRate(factor)\n", " history = model.fit(X, y, epochs=epochs, batch_size=batch_size,\n", " callbacks=[exp_lr])\n", " K.set_value(model.optimizer.lr, init_lr)\n", " model.set_weights(init_weights)\n", " return exp_lr.rates, exp_lr.losses\n", "\n", "def plot_lr_vs_loss(rates, losses):\n", " plt.plot(rates, losses)\n", " plt.gca().set_xscale('log')\n", " plt.hlines(min(losses), min(rates), max(rates))\n", " plt.axis([min(rates), max(rates), min(losses), (losses[0] + min(losses)) / 2])\n", " plt.xlabel(\"Learning rate\")\n", " plt.ylabel(\"Loss\")" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1e-3),\n", " metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples\n", "55000/55000 [==============================] - 2s 28us/sample - loss: nan - accuracy: 0.3888\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "batch_size = 128\n", "rates, losses = find_learning_rate(model, X_train_scaled, y_train, epochs=1, batch_size=batch_size)\n", "plot_lr_vs_loss(rates, losses)" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [], "source": [ "class OneCycleScheduler(keras.callbacks.Callback):\n", " def __init__(self, iterations, max_rate, start_rate=None,\n", " last_iterations=None, last_rate=None):\n", " self.iterations = iterations\n", " self.max_rate = max_rate\n", " self.start_rate = start_rate or max_rate / 10\n", " self.last_iterations = last_iterations or iterations // 10 + 1\n", " self.half_iteration = (iterations - self.last_iterations) // 2\n", " self.last_rate = last_rate or self.start_rate / 1000\n", " self.iteration = 0\n", " def _interpolate(self, iter1, iter2, rate1, rate2):\n", " return ((rate2 - rate1) * (self.iteration - iter1)\n", " / (iter2 - iter1) + rate1)\n", " def on_batch_begin(self, batch, logs):\n", " if self.iteration < self.half_iteration:\n", " rate = self._interpolate(0, self.half_iteration, self.start_rate, self.max_rate)\n", " elif self.iteration < 2 * self.half_iteration:\n", " rate = self._interpolate(self.half_iteration, 2 * self.half_iteration,\n", " self.max_rate, self.start_rate)\n", " else:\n", " rate = self._interpolate(2 * self.half_iteration, self.iterations,\n", " self.start_rate, self.last_rate)\n", " rate = max(rate, self.last_rate)\n", " self.iteration += 1\n", " K.set_value(self.model.optimizer.lr, rate)" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.6569 - accuracy: 0.7750 - val_loss: 0.4875 - val_accuracy: 0.8300\n", "Epoch 2/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.4584 - accuracy: 0.8391 - val_loss: 0.4390 - val_accuracy: 0.8476\n", "Epoch 3/25\n", "55000/55000 [==============================] - 1s 21us/sample - loss: 0.4124 - accuracy: 0.8541 - val_loss: 0.4102 - val_accuracy: 0.8570\n", "Epoch 4/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.3842 - accuracy: 0.8643 - val_loss: 0.3893 - val_accuracy: 0.8652\n", "Epoch 5/25\n", "55000/55000 [==============================] - 1s 21us/sample - loss: 0.3641 - accuracy: 0.8707 - val_loss: 0.3736 - val_accuracy: 0.8678\n", "Epoch 6/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.3456 - accuracy: 0.8781 - val_loss: 0.3652 - val_accuracy: 0.8726\n", "Epoch 7/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.3318 - accuracy: 0.8818 - val_loss: 0.3596 - val_accuracy: 0.8768\n", "Epoch 8/25\n", "55000/55000 [==============================] - 1s 24us/sample - loss: 0.3180 - accuracy: 0.8862 - val_loss: 0.3845 - val_accuracy: 0.8602\n", "Epoch 9/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.3062 - accuracy: 0.8893 - val_loss: 0.3824 - val_accuracy: 0.8660\n", "Epoch 10/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.2938 - accuracy: 0.8934 - val_loss: 0.3516 - val_accuracy: 0.8742\n", "Epoch 11/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.2838 - accuracy: 0.8975 - val_loss: 0.3609 - val_accuracy: 0.8740\n", "Epoch 12/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.2716 - accuracy: 0.9025 - val_loss: 0.3843 - val_accuracy: 0.8666\n", "Epoch 13/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.2541 - accuracy: 0.9091 - val_loss: 0.3282 - val_accuracy: 0.8844\n", "Epoch 14/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.2390 - accuracy: 0.9139 - val_loss: 0.3336 - val_accuracy: 0.8838\n", "Epoch 15/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.2273 - accuracy: 0.9177 - val_loss: 0.3283 - val_accuracy: 0.8884\n", "Epoch 16/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.2156 - accuracy: 0.9234 - val_loss: 0.3288 - val_accuracy: 0.8862\n", "Epoch 17/25\n", "55000/55000 [==============================] - 1s 26us/sample - loss: 0.2062 - accuracy: 0.9265 - val_loss: 0.3215 - val_accuracy: 0.8896\n", "Epoch 18/25\n", "55000/55000 [==============================] - 1s 24us/sample - loss: 0.1973 - accuracy: 0.9299 - val_loss: 0.3284 - val_accuracy: 0.8912\n", "Epoch 19/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.1892 - accuracy: 0.9344 - val_loss: 0.3229 - val_accuracy: 0.8904\n", "Epoch 20/25\n", "55000/55000 [==============================] - 1s 22us/sample - loss: 0.1822 - accuracy: 0.9366 - val_loss: 0.3196 - val_accuracy: 0.8902\n", "Epoch 21/25\n", "55000/55000 [==============================] - 1s 24us/sample - loss: 0.1758 - accuracy: 0.9388 - val_loss: 0.3184 - val_accuracy: 0.8940\n", "Epoch 22/25\n", "55000/55000 [==============================] - 1s 27us/sample - loss: 0.1699 - accuracy: 0.9422 - val_loss: 0.3221 - val_accuracy: 0.8912\n", "Epoch 23/25\n", "55000/55000 [==============================] - 1s 26us/sample - loss: 0.1657 - accuracy: 0.9444 - val_loss: 0.3173 - val_accuracy: 0.8944\n", "Epoch 24/25\n", "55000/55000 [==============================] - 1s 23us/sample - loss: 0.1630 - accuracy: 0.9457 - val_loss: 0.3162 - val_accuracy: 0.8946\n", "Epoch 25/25\n", "55000/55000 [==============================] - 1s 26us/sample - loss: 0.1610 - accuracy: 0.9464 - val_loss: 0.3169 - val_accuracy: 0.8942\n" ] } ], "source": [ "n_epochs = 25\n", "onecycle = OneCycleScheduler(len(X_train) // batch_size * n_epochs, max_rate=0.05)\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs, batch_size=batch_size,\n", " validation_data=(X_valid_scaled, y_valid),\n", " callbacks=[onecycle])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Avoiding Overfitting Through Regularization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## $\\ell_1$ and $\\ell_2$ regularization" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "layer = keras.layers.Dense(100, activation=\"elu\",\n", " kernel_initializer=\"he_normal\",\n", " kernel_regularizer=keras.regularizers.l2(0.01))\n", "# or l1(0.1) for ℓ1 regularization with a factor or 0.1\n", "# or l1_l2(0.1, 0.01) for both ℓ1 and ℓ2 regularization, with factors 0.1 and 0.01 respectively" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/2\n", "55000/55000 [==============================] - 7s 128us/sample - loss: 1.6073 - accuracy: 0.8112 - val_loss: 0.7314 - val_accuracy: 0.8242\n", "Epoch 2/2\n", "55000/55000 [==============================] - 6s 117us/sample - loss: 0.7193 - accuracy: 0.8256 - val_loss: 0.7029 - val_accuracy: 0.8304\n" ] } ], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(300, activation=\"elu\",\n", " kernel_initializer=\"he_normal\",\n", " kernel_regularizer=keras.regularizers.l2(0.01)),\n", " keras.layers.Dense(100, activation=\"elu\",\n", " kernel_initializer=\"he_normal\",\n", " kernel_regularizer=keras.regularizers.l2(0.01)),\n", " keras.layers.Dense(10, activation=\"softmax\",\n", " kernel_regularizer=keras.regularizers.l2(0.01))\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n", "n_epochs = 2\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/2\n", "55000/55000 [==============================] - 7s 129us/sample - loss: 1.6597 - accuracy: 0.8128 - val_loss: 0.7630 - val_accuracy: 0.8080\n", "Epoch 2/2\n", "55000/55000 [==============================] - 7s 124us/sample - loss: 0.7176 - accuracy: 0.8271 - val_loss: 0.6848 - val_accuracy: 0.8360\n" ] } ], "source": [ "from functools import partial\n", "\n", "RegularizedDense = partial(keras.layers.Dense,\n", " activation=\"elu\",\n", " kernel_initializer=\"he_normal\",\n", " kernel_regularizer=keras.regularizers.l2(0.01))\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " RegularizedDense(300),\n", " RegularizedDense(100),\n", " RegularizedDense(10, activation=\"softmax\")\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n", "n_epochs = 2\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dropout" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/2\n", "55000/55000 [==============================] - 8s 145us/sample - loss: 0.5741 - accuracy: 0.8030 - val_loss: 0.3841 - val_accuracy: 0.8572\n", "Epoch 2/2\n", "55000/55000 [==============================] - 7s 134us/sample - loss: 0.4218 - accuracy: 0.8469 - val_loss: 0.3534 - val_accuracy: 0.8728\n" ] } ], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dropout(rate=0.2),\n", " keras.layers.Dense(300, activation=\"elu\", kernel_initializer=\"he_normal\"),\n", " keras.layers.Dropout(rate=0.2),\n", " keras.layers.Dense(100, activation=\"elu\", kernel_initializer=\"he_normal\"),\n", " keras.layers.Dropout(rate=0.2),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n", "n_epochs = 2\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alpha Dropout" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/20\n", "55000/55000 [==============================] - 6s 111us/sample - loss: 0.6639 - accuracy: 0.7582 - val_loss: 0.5840 - val_accuracy: 0.8410\n", "Epoch 2/20\n", "55000/55000 [==============================] - 5s 97us/sample - loss: 0.5517 - accuracy: 0.7968 - val_loss: 0.5747 - val_accuracy: 0.8430\n", "Epoch 3/20\n", "55000/55000 [==============================] - 5s 94us/sample - loss: 0.5260 - accuracy: 0.8062 - val_loss: 0.5233 - val_accuracy: 0.8486\n", "Epoch 4/20\n", "55000/55000 [==============================] - 5s 94us/sample - loss: 0.5055 - accuracy: 0.8136 - val_loss: 0.4687 - val_accuracy: 0.8606\n", "Epoch 5/20\n", "55000/55000 [==============================] - 5s 96us/sample - loss: 0.4897 - accuracy: 0.8187 - val_loss: 0.5188 - val_accuracy: 0.8588\n", "Epoch 6/20\n", "55000/55000 [==============================] - 5s 93us/sample - loss: 0.4812 - accuracy: 0.8217 - val_loss: 0.4929 - val_accuracy: 0.8508\n", "Epoch 7/20\n", "55000/55000 [==============================] - 5s 90us/sample - loss: 0.4687 - accuracy: 0.8251 - val_loss: 0.4840 - val_accuracy: 0.8572\n", "Epoch 8/20\n", "55000/55000 [==============================] - 5s 90us/sample - loss: 0.4709 - accuracy: 0.8249 - val_loss: 0.4227 - val_accuracy: 0.8660\n", "Epoch 9/20\n", "55000/55000 [==============================] - 5s 92us/sample - loss: 0.4515 - accuracy: 0.8313 - val_loss: 0.4796 - val_accuracy: 0.8670\n", "Epoch 10/20\n", "55000/55000 [==============================] - 5s 93us/sample - loss: 0.4508 - accuracy: 0.8329 - val_loss: 0.4901 - val_accuracy: 0.8588\n", "Epoch 11/20\n", "55000/55000 [==============================] - 5s 93us/sample - loss: 0.4484 - accuracy: 0.8338 - val_loss: 0.4678 - val_accuracy: 0.8640\n", "Epoch 12/20\n", "55000/55000 [==============================] - 5s 95us/sample - loss: 0.4417 - accuracy: 0.8366 - val_loss: 0.4684 - val_accuracy: 0.8610\n", "Epoch 13/20\n", "55000/55000 [==============================] - 5s 93us/sample - loss: 0.4421 - accuracy: 0.8370 - val_loss: 0.4347 - val_accuracy: 0.8640\n", "Epoch 14/20\n", "55000/55000 [==============================] - 5s 98us/sample - loss: 0.4377 - accuracy: 0.8369 - val_loss: 0.4204 - val_accuracy: 0.8734\n", "Epoch 15/20\n", "55000/55000 [==============================] - 5s 95us/sample - loss: 0.4329 - accuracy: 0.8384 - val_loss: 0.4820 - val_accuracy: 0.8718\n", "Epoch 16/20\n", "55000/55000 [==============================] - 6s 100us/sample - loss: 0.4328 - accuracy: 0.8388 - val_loss: 0.4447 - val_accuracy: 0.8754\n", "Epoch 17/20\n", "55000/55000 [==============================] - 5s 96us/sample - loss: 0.4243 - accuracy: 0.8413 - val_loss: 0.4502 - val_accuracy: 0.8776\n", "Epoch 18/20\n", "55000/55000 [==============================] - 5s 95us/sample - loss: 0.4242 - accuracy: 0.8432 - val_loss: 0.4070 - val_accuracy: 0.8720\n", "Epoch 19/20\n", "55000/55000 [==============================] - 5s 94us/sample - loss: 0.4195 - accuracy: 0.8437 - val_loss: 0.4738 - val_accuracy: 0.8670\n", "Epoch 20/20\n", "55000/55000 [==============================] - 5s 96us/sample - loss: 0.4191 - accuracy: 0.8439 - val_loss: 0.4163 - val_accuracy: 0.8790\n" ] } ], "source": [ "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.AlphaDropout(rate=0.2),\n", " keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.AlphaDropout(rate=0.2),\n", " keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n", " keras.layers.AlphaDropout(rate=0.2),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n", "n_epochs = 20\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10000/10000 [==============================] - 0s 39us/sample - loss: 0.4535 - accuracy: 0.8680\n" ] }, { "data": { "text/plain": [ "[0.45350628316402436, 0.868]" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.evaluate(X_test_scaled, y_test)" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "55000/55000 [==============================] - 2s 41us/sample - loss: 0.3357 - accuracy: 0.8887\n" ] }, { "data": { "text/plain": [ "[0.335701530437036, 0.88872725]" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.evaluate(X_train_scaled, y_train)" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "history = model.fit(X_train_scaled, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MC Dropout" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "y_probas = np.stack([model(X_test_scaled, training=True)\n", " for sample in range(100)])\n", "y_proba = y_probas.mean(axis=0)\n", "y_std = y_probas.std(axis=0)" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", " dtype=float32)" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.round(model.predict(X_test_scaled[:1]), 2)" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", "\n", " [[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]]],\n", " dtype=float32)" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.round(y_probas[:, :1], 2)" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.01, 0. , 0.99]],\n", " dtype=float32)" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.round(y_proba[:1], 2)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_std = y_probas.std(axis=0)\n", "np.round(y_std[:1], 2)" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [], "source": [ "y_pred = np.argmax(y_proba, axis=1)" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.868" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "accuracy = np.sum(y_pred == y_test) / len(y_test)\n", "accuracy" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [], "source": [ "class MCDropout(keras.layers.Dropout):\n", " def call(self, inputs):\n", " return super().call(inputs, training=True)\n", "\n", "class MCAlphaDropout(keras.layers.AlphaDropout):\n", " def call(self, inputs):\n", " return super().call(inputs, training=True)" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [], "source": [ "mc_model = keras.models.Sequential([\n", " MCAlphaDropout(layer.rate) if isinstance(layer, keras.layers.AlphaDropout) else layer\n", " for layer in model.layers\n", "])" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential_36\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "flatten_33 (Flatten) (None, 784) 0 \n", "_________________________________________________________________\n", "mc_alpha_dropout_3 (MCAlphaD (None, 784) 0 \n", "_________________________________________________________________\n", "dense_311 (Dense) (None, 300) 235500 \n", "_________________________________________________________________\n", "mc_alpha_dropout_4 (MCAlphaD (None, 300) 0 \n", "_________________________________________________________________\n", "dense_312 (Dense) (None, 100) 30100 \n", "_________________________________________________________________\n", "mc_alpha_dropout_5 (MCAlphaD (None, 100) 0 \n", "_________________________________________________________________\n", "dense_313 (Dense) (None, 10) 1010 \n", "=================================================================\n", "Total params: 266,610\n", "Trainable params: 266,610\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "mc_model.summary()" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [], "source": [ "optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)\n", "mc_model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [], "source": [ "mc_model.set_weights(model.get_weights())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use the model with MC Dropout:" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 0. , 0. , 0. , 0. , 0.17, 0. , 0.19, 0. , 0.64]],\n", " dtype=float32)" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.round(np.mean([mc_model.predict(X_test_scaled[:1]) for sample in range(100)], axis=0), 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Max norm" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [], "source": [ "layer = keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\",\n", " kernel_constraint=keras.constraints.max_norm(1.))" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 55000 samples, validate on 5000 samples\n", "Epoch 1/2\n", "55000/55000 [==============================] - 8s 147us/sample - loss: 0.4745 - accuracy: 0.8329 - val_loss: 0.3988 - val_accuracy: 0.8584\n", "Epoch 2/2\n", "55000/55000 [==============================] - 7s 135us/sample - loss: 0.3554 - accuracy: 0.8688 - val_loss: 0.3681 - val_accuracy: 0.8726\n" ] } ], "source": [ "MaxNormDense = partial(keras.layers.Dense,\n", " activation=\"selu\", kernel_initializer=\"lecun_normal\",\n", " kernel_constraint=keras.constraints.max_norm(1.))\n", "\n", "model = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " MaxNormDense(300),\n", " MaxNormDense(100),\n", " keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n", "n_epochs = 2\n", "history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n", " validation_data=(X_valid_scaled, y_valid))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. to 7." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See appendix A." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Deep Learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8.1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8.2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: Using Adam optimization and early stopping, try training it on MNIST but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You will need a softmax output layer with five neurons, and as always make sure to save checkpoints at regular intervals and save the final model so you can reuse it later._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8.3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: Tune the hyperparameters using cross-validation and see what precision you can achieve._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8.4." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: Now try adding Batch Normalization and compare the learning curves: is it converging faster than before? Does it produce a better model?_" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8.5." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: is the model overfitting the training set? Try adding dropout to every layer and try again. Does it help?_" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 9. Transfer learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 9.1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: create a new DNN that reuses all the pretrained hidden layers of the previous model, freezes them, and replaces the softmax output layer with a new one._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 9.2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: train this new DNN on digits 5 to 9, using only 100 images per digit, and time how long it takes. Despite this small number of examples, can you achieve high precision?_" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 9.3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: try caching the frozen layers, and train the model again: how much faster is it now?_" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 9.4." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: try again reusing just four hidden layers instead of five. Can you achieve a higher precision?_" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 9.5." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Exercise: now unfreeze the top two hidden layers and continue training: can you get the model to perform even better?_" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. Pretraining on an auxiliary task" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise you will build a DNN that compares two MNIST digit images and predicts whether they represent the same digit or not. Then you will reuse the lower layers of this network to train an MNIST classifier using very little training data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 10.1.\n", "Exercise: _Start by building two DNNs (let's call them DNN A and B), both similar to the one you built earlier but without the output layer: each DNN should have five hidden layers of 100 neurons each, He initialization, and ELU activation. Next, add one more hidden layer with 10 units on top of both DNNs. You should use the `keras.layers.concatenate()` function to concatenate the outputs of both DNNs, then feed the result to the hidden layer. Finally, add an output layer with a single neuron using the logistic activation function._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 10.2.\n", "_Exercise: split the MNIST training set in two sets: split #1 should containing 55,000 images, and split #2 should contain contain 5,000 images. Create a function that generates a training batch where each instance is a pair of MNIST images picked from split #1. Half of the training instances should be pairs of images that belong to the same class, while the other half should be images from different classes. For each pair, the training label should be 0 if the images are from the same class, or 1 if they are from different classes._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 10.3.\n", "_Exercise: train the DNN on this training set. For each image pair, you can simultaneously feed the first image to DNN A and the second image to DNN B. The whole network will gradually learn to tell whether two images belong to the same class or not._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 10.4.\n", "_Exercise: now create a new DNN by reusing and freezing the hidden layers of DNN A and adding a softmax output layer on top with 10 neurons. Train this network on split #2 and see if you can achieve high performance despite having only 500 images per class._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "nav_menu": { "height": "360px", "width": "416px" }, "toc": { "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }