{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training artificial neural networks using tensorflow\n",
    "\n",
    "\n",
    "## Neural Network\n",
    "\n",
    "![Neuronháló](https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/300px-Colored_neural_network.svg.png)\n",
    "\n",
    "## MNIST\n",
    "\n",
    "![MNIST adatbázis](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)\n",
    "The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.\n",
    "\n",
    "It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting. \n",
    "\n",
    "More information: http://yann.lecun.com/exdb/mnist/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First let's install matplotlib (if it was not installed yet)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo pip3 install matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then download the data, and read it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import tensorflow as tf\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "# Import MNIST data\n",
    "mnist = tf.keras.datasets.mnist\n",
    "\n",
    "(x_train, y_train),(x_test, y_test) = mnist.load_data()\n",
    "#flatten the data for training\n",
    "x_train = x_train.reshape(x_train.shape[0], x_train.shape[1]*x_train.shape[2])\n",
    "x_test = x_test.reshape(x_test.shape[0], x_test.shape[1]*x_test.shape[2])\n",
    "#normalization\n",
    "x_train, x_test = x_train / 255.0, x_test / 255.0\n",
    "\n",
    "print(x_train.shape, y_train.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some hyperparameters for the NN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters\n",
    "learning_rate = 0.1\n",
    "num_steps = 1000\n",
    "batch_size = 128\n",
    "display_step = 100\n",
    "\n",
    "# Network Parameters\n",
    "n_hidden_1 = 256 # 1st layer number of neurons\n",
    "n_hidden_2 = 256 # 2nd layer number of neurons\n",
    "num_input = 784 # MNIST data input (img shape: 28*28)\n",
    "num_classes = 10 # MNIST total classes (0-9 digits)\n",
    "\n",
    "# create the onehot vectors from the labels\n",
    "y_train_one_hot = np.eye(num_classes)[y_train]\n",
    "y_test_one_hot = np.eye(num_classes)[y_test]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To train the network, we create an input function that will generate the batches of data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the input function for training\n",
    "input_fn = tf.estimator.inputs.numpy_input_fn(\n",
    "    x={'images': x_train}, y=y_train,\n",
    "    batch_size=batch_size, num_epochs=None, shuffle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, define the structure of the network (2 hidden layers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the neural network\n",
    "def neural_net(x_dict):\n",
    "    # TF Estimator input is a dict, in case of multiple inputs\n",
    "    x = x_dict['images']\n",
    "    # Hidden fully connected layer with 256 neurons\n",
    "    layer_1 = tf.layers.dense(x, n_hidden_1)\n",
    "    # Hidden fully connected layer with 256 neurons\n",
    "    layer_2 = tf.layers.dense(layer_1, n_hidden_2)\n",
    "    # Output fully connected layer with a neuron for each class\n",
    "    out_layer = tf.layers.dense(layer_2, num_classes)\n",
    "    return out_layer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What we still need for training:\n",
    "- handling of the output\n",
    "- loss function\n",
    "- error metric"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the model function (following TF Estimator Template)\n",
    "def model_fn(features, labels, mode):\n",
    "    \n",
    "    # Build the neural network\n",
    "    logits = neural_net(features)\n",
    "    \n",
    "    # Predictions\n",
    "    pred_classes = tf.argmax(logits, axis=1)\n",
    "    pred_probs = tf.nn.softmax(logits)\n",
    "    \n",
    "    # If prediction mode, early return\n",
    "    if mode == tf.estimator.ModeKeys.PREDICT:\n",
    "        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes) \n",
    "        \n",
    "    # Define loss and optimizer\n",
    "    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(\n",
    "        logits=logits, labels=tf.cast(labels, dtype=tf.int32)))\n",
    "    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\n",
    "    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())\n",
    "    \n",
    "    # Evaluate the accuracy of the model\n",
    "    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)\n",
    "    \n",
    "    # TF Estimators requires to return a EstimatorSpec, that specify\n",
    "    # the different ops for training, evaluating, ...\n",
    "    estim_specs = tf.estimator.EstimatorSpec(\n",
    "      mode=mode,\n",
    "      predictions=pred_classes,\n",
    "      loss=loss_op,\n",
    "      train_op=train_op,\n",
    "      eval_metric_ops={'accuracy': acc_op})\n",
    "\n",
    "    return estim_specs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After all this, we can create an Estimator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build the Estimator\n",
    "model = tf.estimator.Estimator(model_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training\n",
    "As can be seen, after every 100. steps (one step is the training on one batch) we get a report on how well the net performes. For hyperparameter tuning we need to observe the loss values so we can choose the best learning rate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Train the Model\n",
    "model.train(input_fn, steps=num_steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training we evaluate the model on the test set\n",
    "**Important: we evaluate only the best model on the test set, otherwise it's peeking.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Evaluate the Model\n",
    "# Define the input function for evaluating\n",
    "input_fn_test = tf.estimator.inputs.numpy_input_fn(\n",
    "    x={'images': x_test}, y=y_test,\n",
    "    batch_size=batch_size, shuffle=False)\n",
    "# Use the Estimator 'evaluate' method\n",
    "model.evaluate(input_fn_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also evaluate on the first n images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Predict single images\n",
    "n_images = 4\n",
    "# Get images from test set\n",
    "test_images = x_test[:n_images]\n",
    "# Prepare the input data\n",
    "input_fn_test_few = tf.estimator.inputs.numpy_input_fn(\n",
    "    x={'images': test_images}, shuffle=False)\n",
    "# Use the model to predict the images class\n",
    "preds = list(model.predict(input_fn_test_few))\n",
    "\n",
    "# Display\n",
    "for i in range(n_images):\n",
    "    plt.imshow(np.reshape(test_images[i], [28, 28]), cmap='gray')\n",
    "    plt.show()\n",
    "    print(\"Model prediction:\", preds[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tasks\n",
    "1. Let's find the images where the net makes a mistake!\n",
    "2. Try to change the number of neurons, steps, and the batchsize!\n",
    "3. Add more hidden layers to the network!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}