Deep learning has recently become a very popular machine learning technique, and has been used to establish the state of the art in problems ranging from natural language processing to image classification. In this post I will introduce deep learning, that is, learning with multi-layer neural networks. Although this feature is not yet available, I will also show a preview of how to create and deploy such a model in Data Science Studio.

Here we’ll be using the well-known MNIST dataset, a database of handwritten digits written by American Census Bureau employees and American high school students, available online here.

Neural networks are commonly used for pattern recognition in classification problems. They have enjoyed many successes particularly in image classification (e.g. MNIST, ImageNet, Dogs vs Cats).

Neural networks require a fair bit more manual tuning than other machine learning algorithms, but in exchange for this additional difficulty, you can get very powerful models that can tackle hard problems.

These models are called neural networks because they are loosely inspired by biological neurons. They can be represented by a neuron graph, with possibly deep architectures (i.e. tens of hidden layers).

## The MNIST Data

**MNIST** contains 60,000 training images and 10,000 test images. The format of the data is a bit peculiar, so we need a **custom Python recipe** to read it. In Data Science Studio all data is tabular, so we need to keep in mind that although these are images, i.e. two-dimensional arrays, each image will be stored as a one-dimensional row of pixel values, e.g.:

→ 1,2,3,4,5,6,7,8,9
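As a small illustration of that flattening, here is a sketch with NumPy. The 3x3 array is a toy stand-in I made up for a real 28x28 digit.

```python
import numpy as np

# A toy 3x3 "image" standing in for a 28x28 MNIST digit (illustrative values).
image = np.arange(1, 10).reshape(3, 3)

# Flatten row by row into a single tabular row of pixel values...
row = image.reshape(-1)

# ...and restore the two-dimensional shape when we need the picture back.
restored = row.reshape(3, 3)
```

For MNIST the same two calls would use 784 columns and a (28, 28) shape.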

First, let’s download the data from Yann LeCun’s website into a folder. Then let’s create a custom Python recipe with two datasets as outputs:

**Python code**:

The recipe reads the inputs, and combines the data and labels into two datasets: one for training and one for testing.
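The reading and combining steps can be sketched in plain NumPy. This is not the recipe itself: the helper names `parse_idx` and `to_table` are mine, the files are assumed to be already decompressed, and writing the result to Data Science Studio datasets is left out.

```python
import struct
import numpy as np

def parse_idx(raw):
    """Parse the bytes of an uncompressed IDX file into a NumPy array."""
    # Header: two zero bytes, a type code (0x08 = unsigned byte),
    # and the number of dimensions.
    zeros, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zeros != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Then one big-endian 32-bit size per dimension, followed by the data.
    shape = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    data = np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(shape)

def to_table(images, labels):
    """Label in column 0, flattened pixels in the remaining columns."""
    flat = images.reshape(len(images), -1)
    return np.column_stack([labels, flat])

# Tiny synthetic example: two 2x2 "images" and their labels.
raw_images = struct.pack(">HBBIII", 0, 8, 3, 2, 2, 2) + bytes(range(8))
raw_labels = struct.pack(">HBBI", 0, 8, 1, 2) + bytes([7, 3])
table = to_table(parse_idx(raw_images), parse_idx(raw_labels))
```

If the downloaded files are gzip-compressed, their bytes can be read with `gzip.open(path, "rb").read()` before being passed to `parse_idx`.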

Now we can **explore the data**:

The digit column identifies which digit was written; the other columns hold the values of individual pixels of the image. A better way to visualize the images is to use an IPython notebook.

**Python code**:

Here we can see what the images actually look like, and what we’re **trying to predict**.
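If no plotting library is at hand, a rough text rendering is enough to check that a row really contains a digit. This is a stand-in of my own, not the notebook code; `render_ascii` and its threshold of 128 are assumptions.

```python
import numpy as np

def render_ascii(pixels, side=28, threshold=128):
    """Return a text rendering of one flattened square image."""
    grid = np.asarray(pixels).reshape(side, side)
    return "\n".join(
        "".join("#" if value >= threshold else "." for value in line)
        for line in grid
    )

# Synthetic 4x4 example with a bright vertical bar in the second column.
demo = np.zeros((4, 4), dtype=np.uint8)
demo[:, 1] = 255
art = render_ascii(demo.reshape(-1), side=4)
```

For a real MNIST row, drop the digit column first and keep the default `side=28`.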

## Neural Network Basics

Before we get into training models, let’s go over the basics of **neural networks**. The basic building block is a neuron. A neuron takes a weighted sum of its inputs and applies an activation function to it.
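In code, a single neuron could look like the sketch below. The sigmoid activation and the bias term are choices I made for the illustration; other activations work the same way.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return sigmoid(np.dot(weights, inputs) + bias)

# 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
out = neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.0)
```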

Neural networks are usually composed of several layers of interconnected neurons. In the first layer, called the input layer, each neuron corresponds to an input feature (in our case, a pixel). The neurons of the second layer take the first layer's neurons as inputs, the neurons of the third layer take the second layer's, and so on. Training a neural network means selecting the best weights for all of the neuron connections. The weights are learned using an algorithm called **backpropagation**, which computes how the loss changes with each weight so that the weights can be adjusted accordingly.
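To make backpropagation a little more concrete, here is a minimal sketch for a network with one hidden layer. The tanh hidden activation, the softmax output, the layer sizes and the random data are all assumptions for the example, not the setup used later in this post.

```python
import numpy as np

def forward(params, x):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)                        # hidden layer activations
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)               # softmax probabilities
    return h, p

def loss(params, x, y):
    """Mean log loss of the predicted probabilities for the true classes."""
    _, p = forward(params, x)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def backprop(params, x, y):
    """Gradients of the mean log loss with respect to each parameter."""
    W1, b1, W2, b2 = params
    h, p = forward(params, x)
    d_logits = p.copy()
    d_logits[np.arange(len(y)), y] -= 1.0           # softmax + log loss gradient
    d_logits /= len(y)
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_h = d_logits @ W2.T                           # propagate back to the hidden layer
    d_pre = d_h * (1.0 - h ** 2)                    # derivative of tanh
    dW1 = x.T @ d_pre
    db1 = d_pre.sum(axis=0)
    return [dW1, db1, dW2, db2]

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                         # 5 examples, 4 features
y = np.array([0, 1, 2, 1, 0])                       # 3 classes
params = [rng.normal(scale=0.1, size=(4, 3)), np.zeros(3),
          rng.normal(scale=0.1, size=(3, 3)), np.zeros(3)]
grads = backprop(params, x, y)
```

A quick sanity check for this kind of code is to compare one entry of `grads` with a finite-difference estimate of the same derivative.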

When designing a neural network, there are a few things you need to choose. First is the architecture of the network: how many layers, what kind of layers, and what size. Second are the loss, such as log loss for a classification problem, and the optimizer, such as gradient descent. You need to balance the complexity of the architecture, the amount of data, and the computation time, so that the model trains in a reasonable time without overfitting.
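The loss and optimizer choices can be illustrated on a model with no hidden layer at all. The sketch below runs plain gradient descent on the log loss of a softmax classifier; the toy data, the learning rate of 0.5 and the 50 steps are arbitrary values I picked.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def log_loss(probs, y):
    """Mean negative log probability assigned to the true class."""
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                # toy data: 60 points, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # 2 classes

W = np.zeros((2, 2))                        # one weight per feature and class
learning_rate = 0.5
history = []
for step in range(50):
    probs = softmax(X @ W)
    history.append(log_loss(probs, y))
    onehot = np.eye(2)[y]
    grad = X.T @ (probs - onehot) / len(y)  # gradient of the log loss
    W -= learning_rate * grad               # gradient descent update
```

With all weights at zero the model predicts 50/50, so `history[0]` equals log(2); the later entries show the loss going down as the weights are updated.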

In the examples below, I will describe some of the layers and we will train multilayer perceptrons and a convolutional neural network.

Today, there are several great open-source libraries that make training neural networks fairly straightforward. Notably, there are FANN and Caffe in C/C++, and several Theano-based ones in Python.

We will be using the Python neural network libraries nolearn, Lasagne and Theano.

Theano allows you to do efficient calculations involving **multi-dimensional arrays**, which is the basis for training neural networks. Lasagne provides a **lightweight library** to build and train neural networks in Theano, and nolearn provides an easy-to-use wrapper for Lasagne-based models.

Nolearn neural networks take a list of layers, and certain parameters as input, and create a classifier or regressor compatible with the scikit-learn API.

## Training neural networks

Let’s try to train some **algorithms** and see how they perform. We are **building a prediction model** for the digit column.

Before training, we will need to change a few settings from their default values:

- Learning task: this is a multiclass classification problem, and we want to optimize for accuracy.
- Train & validation: we have 2 datasets for test and train, so we want explicit extracts from 2 datasets (train data and test data)
- All features should be numeric with no rescaling

Let’s start by training a **random forest**, a **logistic regression**, and a simple **neural network**.

The architecture we will choose is a very simple one with 1 hidden layer with 200 hidden units. Check the neural network box and enter the model code:

Visually, the network looks like the diagram above, except that there are 784 input units (corresponding to the 784 pixels in the images), the hidden layer has 200 units, and the output layer has 10 units (one for each digit). The final prediction is the digit whose output neuron has the highest value.
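To see those shapes in action, here is a sketch of the prediction step in NumPy with random, untrained weights. The ReLU activation is an assumption of mine, and the predictions are of course meaningless until the weights are trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random (untrained) weights with the shapes described above.
W1, b1 = rng.normal(scale=0.01, size=(784, 200)), np.zeros(200)
W2, b2 = rng.normal(scale=0.01, size=(200, 10)), np.zeros(10)

def predict(pixels):
    """Return the index of the output unit with the highest value."""
    hidden = np.maximum(0.0, pixels @ W1 + b1)  # ReLU, an assumed activation
    scores = hidden @ W2 + b2                   # one score per digit
    return scores.argmax(axis=1)

batch = rng.random(size=(3, 784))               # three fake flattened images
digits = predict(batch)
```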

The **nolearn** neural network object takes several parameters: a list of layers, the parameters for the layers, and hyperparameters for the neural network.

The layers we’re using here are called dense layers: each output is connected to all of the inputs. The weights of each connection are learned through training. For this simple model, there are 159,010 weights to be learned: 784 × 200 + 200 for the hidden layer, plus 200 × 10 + 10 for the output layer, counting one bias per unit.
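The count is easy to reproduce. The helper below is a small sketch of mine that assumes every layer is dense and has one bias per unit.

```python
def count_dense_weights(layer_sizes):
    """Number of learnable weights and biases in a stack of dense layers."""
    return sum(
        n_in * n_out + n_out                 # connection weights + biases
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

total = count_dense_weights([784, 200, 10])
```

Here `total` comes out to 159010, matching the figure above. The same function can be applied to any other stack of dense layers once its sizes are known.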

Not bad! The logistic regression has an error rate of 8.1%, the random forest 3.3%, and the neural network 2.5%. Let’s try increasing the complexity of the model by adding one more hidden layer and increasing the number of units:

Change layers to:

and change the parameters:

This slightly more complex neural network has 545,810 weights to learn, and has a 2% error rate.

The two previous models are called **multilayer perceptrons**, the simplest neural network architectures. We could try increasing the number of hidden layers, or the number of units, but we’re going to try something smarter, by using a much deeper architecture called **LeNet**, shown below:

The key component is the convolutional layer. These layers exploit the structure of the data, which here is an image. A convolutional layer connects each output to only a few nearby inputs, as shown in the illustration above. Intuitively, this means the layer will learn local features. The pooling layer then combines nearby inputs. This model has 91,190 learnable weights.

In all of the models, the number of learnable weights is higher than the number of training examples, so one could expect the networks to start overfitting. To mitigate this, we will use dropout: randomly zeroing a fraction of the neurons' outputs during training.
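Here is a deliberately slow but explicit sketch of what one convolution filter and one pooling step compute. Real libraries use far faster implementations; the filter values, the absence of padding and the 2x2 max pooling are choices I made for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one filter over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output only looks at a small patch of nearby inputs.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]            # drop rows/columns that do not fit
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # toy diagonal-difference filter
features = conv2d_valid(image, kernel)        # shape (5, 5)
pooled = max_pool(features)                   # shape (2, 2)
```

Note that the same small `kernel` is reused at every position, which is why a convolutional layer can have far fewer weights than a dense layer of similar output size.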
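Dropout itself is only a few lines. The sketch below uses the common "inverted" convention of rescaling the surviving activations so their expected value is unchanged; that rescaling and the rate of 0.5 are assumptions for the example.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of the activations during training."""
    if not training or rate == 0.0:
        return activations                     # no dropout at prediction time
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)   # rescale the survivors

rng = np.random.default_rng(0)
a = np.ones((1000, 50))
dropped = dropout(a, rate=0.5, rng=rng)
```

About half of the entries of `dropped` are zero and the rest are 2.0, so the average activation stays close to 1.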

To train this model, change the layers to:

The reshape layer changes the shape of the data back into a 2D image for the convolutional layer. For convolutional layers, we need to select the number of filters and the size of the convolution area: too small or too large and we won’t be able to obtain interesting features. The dropout rate of the dropout layers is also important, to balance learning and overfitting.
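As a rough sketch of the bookkeeping involved, the reshape turns each row of 784 pixels into a single-channel 28x28 image, and the size of the convolution area determines the size of the resulting feature map. The helper `conv_output_size` and the filter size of 5 are illustrative assumptions, not the values used in the model above.

```python
import numpy as np

# A batch of 5 flattened images, as stored in the tabular dataset.
flat = np.zeros((5, 784))

# Reshape to (batch, channels, rows, columns) for a convolutional layer.
images = flat.reshape(-1, 1, 28, 28)

def conv_output_size(input_size, filter_size, stride=1, pad=0):
    """Side length of the feature map produced by a convolution."""
    return (input_size + 2 * pad - filter_size) // stride + 1

side = conv_output_size(28, filter_size=5)   # 28 - 5 + 1 = 24
```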

And add the following parameters:

Using this network, and a fair bit of training time, we get an error rate of 1%!

That is very impressive. Out of the 10,000 images in the test set, only about 100 are wrongly classified. Let’s take a look at some of those misclassified images:
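Finding the misclassified images is a matter of comparing predictions with the true labels. A minimal sketch, using made-up labels rather than the real test set:

```python
import numpy as np

# Made-up true and predicted digits for five test images.
y_true = np.array([3, 8, 1, 0, 9])
y_pred = np.array([3, 5, 1, 0, 4])

wrong = np.flatnonzero(y_true != y_pred)   # row indices of the mistakes
error_rate = len(wrong) / len(y_true)      # fraction of misclassified images
```

The indices in `wrong` can then be used to pull the corresponding rows out of the test dataset and display them.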

classified as 6

classified as 7

classified as 9

classified as 2

Some of these are definitely mistakes, but to me others are ambiguous!

To get even better results, we could try to further tune the network parameters and architecture. But another important trick is to increase the amount of training data. This can be done either by gathering more from the source, or by transforming the available training data. For some problems, we could apply certain rotations or symmetries to the images. For digits, that would not work, since a rotated or mirrored digit may no longer be a valid digit, but we can use elastic distortions to increase the training set 100-fold or more.
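Elastic distortions take a bit of code, so as a simpler stand-in here is a sketch of another label-preserving transformation: shifting a digit by a pixel or two. This is not the elastic distortion mentioned above, and the helper `shift_image` is my own.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = image.shape
    out = np.zeros_like(image)
    # Part of the original image that stays visible after the shift.
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

image = np.arange(9).reshape(3, 3)
shifted = shift_image(image, 1, -1)        # one pixel down, one pixel left
```

Each shifted copy keeps its original label, so applying a handful of small shifts to every training image multiplies the size of the training set.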

Here are some examples of **transformed training data**.

## Conclusion

I hope you enjoyed this brief introduction to **Neural Networks** and a preview of how to implement them in Data Science Studio. Stay tuned for more examples!

Any questions about this blog post? Just send me an email and we’ll discuss it :)