In deep learning, a Convolutional Neural Network (CNN or ConvNet) is a class of neural network commonly used for analyzing images. Convolutional networks were inspired by biological processes: the connectivity pattern between their neurons resembles the organization of the animal visual cortex.
In this tutorial we work on image classification, one of the major problems in image processing.
A CNN is a sequence of layers, and each layer transforms one volume of activations into another through some differentiable function f(x).
We use three main types of layers to build a CNN architecture: the convolutional layer, the pooling layer, and the fully-connected layer.
The CNN architecture is shown below.
The convolutional layer is the core part of a CNN architecture. Like other layers in deep learning, a convolutional layer applies an operation to its input and passes the result to the next layer. Each neuron processes data only from its receptive field.
The receptive field of a neuron is the particular region of the input in which stimulation modifies the activation of that neuron. The dimension of the receptive field is fixed.
Suppose we are given an image with dimensions n x n. To perform a convolution on this image, we first choose the position of one pixel, say (x, y).
For that pixel (x, y), we take a k x k rectangle centered at (x, y). The output is a single number: the two k x k matrices (the image patch and the filter) are multiplied element by element and the products are summed.
The process continues by moving the rectangle one step to (x+1, y) or (x, y+1).
In the example below we choose the point (1,1) with a fixed 3x3 receptive field.
In CNN terminology, this multiply-and-sum operation is called convolution.
The output of this operation is called an activation map, and it is ready to be used as input to the next layer of the neural network.
Generally, if we have an n x n matrix and convolve it with a k x k matrix, the result is a matrix with dimension (n-k+1) x (n-k+1). In this example we have stride = 1 (explained later), but if we choose a stride s > 1, the output matrix will have dimension:
$$\left(\left\lfloor \frac{n-k}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{n-k}{s} \right\rfloor + 1\right)$$
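To make the sliding-window idea concrete, here is a minimal NumPy sketch of a convolution without padding. The function name conv2d_valid, the toy 5x5 image and the all-ones filter are assumptions made for illustration; like most deep learning libraries, the sketch does not flip the filter (strictly speaking this is cross-correlation). It also indexes each patch by its top-left corner rather than its center, which gives the same set of patches.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and return the activation map."""
    n_rows, n_cols = image.shape
    k = kernel.shape[0]  # assumes a square k x k kernel
    out_rows = (n_rows - k) // stride + 1
    out_cols = (n_cols - k) // stride + 1
    out = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            # Receptive field: the k x k patch whose top-left corner is (i*stride, j*stride).
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            # Multiply element by element and sum to get one value of the activation map.
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image" (illustrative values)
kernel = np.ones((3, 3))                          # toy 3x3 filter that sums its patch

activation_map = conv2d_valid(image, kernel)         # stride 1: (5-3+1) x (5-3+1)
strided_map = conv2d_valid(image, kernel, stride=2)  # stride 2: smaller output
print(activation_map.shape, strided_map.shape)
```

With n=5 and k=3, the stride-1 output is 3x3 and the stride-2 output is 2x2, which matches the dimension formulas in this section.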
The pooling layer is used to reduce the spatial dimensions in the network. It does not reduce depth. The benefits of this layer are:
– With less spatial information, we have less computation
– With fewer parameters, there is less chance of overfitting
Let’s say we have an image with dimensions (h1, w1, d1), where h1 represents the height, w1 the width and d1 the depth of the image. The max-pooling window has dimension (k, k) and stride s (the number of pixels the window moves at each step). Starting from the top-left corner, each (k, k) window of the image is replaced by the maximum of the numbers inside it. So, the output dimension (h2, w2, d2) is:
$$h_2 = \left\lfloor \frac{h_1 - k}{s} \right\rfloor + 1, \quad w_2 = \left\lfloor \frac{w_1 - k}{s} \right\rfloor + 1, \quad d_2 = d_1$$
The next image demonstrates the step above with h1=4, w1=4, k=2 and s=2.
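The same h1=4, w1=4, k=2, s=2 case can be reproduced with a short NumPy sketch. The helper max_pool2d and the numbers in feature_map are assumptions chosen for illustration; they are not taken from the figure.

```python
import numpy as np

def max_pool2d(feature_map, k=2, stride=2):
    """Max-pool a single 2-D feature map with a k x k window."""
    h1, w1 = feature_map.shape
    h2 = (h1 - k) // stride + 1
    w2 = (w1 - k) // stride + 1
    pooled = np.zeros((h2, w2), dtype=feature_map.dtype)
    for i in range(h2):
        for j in range(w2):
            # Take the k x k window starting at (i*stride, j*stride) and keep its maximum.
            window = feature_map[i * stride:i * stride + k, j * stride:j * stride + k]
            pooled[i, j] = window.max()
    return pooled

# Hypothetical 4x4 feature map (one depth slice).
feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [1, 8, 3, 4]])

pooled = max_pool2d(feature_map, k=2, stride=2)
print(pooled)
```

The 4x4 input becomes a 2x2 output. For an image with depth d1, the same operation would be applied to every depth slice independently, so the depth stays unchanged.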
The fully-connected layer is the last part of our neural network. Its neurons have connections to all activations in the previous layer. The output is an n-dimensional vector, where n represents the number of classes/labels.
If we want to classify letters from A to Z (English alphabet), the fully-connected layer will produce a 26-dimensional vector X, where X[i] represents the probability of letter i.
The predicted class is the one whose element in this vector is the largest.
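As a small illustration of reading a prediction from such a vector, the sketch below builds a hypothetical 26-dimensional probability vector X (the values are made up, with extra weight on the letter K) and picks the most probable letter.

```python
import string
import numpy as np

letters = list(string.ascii_uppercase)  # 'A' .. 'Z', 26 classes

# Hypothetical output of the fully-connected layer: mostly uniform,
# with half of the probability mass placed on the letter 'K'.
X = np.full(26, 0.5 / 25)
X[letters.index('K')] = 0.5

predicted_index = int(np.argmax(X))     # index of the largest probability
predicted_letter = letters[predicted_index]
print(predicted_letter, X[predicted_index])
```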
Let’s look at an example of a convolutional network applied to a well-known image classification problem.
Image classification problem
In an image classification problem, the network outputs a probability for each class, and we accept the class that best fits the image.
It’s not an easy task for a computer, but why?
Computers see pictures in a different way than we do. What a computer sees is just an array of numbers, usually integers between 0 and 255 for each of the RGB channels.
The goal of this tutorial is to give the computer an array of numbers (image above) and to get back the probability of each class we defined at the beginning.
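To illustrate what "an array of numbers" means here, the sketch below builds a hypothetical 2x2 RGB image by hand. The pixel values are assumptions chosen for illustration; real CIFAR-10 images have the same layout but are 32x32.

```python
import numpy as np

# A hypothetical 2x2 RGB image: height 2, width 2, 3 colour channels.
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(image.shape)   # (height, width, channels)
print(image[0, 0])   # the top-left pixel as its (R, G, B) values

# Scale the values to the range [0, 1], as the training code in this
# tutorial does when it passes x_train/255 to the model.
scaled = image / 255.0
print(scaled.min(), scaled.max())
```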
For this tutorial we will use the CIFAR-10 dataset, which ships with Keras in the keras.datasets module. Our goal is to classify images from this dataset.
Our model will train on the 10 different classes listed below:
NUMBER_OF_CLASES = 10
CLASES = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
To load the data from CIFAR-10, we will use the Keras function load_data():
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
The image arrays x_train and x_test have shapes with 4 values. The first value is the number of samples, and the remaining three represent the shape of each image (height, width, depth).
To convert the class vectors into binary class matrices, use the to_categorical function from keras.utils:
from keras.utils import to_categorical
y_train = to_categorical(y_train, NUMBER_OF_CLASES)
y_test = to_categorical(y_test, NUMBER_OF_CLASES)
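For readers who want to see what a binary class matrix looks like, here is a minimal NumPy sketch of the same idea. The helper one_hot and the three sample labels are assumptions for illustration; this is not the Keras implementation, only an equivalent construction for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels).reshape(-1)  # accept shape (n,) or (n, 1)
    return np.eye(num_classes)[labels]

# Hypothetical labels in the (n, 1) shape that the CIFAR-10 label arrays use.
y = np.array([[3], [0], [9]])
y_binary = one_hot(y, 10)

print(y_binary.shape)
print(y_binary[0])  # label 3 -> a 1 at index 3, zeros elsewhere
```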
Now, the main part of this project is designing our network. With everything we said above, our architecture looks like this:
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPool2D

model = Sequential()
# Two convolutional layers followed by pooling and dropout
model.add(Conv2D(32, (3, 3), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Two more convolutional layers, each followed by pooling
model.add(Conv2D(128, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Classifier: flatten, one hidden dense layer, then one output per class
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(NUMBER_OF_CLASES))
model.add(Activation('softmax'))
Let’s explain some parts of this code.
We used a softmax activation at the last layer of our network. Why?
Well, as the output of our network we want the probability of each class, so every output must lie between 0 and 1 and all outputs must sum to 1. That is why we use the softmax activation function, which maps each input x to a number greater than 0 and less than 1.
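To make this clear, let’s say (x1, x2, x3, …, x10) is our input. The output will be (y1, y2, y3, …, y10), where
$$y_j = \frac{e^{x_j}}{\sum_{k=1}^{10} e^{x_k}}$$
for all j = 1, 2, 3, …, 10.
So, from the equation above, we have $0 < y_j < 1$ and $\sum_{j=1}^{10} y_j = 1$.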
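The two properties of softmax (each output strictly between 0 and 1, all outputs summing to 1) can be checked numerically with a small NumPy sketch. The ten input scores are hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(x):
    """Map a vector of scores to probabilities."""
    x = np.asarray(x, dtype=float)
    shifted = x - x.max()   # avoids overflow in exp; the ratios are unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical inputs x1 .. x10 (one score per class).
scores = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 3.0, -2.0, 0.5, 1.5, -0.5])
probs = softmax(scores)

print(probs.round(3))
print(probs.sum())
```

The largest score (index 5 here) receives the largest probability, so softmax preserves the ranking of the classes.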
Let’s move on to the other activation function used in this code. As a reminder, activation functions in deep learning are used to ‘modify’ the input without changing its dimensions. If we want to zero out the negative numbers in the input, the ReLU activation function f(x)=max(0,x) does that for us.
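A short NumPy sketch of ReLU shows both points: negative entries become zero and the shape of the input is preserved. The input values are made up for illustration.

```python
import numpy as np

def relu(x):
    """Element-wise f(x) = max(0, x)."""
    return np.maximum(0, x)

x = np.array([[-2.0, 0.5],
              [3.0, -0.1]])   # hypothetical activations
activated = relu(x)

print(activated)
print(x.shape == activated.shape)
```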
Flatten reshapes the multi-dimensional feature maps into a one-dimensional vector so that they can be passed to the dense layers.
The Dense layer is the fully-connected layer described above. Each dense layer is followed by an activation function (in the example above, the first is ReLU and the second is softmax, which computes the final results).
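To see how long the vector produced by Flatten is in the model above, we can trace the spatial size through the layers with plain arithmetic. This is a sketch that assumes 32x32 CIFAR-10 inputs and the Keras defaults (stride 1 and no padding for Conv2D unless padding='same' is given, and a pooling stride equal to the pool size); the helper names conv_out and pool_out are hypothetical.

```python
def conv_out(size, k, padding):
    # 'same' keeps the spatial size (with stride 1); 'valid' gives size - k + 1.
    return size if padding == 'same' else size - k + 1

def pool_out(size, k=2, stride=2):
    return (size - k) // stride + 1

size = 32                          # CIFAR-10 images are 32x32x3
size = conv_out(size, 3, 'same')   # Conv2D(32, padding='same')  -> 32
size = conv_out(size, 3, 'valid')  # Conv2D(64)                  -> 30
size = pool_out(size)              # MaxPool2D((2, 2))           -> 15
size = conv_out(size, 3, 'same')   # Conv2D(128, padding='same') -> 15
size = pool_out(size)              # MaxPool2D((2, 2))           -> 7
size = conv_out(size, 3, 'valid')  # Conv2D(128)                 -> 5
size = pool_out(size)              # MaxPool2D((2, 2))           -> 2

depth = 128                        # number of filters in the last Conv2D
flattened_length = size * size * depth
print(size, flattened_length)
```

Under these assumptions the last pooling layer outputs a 2x2x128 volume, so Flatten hands a vector of 512 values to Dense(1024).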
Dropout is a technique used to reduce overfitting in a network. Some deep learning models use dropout on the fully-connected layers. It is also possible to use dropout after the max-pooling layers, which creates a kind of image noise augmentation. Take a look at the image below, which shows the difference between using and not using a dropout layer in a neural network.
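What a dropout layer does during training can be sketched in a few lines of NumPy. This is a simplified illustration of the "inverted dropout" scheme, in which the surviving activations are scaled up by 1/(1 - rate) so that their expected value is unchanged. The function name, the seed and the all-ones input are assumptions for illustration.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
activations = np.ones((4, 5))        # hypothetical activations
dropped = dropout(activations, rate=0.5, rng=rng)

print(dropped)
```

With rate=0.5, every entry of the output is either 0 (dropped) or 2 (kept and rescaled). At prediction time dropout is switched off and the activations pass through unchanged.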
To compile our model, use the following command:
from keras.optimizers import Adam
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.0001, decay=1e-6), metrics=['accuracy'])
For this network we used categorical_crossentropy as the loss. To explain this type of loss function, suppose we have M classes (M>2). To get the loss for an observation o, we calculate a separate loss term for each class label and then sum the results. The formula is shown below:
$$-\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$$
– M represents the number of classes
– log is the natural logarithm
– y is a binary indicator (0 or 1) of whether class c is the correct classification for observation o
– p is the predicted probability that observation o belongs to class c
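The loss can be worked through for a single observation with a small NumPy sketch. The function below is an illustrative re-implementation, not the Keras code; the class count M=3, the one-hot target and the predicted probabilities are hypothetical, and the clipping constant eps is an assumption added only to avoid log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, p_pred, eps=1e-12):
    """Sum of -y * log(p) over the classes, one loss value per observation."""
    p_pred = np.clip(p_pred, eps, 1.0)  # guard against log(0)
    return -np.sum(y_true * np.log(p_pred), axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])  # one observation; the second class is correct
p_pred = np.array([[0.2, 0.7, 0.1]])  # hypothetical predicted probabilities

loss = categorical_crossentropy(y_true, p_pred)
print(loss)
```

Because y is 1 only for the correct class, the sum reduces to -log(0.7): the loss depends only on the probability assigned to the correct class, and it shrinks toward 0 as that probability approaches 1.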
The next image shows how the training phase is carried out by the CNN.
from keras.callbacks import EarlyStopping
model.fit(x_train/255, y_train, batch_size=128, shuffle=True, epochs=50,
          validation_data=(x_test/255, y_test),
          callbacks=[EarlyStopping(min_delta=0.001, patience=3)])
For this network we set a maximum of only 50 epochs, just to show experimentally how the network works.
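Because of the EarlyStopping callback, training may end before epoch 50. The sketch below is a simplified pure-Python model of that stopping rule, assuming the callback watches the validation loss: training stops once the loss has failed to improve by more than min_delta for patience consecutive epochs. The function name and the list of losses are hypothetical, and the real callback has more options than shown here.

```python
def epochs_until_stop(val_losses, min_delta=0.001, patience=3):
    """Return the 1-based epoch at which training would stop (simplified rule)."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improved by more than min_delta
            best = loss
            wait = 0
        else:
            wait += 1                 # one more epoch without real improvement
            if wait >= patience:
                return epoch
    return len(val_losses)            # never triggered: all epochs ran

# Hypothetical validation losses, one per epoch.
val_losses = [1.20, 0.95, 0.80, 0.7995, 0.81, 0.80, 0.79, 0.78]
stopped_at = epochs_until_stop(val_losses)
print(stopped_at)
```

In this made-up run the best loss is reached at epoch 3, and the next three epochs do not beat it by more than 0.001, so training stops at epoch 6 and the later epochs are never run.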
Finally, the last part of this project is the evaluation of the model. This is done by:
score = model.evaluate(x_test/255.0, y_test, verbose=1)
When you run the model, you should see something like this:
At the end of training I got an accuracy of about 77% with a loss of 0.68.
In this Keras tutorial, we explained the main parts of a CNN architecture and, using an image classification problem, put the network into practice.
For this type of problem, it is necessary to understand the whole background process deeply.
I hope this tutorial can be helpful and useful to anyone who wants to understand image processing and some of the methods used in the configuration of the neural network itself.
Berin Spahović is a software developer in BPUE and a member of AIML team. His love for AI and ML technologies leads him to constant research and advancement. Berin enjoys competition, research approach to work and working with extraordinary projects with the opportunity to expand knowledge of AI, programming, mathematics and statistics. Also, he is one of the few talents at the Faculty of Natural Sciences and Mathematics in Sarajevo and instructor of competitive mathematics in Gymnasia.