Convolutional Neural Network for image classification#
Convolutional Neural Network - CNN#
CNNs are a type of artificial neural network (ANN) built from neurons, kernels, and activation functions.
Their inputs are images (or data assumed to have an image-like structure).
They are trained with the usual forward and backpropagation passes, but exploit properties such as weight sharing to process images faster.
CNNs work best for object detection, image classification, and other computer vision tasks.
Architecture of CNNs#
A basic CNN consists of convolution layers and max pooling layers, followed by a fully connected (Dense) layer before the output layer.
A simple image can be flattened into a 1D vector and driven through a regular fully connected NN. However, this requires a lot of computational power if the image is large or has multiple color channels.
Therefore, convolution and max pooling layers are used first to shrink the input.
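A quick back-of-the-envelope count shows why (a minimal sketch; the 100-unit hidden layer is just an illustrative choice):
flat_size = 32 * 32 * 3                           # a 32x32 RGB image flattens to 3072 values
hidden_units = 100                                # illustrative hidden-layer width
params = flat_size * hidden_units + hidden_units  # weights + biases
print(params)                                     # 307300 parameters for the first layer alone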
Convolutional Layer (CNN or ConvNet)#
Take a look at the simple grayscale image below, which is 10 pixels in width and height. The color scale has only 2 values (black & white, or binary -1 and 1); therefore, the size of the following image is 10x10x1:
However, a regular image contains RGB colors, with each channel ranging from 0-255, making the size of each image n x n x 3 (n = number of pixels per side).
A CNN reduces the image size by computing the Convolved Feature: the dot product of each image patch with a given kernel.
The image is reduced without losing important features, which makes it easier to process while still supporting good predictions.
So for 3-channel RGB colors, the image size is reduced:
In other words, the convolved image from the RGB image would look like:
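To make the operation concrete, here is a minimal NumPy sketch of a single-channel convolution (no padding, stride 1; the vertical-edge kernel is just an illustrative choice):
import numpy as np

def convolve2d(image, kernel):
    # slide the kernel over the image and take the dot product at each position
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(10, 10)          # a 10x10 grayscale image
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])         # illustrative vertical-edge kernel
print(convolve2d(image, kernel).shape)  # (8, 8): the image has shrunk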
Pooling Layer#
Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size of the Convolved Feature.
This decreases the computational power required to process the data through dimensionality reduction.
There are two types of pooling: Max Pooling & Average Pooling.
In practice, Max Pooling usually performs better than Average Pooling.
The image after Max Pooling layer would look like:
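A minimal NumPy sketch of 2x2 max pooling with stride 2 (assuming the input dimensions are even):
import numpy as np

def max_pool_2x2(feature_map):
    # group pixels into non-overlapping 2x2 blocks and keep the maximum of each
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.arange(16).reshape(4, 4)
print(max_pool_2x2(fm))
# [[ 5  7]
#  [13 15]]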
Flatten Layer#
Once the image has passed through the convolution and pooling layers, its size has been greatly reduced and it is ready for MLP training (or for further convolution steps).
The image is then flattened into a column vector and passed through a feed-forward NN, with backpropagation applied at every iteration.
A softmax activation function is applied at the output to classify among the multiple classes.
More information can be found here
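As a quick illustration, softmax converts the raw output scores into class probabilities that sum to 1 (a minimal NumPy sketch with made-up scores):
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative raw outputs for 3 classes
print(softmax(scores))              # [0.659 0.242 0.099], sums to 1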
Application of CNN in image classification#
The CIFAR10 database#
The CIFAR10 database consists of 60,000 color images in 10 different classes.
Each image has 32 x 32 pixels with color values ranging from 0-255.
It is a good database for pattern recognition and image classification tasks (the entire dataset is clean and ready for use).
The dataset is divided into 50,000 images for training and 10,000 images for testing.
The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
Sample CIFAR10 data:
Importing libraries#
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
Import convolution, max pooling and flatten as mentioned above:#
from tensorflow.keras.layers import Conv2D # convolutional layers to reduce image size
from tensorflow.keras.layers import MaxPooling2D # Max pooling layers to further reduce image size
from tensorflow.keras.layers import Flatten # flatten data from 2D to column for Dense layer
Load CIFAR10 data#
from tensorflow.keras.datasets import cifar10
# load data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# Normalize pixel values to the range (0, 1):
X_train, X_test = X_train/255, X_test/255
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
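# Expected shapes: (50000, 32, 32, 3), (10000, 32, 32, 3), (50000, 1), (10000, 1)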
Sample plotting:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
for i in range(49):
    plt.subplot(7,7,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i])
    # The CIFAR labels happen to be arrays, which is why you need the extra index
    plt.xlabel(class_names[y_train[i][0]])
plt.show()
Use One Hot Encoding from Keras to convert the labels:
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
print(y_train.shape)
print(y_test.shape)
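After encoding, the label shapes become (50000, 10) and (10000, 10); each integer label is now an indicator vector, for example:
print(to_categorical([3], num_classes=10))  # [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]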
Construct Convolutional Neural Network#
For the convolution front end, start with a kernel size of (3,3) and 8 filters, followed by a Max Pooling layer with pool_size = (2,2).
The 2D data after the Max Pooling layer is flattened directly.
model = Sequential()
model.add(Conv2D(8, (3, 3), strides=(1, 1), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
# Output layer: 10 units, one per class
model.add(Dense(10, activation='softmax'))
model.summary()
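For reference, the counts in the summary can be worked out by hand: the Conv2D layer has 8 x (3 x 3 x 3 + 1) = 224 parameters and outputs 30 x 30 x 8 feature maps, max pooling shrinks these to 15 x 15 x 8 (so the flattened vector has 1,800 values), and the Dense layers add 1800 x 100 + 100 = 180,100 and 100 x 10 + 10 = 1,010 parameters.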
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Train model#
Fit the model
# fit the model
model_CNN = model.fit(X_train, y_train, epochs=10,
validation_data=(X_test, y_test))
Evaluate the output#
Visualize the training/testing accuracy:
fig = plt.figure(figsize=(8, 10), dpi=80)
plt.subplot(2,1,1)
plt.plot(model_CNN.history['accuracy'],"b-o")
plt.plot(model_CNN.history['val_accuracy'],"r-d")
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.legend(['train', 'test'])
plt.subplot(2,1,2)
plt.plot(model_CNN.history['loss'],"b-o")
plt.plot(model_CNN.history['val_loss'],"r-d")
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'])
plt.tight_layout()
fig
Save & reload CNN model#
Save model:
model.save('CNN_CIFAR10.keras')
Reload model:
model_new = tf.keras.models.load_model('CNN_CIFAR10.keras')
Evaluate model with testing data#
test_loss, test_accuracy = model_new.evaluate(X_test, y_test, batch_size=64)
print('Test loss: %.4f accuracy: %.4f' % (test_loss, test_accuracy))
157/157 [==============================] - 0s 2ms/step - loss: 1.1538 - accuracy: 0.6080
Test loss: 1.1538 accuracy: 0.6080
An accuracy of 0.6080 on the testing data means that 6,080 of the 10,000 testing samples were classified correctly.
Visualize the output with the first 25 testing images#
predictions = model_new.predict(X_test)
ypreds = np.argmax(predictions, axis=1)
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_test[i])
    plt.xlabel(class_names[ypreds[i]])
plt.show()
Improving the performance?#
Use more convolution and max pooling layers:
model = Sequential()
model.add(Conv2D(8, (3, 3), strides=(1, 1), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
# Output layer: 10 units, one per class
model.add(Dense(10, activation='softmax'))
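# For reference, the feature maps shrink as 30x30x8 -> 15x15x8 -> 13x13x32 -> 6x6x32 -> 4x4x64,
# so the flattened vector has 4*4*64 = 1024 values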
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10,
validation_data=(X_test, y_test))
predictions = model.predict(X_test)
ypreds = np.argmax(predictions, axis=1)
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_test[i])
    plt.xlabel(class_names[ypreds[i]])
plt.show()