Image Classification Using PyTorch


Introduction to Image Classification Using PyTorch

Among the many applications of computer vision, image classification is one of the most important. It is used in one way or another across a wide range of industries. How do they do it? Which framework do they use? In many cases, the answer is image classification using PyTorch.

You may have read a lot about the differences between deep learning frameworks such as TensorFlow, PyTorch, Keras, and many more. TensorFlow and PyTorch are two of the most popular frameworks in the industry.

In this article, we will see how to build a basic image classification model in both PyTorch and TensorFlow. We will take the benchmark MNIST handwritten digit classification dataset and build an image classification model using a CNN (Convolutional Neural Network) in each framework.

You can pick any of the frameworks which you feel comfortable with and start building other computer vision models too.

About PyTorch

PyTorch is gaining popularity in the deep learning community and is widely used by deep learning practitioners. It is a Python package that provides tensor computations. Tensors are multidimensional arrays, much like NumPy’s ndarrays, that can also run on a GPU.
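
The similarity to NumPy can be seen in a few lines. The following is a minimal sketch, assuming PyTorch and NumPy are installed; the array values are made up for illustration, and the code falls back to the CPU when no GPU is present:

```python
import numpy as np
import torch

# Build a tensor from a NumPy array; torch.from_numpy shares memory with the array
array = np.arange(6, dtype=np.float32).reshape(2, 3)
tensor = torch.from_numpy(array)

# Pick the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor_on_device = tensor.to(device)

# Arithmetic reads the same way as it does in NumPy
doubled = tensor_on_device * 2

# Move the result back to the CPU before converting it to a NumPy array
result = doubled.cpu().numpy()
print(result.shape, result.dtype)
```

The explicit `.cpu()` call matters only when the tensor lives on a GPU, but keeping it makes the same code work in both cases.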

The autograd package of PyTorch builds computation graphs from tensors and automatically computes gradients.
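
A small example may make this concrete. The sketch below assumes a toy function, y = sum(x^2), chosen only because its gradient (2x) is easy to verify by hand:

```python
import torch

# requires_grad=True asks autograd to record operations performed on x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x^2); the computation graph is built as this expression runs
y = (x ** 2).sum()

# Backpropagate through the graph; this fills x.grad with dy/dx = 2 * x
y.backward()
print(x.grad)
```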

PyTorch provides a framework for building computational graphs as we go, and even changing them at runtime. This is particularly valuable in situations where we don’t know in advance how much memory a neural network will need. You can work on all sorts of deep learning challenges using PyTorch. The following are some of them:

  1. Images (Detection, Classification, etc.)
  2. Text (classification, generation, etc.)
  3. Reinforcement Learning

About TensorFlow

TensorFlow was developed by researchers and engineers from the Google Brain team. It is one of the most commonly used software libraries in the field of deep learning.

One of the biggest reasons TensorFlow is so popular is its support for multiple languages to create deep learning models, such as Python, C++, and R. 

There are various components that go into making TensorFlow. The following two are among the most widely used:

  1. TensorBoard: Helps in effective data visualization using data flow graphs
  2. TensorFlow: Useful for rapid deployment of new algorithms/experiments

TensorFlow 2.0 was officially released in September 2019. We will implement our CNN in the 2.x series as well.



We will implement image classification on the MNIST dataset. This dataset contains handwritten digits from 10 classes, 0 to 9, and the task is to classify each image as the correct digit. The images were taken from a variety of scanned documents, normalized in size, and centered. Each image is a 28 by 28-pixel square (784 pixels in total). A standard split of the dataset is used to evaluate and compare models: 60,000 images are used to train a model, and a separate set of 10,000 images is used to test it.

The figure above shows a sample of the MNIST dataset.

Now that we understand the dataset, let’s start implementing image classification using a CNN in PyTorch and TensorFlow.

Thanks to Google, a free GPU is available through Colab notebooks. This helps because deep learning models train much faster on a GPU.

I am assuming you understand Convolutional Neural Networks (CNNs). If not, please read an introductory article on CNNs first.


Let’s start coding:
Import the required libraries
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

Check the PyTorch version:
# version of pytorch
print(torch.__version__)
Output: 1.6.0+cu101

So, I am using version 1.6.0 of PyTorch. If you are using a different version, you might get a few warnings or errors, in which case you can switch to this version. Next, we will apply some transformations to the images, such as normalizing the pixel values.

# transformations to be applied on images
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
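
To see what these two transformations do to the pixel values, the arithmetic can be reproduced by hand. This is a sketch in plain NumPy rather than torchvision: it assumes that ToTensor scales 8-bit pixels into [0, 1] and that Normalize with mean 0.5 and standard deviation 0.5 computes (x - 0.5) / 0.5. The three pixel values are made up for illustration.

```python
import numpy as np

# Hypothetical 8-bit pixel values: black, mid-gray, white
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Step 1 (what ToTensor does to the values): scale into [0, 1]
scaled = pixels.astype(np.float32) / 255.0

# Step 2 (what Normalize((0.5,), (0.5,)) does): subtract the mean, divide by the std
normalized = (scaled - 0.5) / 0.5

print(scaled)
print(normalized)
```

The result is that every pixel ends up in the range [-1, 1], centered around zero.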

Now, let’s load the MNIST training and test sets directly from torchvision.
# defining the training and testing set
trainset = datasets.MNIST('./data', download=True, train=True, transform=transform)
testset = datasets.MNIST('./', download=True, train=False, transform=transform)

Next, I define the train and test loaders, which will help us load the training and test sets in batches. I will set the batch size to 64.

# defining trainloader and testloader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
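
The way a loader splits a dataset into batches can be checked on synthetic data. The sketch below assumes a stand-in dataset of 1,000 random tensors instead of MNIST, so that it runs without any download; the sizes are chosen only for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 1,000 random "images" with random labels
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

# 1000 / 64 is not a whole number, so the final batch is smaller than 64
batch_sizes = [batch_images.size(0) for batch_images, _ in loader]
print(len(loader), batch_sizes[0], batch_sizes[-1])
```

With the real training set of 60,000 images, the same logic gives 938 batches per epoch, the last one holding 32 images.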

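Let’s look at the shape of one training batch first:
# shape of training data
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(images.shape)

Output: torch.Size([64, 1, 28, 28])

So, each batch contains 64 single-channel images of 28 by 28 pixels. Now, let’s define the model, the optimizer, and the loss function: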
# defining the model architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.cnn_layers = nn.Sequential(
            # Defining a 2D convolution layer
            nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.linear_layers = nn.Sequential(
            nn.Linear(4 * 7 * 7, 10)
        )

    # Defining the forward pass
    def forward(self, x):
        x = self.cnn_layers(x)
        # flatten each sample into a vector of 4 * 7 * 7 = 196 features
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x

# defining the model
model = Net()
# defining the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)
# defining the loss function
criterion = nn.CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
# printing the model architecture
print(model)


/usr/local/lib/python3.6/dist-packages/torch/cuda/ UserWarning:
Tesla T4 with CUDA capability sm_75 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the Tesla T4 GPU with PyTorch, please check the instructions at
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

Net(
  (cnn_layers): Sequential(
    (0): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU(inplace=True)
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (linear_layers): Sequential(
    (0): Linear(in_features=196, out_features=10, bias=True)
  )
)
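
The 196 input features of the linear layer follow from how each layer changes the image size. Here is a minimal sketch of that arithmetic, assuming the usual output-size formula floor((size + 2 * padding - kernel_size) / stride) + 1; `output_size` is a hypothetical helper written for this illustration, not a PyTorch function:

```python
def output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                     # MNIST images are 28 x 28
size = output_size(size, kernel_size=3, stride=1, padding=1)  # first convolution
size = output_size(size, kernel_size=2, stride=2)             # first max-pool
size = output_size(size, kernel_size=3, stride=1, padding=1)  # second convolution
size = output_size(size, kernel_size=2, stride=2)             # second max-pool

channels = 4
flattened_features = channels * size * size
print(size, flattened_features)
```

With padding of 1, the 3 by 3 convolutions keep the size unchanged, and each max-pool halves it: 28, 14, then 7. Four channels of 7 by 7 give the 196 features.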



# training the model for 10 epochs
for i in range(10):
    running_loss = 0
    for images, labels in trainloader:
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        # Training pass: clear the gradients left over from the previous batch
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        # This is where the model learns by backpropagating
        loss.backward()
        # And optimizes its weights here
        optimizer.step()
        running_loss += loss.item()
    # report the average loss once per epoch
    print("Epoch {} - Training loss: {}".format(i+1, running_loss/len(trainloader)))
Epoch 1 - Training loss: 0.17207529680378464
Epoch 2 - Training loss: 0.09385816647019833
Epoch 3 - Training loss: 0.08175089742034189
Epoch 4 - Training loss: 0.0757171266008792
Epoch 5 - Training loss: 0.07225981012243889
Epoch 6 - Training loss: 0.06886185119241309
Epoch 7 - Training loss: 0.0680735878681621
Epoch 8 - Training loss: 0.06541035992531741
Epoch 9 - Training loss: 0.06356726545632742
Epoch 10 - Training loss: 0.06090876984291835
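
A single training step can be looked at in isolation. The following sketch makes several assumptions for the sake of a fast, self-contained example: a single linear layer stands in for the CNN, the batch is random noise rather than MNIST, and plain SGD replaces Adam.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Tiny stand-in model and a synthetic batch (not MNIST)
model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
images = torch.randn(64, 784)
labels = torch.randint(0, 10, (64,))

weights_before = model.weight.detach().clone()

optimizer.zero_grad()                    # clear gradients from any previous step
loss = criterion(model(images), labels)  # forward pass and loss
loss.backward()                          # compute gradients of the loss
optimizer.step()                         # update the weights using those gradients

weights_changed = not torch.equal(weights_before, model.weight.detach())
print(loss.item(), weights_changed)
```

Without the calls to `backward()` and `step()`, the weights would stay exactly as they were and the loss would never improve.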

You can see that the training loss decreases as the number of epochs increases. This means that our model is learning patterns from the training set. Let’s check the performance of this model on the test set:

# getting predictions on test set and measuring the performance
# switch to evaluation mode so the batch-norm layers use their running statistics
model.eval()
correct_count, all_count = 0, 0
with torch.no_grad():
    for images, labels in testloader:
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        # the model returns one raw score per class; the highest score is the prediction
        output = model(images)
        pred_labels = output.argmax(dim=1)
        correct_count += (pred_labels == labels).sum().item()
        all_count += labels.size(0)
print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
Output: Number Of Images Tested = 10000
Model Accuracy = 0.965
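
The accuracy calculation can be checked by hand on a tiny batch. The scores and labels below are made up for illustration; the sketch also shows that taking the argmax of the raw scores gives the same predictions as taking the argmax of the softmax probabilities.

```python
import torch

# Hypothetical raw scores for 3 images over 3 classes, and their true labels
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.0, 0.0]])
labels = torch.tensor([0, 2, 1])

# Softmax turns scores into probabilities but does not change their order
probabilities = torch.softmax(logits, dim=1)
predictions = logits.argmax(dim=1)
same_as_softmax = torch.equal(predictions, probabilities.argmax(dim=1))

# Fraction of predictions that match the labels
accuracy = (predictions == labels).float().mean().item()
print(predictions.tolist(), same_as_softmax, accuracy)
```

Two of the three predictions match their labels here, so the accuracy is about 0.67.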

So, we tested a total of 10,000 images, and the model is around 96% accurate in predicting the labels of the test images. This is how you can build a Convolutional Neural Network in PyTorch. In the next section, we will look at how to implement a similar architecture in TensorFlow.

Implementing CNN in TensorFlow
# importing the libraries
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
# version of tensorflow
print(tf.__version__)

Output: 2.2.0

So, we are using the 2.2.0 version of TensorFlow. Let’s now load the MNIST dataset using the datasets class of tensorflow.keras:

(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data(path='mnist.npz')
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0


Downloading data from

11493376/11490434 [==============================] – 0s 0us/step

Here, we have loaded the training as well as the test set of the MNIST dataset. Also, we have normalized the pixel values for both training as well as test images. Next, let’s visualize a few images from the dataset:

# visualizing a few images
for i in range(9):
    # place each digit in its own cell of a 3x3 grid
    plt.subplot(3, 3, i + 1)
    plt.imshow(train_images[i], cmap='gray')
plt.show()

# shape of the training and test set
(train_images.shape, train_labels.shape), (test_images.shape, test_labels.shape)
O/P: (((60000, 28, 28), (60000,)), ((10000, 28, 28), (10000,)))

So, we have 60,000 images of shape 28 by 28 in the training set and 10,000 images of the same shape in the test set. Next, we will reshape the images to add a channel dimension and one-hot encode the target variable:
# reshaping the images
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))
# one hot encoding the target variable
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
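
Both steps can be illustrated on a handful of samples in plain NumPy. This sketch uses five blank stand-in images and made-up labels, and it builds the one-hot vectors with an identity matrix instead of `to_categorical`, on the assumption that the two produce the same kind of array (one row per sample, one column per class):

```python
import numpy as np

# Hypothetical stand-in data: five blank 28x28 "images" and their labels
images = np.zeros((5, 28, 28), dtype=np.float32)
labels = np.array([3, 0, 9, 1, 3])

# Add a trailing channel axis: (5, 28, 28) -> (5, 28, 28, 1)
images = images.reshape((-1, 28, 28, 1))

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype=np.float32)[labels]

print(images.shape, one_hot.shape)
print(one_hot[0])
```

The first label is 3, so the first one-hot row has a single 1 at index 3 and zeros everywhere else.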

Define Model Architecture

Now, we will define the architecture of our model. We will use an architecture similar to the one we defined in PyTorch, this time without padding or batch normalization. So, our model will have 2 convolutional layers, each followed by a max-pooling layer, then a flatten layer, and finally a dense layer with 10 neurons since we have 10 classes.

# defining the model architecture
model = models.Sequential()
model.add(layers.Conv2D(4, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2), strides=2))
model.add(layers.Conv2D(4, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2), strides=2))
# flatten the feature maps into a vector before the dense layer
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='softmax'))

# summary of the model
model.summary()
Model: "sequential"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 4)         40        
max_pooling2d (MaxPooling2D) (None, 13, 13, 4)         0         
conv2d_1 (Conv2D)            (None, 11, 11, 4)         148       
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 4)           0         
flatten (Flatten)            (None, 100)               0         
dense (Dense)                (None, 10)                1010      
Total params: 1,198
Trainable params: 1,198
Non-trainable params: 0

To summarize, we have 2 convolutional layers, 2 max-pooling layers, a flatten layer, and a dense layer. The total number of parameters in the model is 1,198. Now that our model is ready, we will compile it:
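
The parameter counts in the summary can be reproduced with a little arithmetic. This is a sketch in plain Python; `conv_params` and `dense_params` are hypothetical helpers written for this illustration and assume that every filter and every neuron has one bias term:

```python
def conv_params(kernel_size, in_channels, out_channels):
    # each filter has kernel_size * kernel_size * in_channels weights plus one bias
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

def dense_params(in_features, out_features):
    # each output neuron has one weight per input feature plus one bias
    return (in_features + 1) * out_features

conv1 = conv_params(3, in_channels=1, out_channels=4)
conv2 = conv_params(3, in_channels=4, out_channels=4)
dense = dense_params(5 * 5 * 4, 10)   # flatten output is 5 x 5 x 4 = 100
total = conv1 + conv2 + dense

print(conv1, conv2, dense, total)
```

The pooling and flatten layers only rearrange or downsample values, so they contribute no parameters.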

# compiling the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
We are using the Adam optimizer, and you can change it as well. The loss function is set to categorical cross-entropy since we are solving a multi-class classification problem, and the metric is accuracy. Now let’s train our model for 10 epochs:
# training the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))
Epoch 1/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.4587 - accuracy: 0.8612 - val_loss: 0.2001 - val_accuracy: 0.9412
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1850 - accuracy: 0.9440 - val_loss: 0.1378 - val_accuracy: 0.9617
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1433 - accuracy: 0.9566 - val_loss: 0.1142 - val_accuracy: 0.9674
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1215 - accuracy: 0.9631 - val_loss: 0.1040 - val_accuracy: 0.9699
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1079 - accuracy: 0.9674 - val_loss: 0.0901 - val_accuracy: 0.9739
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0981 - accuracy: 0.9705 - val_loss: 0.0792 - val_accuracy: 0.9765
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0902 - accuracy: 0.9724 - val_loss: 0.0772 - val_accuracy: 0.9762
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0861 - accuracy: 0.9728 - val_loss: 0.0770 - val_accuracy: 0.9771
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0818 - accuracy: 0.9749 - val_loss: 0.0738 - val_accuracy: 0.9779
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0789 - accuracy: 0.9754 - val_loss: 0.0678 - val_accuracy: 0.9792
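
The loss values in this log are categorical cross-entropy, which can be computed by hand for a couple of samples. The sketch below is a plain NumPy version written for illustration, not the Keras implementation; it uses 3 classes instead of 10, and the predicted probabilities are made up.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot labels for two samples (classes 2 and 0)
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=np.float32)

# Hypothetical predictions: one confident and correct, one barely correct
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.30, 0.40], [0.40, 0.30, 0.30]])

loss_confident = categorical_cross_entropy(y_true, confident)
loss_unsure = categorical_cross_entropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions pick the right class, but the confident one gets a much lower loss. This is why the loss can keep falling across epochs even when accuracy changes only slightly.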


In this article, we looked at what PyTorch and TensorFlow offer, followed by a short description of the MNIST dataset.

We then applied image classification to the MNIST dataset using a CNN in both PyTorch and TensorFlow.

I hope you now have a working understanding of both frameworks. As a next step, take another image classification challenge and try to solve it using both PyTorch and TensorFlow.


