Saturday, 10 August 2024

AI/ML Networking: Part-IV: Convolutional Neural Network (CNN) Introduction

Feed-forward Neural Networks are suitable for simple tasks like basic time series prediction without long-term relationships. However, FNNs is not a one-size-fits-all solution. For instance, digital image training process uses pixel values of image as input data. Consider training a model to recognize a high resolution (600 dpi), 3.937 x 3.937 inches digital RGB (red, green, blue) image. The number of input parameters can be calculated as follows:

Width: 3.937 in x 600 ≈ 2362 pixels
Height: 3.937 in x 600 ≈ 2362 pixels
Pixels in image: 2362 x 2362 = 5,579,044 pixels
RGB (3 channels): 5,579,044 pxls x 3 channels = 16 737 132
Total input parameters: 16 737 132
Memory consumption: ≈ 16 MB

FNNs are not ideal for digital image training. If we use FNN for training in our example, we fed 16,737,132 input parameters to the first hidden layer, each having unique weight. For image training, there might be thousands of images, handling millions of parameters demands significant computation cycles and is a memory-intensive process. Besides, FNNs treat each pixel as an independent unit. Therefore, FNN algorithm does not understand dependencies between pixels and cannot recognize the same image if it shifts within the frame. Besides, FNN does not detect edges and other crucial details. 

A better model for training digital images is Convolutional Neural Networks (CNNs). Unlike in FFN neural networks where each neuron has a unique set of weights, CNNs use the same set of weights (Kernel/Filter) across different regions of the image, which reduces the number of parameters. Besides, CNN algorithm understands the pixel dependencies and can recognize patterns and objects regardless of their position in the image. 

The input data processing in CNNs is hierarchical. The first layer, convolutional layers, focuses on low-level features such as textures and edges. The second layer, pooling layer, captures higher-level features like shapes and objects. These two layers significantly reduce the input data parameters before they are fed into the neurons in the first hidden layer, the fully connected layer, where each neuron has unique weights (like FNNs).