Sunday 15 September 2024

Artificial Intelligence for Network Engineers: Introduction

Several books on artificial intelligence (AI) and deep learning (DL) have been published over the past decade. However, I have yet to find a book that explains deep learning from a networking perspective while also providing a solid introduction to DL. My goal is to fill this gap by writing a book titled AI for Network Engineers (note that the title may change during the writing process). Writing about such a complex subject will take time, but I hope to complete and release it within a year.

Part I: Deep Learning and Deep Neural Networks

The first part of the book covers the theory behind Deep Learning. It begins by explaining the structure and functionality of a single artificial neuron. Then, it explores various Deep Neural Network models, such as Feedforward Neural Networks (FNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). Next, the first part discusses parallelization strategies such as Data, Pipeline, and Tensor Parallelism, explaining how input data and models that exceed the memory capacity of the GPUs in a single server can be distributed across multiple GPU servers.

Part II: AI Data Center Networking - Lossless Ethernet

After a brief introduction to RoCEv2, the second part continues from Part I by explaining how parallelization strategies affect network utilization. It then discusses the Data Center Quantized Congestion Notification (DCQCN) scheme for RoCEv2, introducing key concepts such as Explicit Congestion Notification (ECN) and Priority-based Flow Control (PFC). In addition to ECN and PFC, this section covers other congestion-avoidance methods, such as packet spraying and deep buffers. The second part also delves into AI data center design choices, focusing on the East-West backend network. It introduces Rail, Top-of-Rack (ToR), and Rail-Optimized designs.



Figure 1: Book Introduction.

AI for Network Engineers: Chapter 1 - Deep Learning Basics

Content

Introduction 
Artificial Neuron 
  Weighted Sum for Pre-Activation Value 
  ReLU Activation Function for Post-Activation 
  Bias Term
  S-Shaped Functions – TANH and SIGMOID
Network Impact
Summary
References

Introduction


Artificial Intelligence (AI) is a broad term for solutions that aim to mimic the functions of the human brain. Machine Learning (ML), in turn, is a subset of AI suitable for tasks like simple pattern recognition and prediction. Deep Learning (DL), the focus of this section, is a subset of ML that uses multi-layered neural networks to extract meaningful patterns from data. Unlike traditional ML, DL does not necessarily require human intervention, such as providing structured, labeled datasets (e.g., 1,000 bird images labeled as “bird” and 1,000 cat images labeled as “cat”).


DL utilizes layered, hierarchical Deep Neural Networks (DNNs), where the hidden and output layers consist of computational units, artificial neurons, each of which processes its input data. The nodes in the input layer pass the input data to the first hidden layer without performing any computations, which is why they are not considered neurons or computational units. Each neuron calculates a pre-activation value (z) based on the input received from the previous layer and then applies an activation function to this value, producing a post-activation output (ŷ). There are various DNN models, such as Feed-Forward Neural Networks (FNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), each designed for different use cases. For example, FNNs are suitable for simple, structured tasks like handwritten digit recognition using the MNIST dataset [1], CNNs are effective for larger image recognition tasks such as with the CIFAR-10 dataset [2], and RNNs are commonly used for time-series forecasting, like predicting future sales based on historical sales data.
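
To make the neuron's two-step computation concrete, the short Python sketch below computes the pre-activation value z as a weighted sum of the inputs plus the bias, and then applies the ReLU activation function to produce the post-activation output ŷ. The input, weight, and bias values are made-up examples chosen only for illustration.

import numpy as np

# Example inputs from the previous layer and the neuron's parameters
# (arbitrary values, for illustration only).
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # weights
b = 0.2                          # bias term

# Pre-activation value: weighted sum of inputs plus bias
z = np.dot(w, x) + b

# Post-activation output: ReLU returns z if z > 0, otherwise 0
y_hat = np.maximum(0.0, z)

print(f"z = {z:.2f}, post-activation = {y_hat:.2f}")

Because z is negative in this example, ReLU clips the output to zero; with a positive weighted sum, the neuron would simply pass z through unchanged.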


To provide accurate predictions based on input data, neural networks are trained using labeled datasets. The MNIST (Modified National Institute of Standards and Technology) dataset [1] contains 60,000 training and 10,000 test images of handwritten digits (grayscale, 28x28 pixels). The CIFAR-10 [2] dataset consists of 60,000 color images (32x32 pixels), with 50,000 training images and 10,000 test images, divided into 10 classes. The CIFAR-100 dataset [3], as the name implies, has 100 image classes, with each class containing 600 images (500 training and 100 test images per class). Once the model reaches the desired accuracy on the test set, the neural network can be deployed to production.
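
The dataset sizes above are easy to verify in code. The sketch below assumes TensorFlow is installed and uses its bundled Keras datasets module to download MNIST and CIFAR-10, then prints the training/test splits and image shapes described in this paragraph.

# Assumes TensorFlow is installed; the Keras datasets module downloads
# the data on first use and caches it locally.
from tensorflow.keras.datasets import mnist, cifar10

# MNIST: 60,000 grayscale 28x28 training images, 10,000 test images
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print("MNIST train:", x_train.shape, "test:", x_test.shape)
# -> MNIST train: (60000, 28, 28) test: (10000, 28, 28)

# CIFAR-10: 50,000 color 32x32 training images, 10,000 test images, 10 classes
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print("CIFAR-10 train:", x_train.shape, "test:", x_test.shape)
# -> CIFAR-10 train: (50000, 32, 32, 3) test: (10000, 32, 32, 3)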


Figure 1-1: Deep Learning Introduction.