Machine Learning (ML) is a subset of Artificial Intelligence (AI). ML is based on algorithms that allow learning, predicting, and making decisions based on data rather than pre-programmed tasks. ML leverages Deep Neural Networks (DNNs), which have multiple layers, each consisting of neurons that process information from sub-layers as part of the training process. Large Language Models (LLMs), such as OpenAI’s GPT (Generative Pre-trained Transformers), utilize ML and Deep Neural Networks.
For network engineers, it is crucial to understand the fundamental operations and communication models used in ML training processes. To emphasize the importance of this, I quote the Chinese philosopher and strategist Sun Tzu, who lived around 600 BCE, from his work The Art of War.
If you know the enemy and know yourself, you need not fear the result of a hundred battles.
We don’t have to be data scientists to design a network for AI/ML, but we must understand the operational fundamentals and communication patterns of ML. Additionally, we must have a deep understanding of network solutions and technologies to build a lossless and cost-effective network for enabling efficient training processes.
In the upcoming two posts, I will explain the basics of:
a) Data Models: Layers and neurons, forward and backward passes, and algorithms.
b) Parallelization Strategies: How training times can be reduced by dividing the model into smaller entities, batches, and even micro-batches, which are processed by several GPUs simultaneously.
The number of parameters, the selected data model, and the parallelization strategy affect the network traffic that crosses the data center switch fabric.
After these two posts, we will be ready to jump into the network part.
At this stage, you may need to read (or re-read) my previous post about Remote Direct Memory Access (RDMA), a solution that enables GPUs to write data from local memory to remote GPUs' memory.
Toni, what app are you using to draw these awesome diagrams?
ReplyDeleteI am using good old Microsoft Powerpoint
ReplyDelete