Sunday, 12 January 2025

AI for Network Engineers: LSTM-Based RNN


Recap of the Operation of an LSTM Cell

The previous section introduced the construction and operation of a single Long Short-Term Memory (LSTM) cell. This section briefly discusses an LSTM-based Recurrent Neural Network (RNN). Before diving into the details, let’s recap how an individual LSTM cell operates with a theoretical, non-mathematical example.

Suppose we want our model to produce the sentence: “It was cloudy, but it is raining now.” The first part of the sentence refers to the past, and one of the LSTM cells has stored the tense “was” in its internal cell state. However, the last portion of the sentence refers to the present. Naturally, we want the model to forget the previous tense “was” and update its state to reflect the current tense “is.”

The Forget Gate plays a role in discarding unnecessary information. In this case, the forget gate suppresses the word “was” by closing its gate (outputting 0). The Input Gate is responsible for admitting a new candidate cell state, which in this example is the word “is.” The input gate is fully open (outputting 1) to allow the latest information to be introduced.

The Identity function computes the updated cell state by summing the contributions of the forget gate and the input gate. This updated cell state represents the memory carried to the next time step. Additionally, the updated cell state is passed through an Output Activation function, which produces the cell’s activated output.

The Output Gate controls how much of this activated output is shared as the public output. In this example, the output gate is fully open (outputting 1), allowing the word “is” to be published as the final output.
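To make the recap concrete, the short sketch below steps a single LSTM cell forward once in plain NumPy, with the gate activations pinned to the idealized values from the example (forget gate 0, input gate 1, output gate 1). In a trained cell these values would come from sigmoid-activated weighted sums of the input vector and the previous cell output; the one-dimensional “tense” encoding and the variable names are illustrative assumptions only.

```python
import numpy as np

# Toy one-dimensional cell state: -1.0 encodes the past tense "was",
# +1.0 encodes the present tense "is".
prev_cell_state = np.array([-1.0])   # memory from earlier in the sentence ("was")
candidate_state = np.array([1.0])    # new candidate information ("is")

# Idealized gate activations from the example (normally sigmoid outputs between 0 and 1).
forget_gate = 0.0   # fully closed: discard the old tense
input_gate  = 1.0   # fully open: admit the new candidate
output_gate = 1.0   # fully open: publish the activated output

# Identity function: sum the gated old state and the gated candidate.
cell_state = forget_gate * prev_cell_state + input_gate * candidate_state

# Output activation (tanh) followed by the output gate.
cell_output = output_gate * np.tanh(cell_state)

print(cell_state)    # [1.]       -> the memory now reflects "is"
print(cell_output)   # [0.7615...] -> the published output reflects "is"
```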

An Overview of an LSTM-Based RNN

Figure 6-5 illustrates an LSTM-based RNN model featuring two LSTM layers and a SoftMax layer. The input vectors x1 and x2, along with the cell output ht−1 from the previous time step, are fed into all LSTM cells in the input layer. To keep the figure simple, only two LSTM cells are shown per layer.

The input vectors pass through gates, producing both the internal cell state and the cell output. The internal states are stored using a Constant Error Carousel (CEC) to be utilized in subsequent time steps. The cell output is looped back as an input vector for the next time step. Additionally, the cell output is passed to all LSTM cells in the next layer.

Finally, the SoftMax layer generates the model's output. Note that Figure 6-5 depicts a single time step.
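A minimal sketch of the arrangement in Figure 6-5, assuming PyTorch as the framework (the text itself names none): nn.LSTM with num_layers=2 stacks the two LSTM layers, and a linear layer followed by softmax plays the role of the SoftMax output layer. The sizes are arbitrary illustration values, not taken from the figure.

```python
import torch
import torch.nn as nn

input_size, hidden_size, num_classes = 2, 4, 3   # arbitrary illustration sizes

# Two stacked LSTM layers: cell outputs of the first layer feed the cells of the second.
lstm = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
softmax_readout = nn.Linear(hidden_size, num_classes)

# A single time step: one sequence of length one carrying the input values x1 and x2.
x = torch.randn(1, 1, input_size)

# h0 and c0 hold the cell outputs and internal cell states from the previous time step.
h0 = torch.zeros(2, 1, hidden_size)
c0 = torch.zeros(2, 1, hidden_size)

out, (h1, c1) = lstm(x, (h0, c0))                  # out: top-layer cell outputs
y = torch.softmax(softmax_readout(out), dim=-1)    # SoftMax layer: class probabilities
print(y.shape)   # torch.Size([1, 1, 3])
```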


Figure 6-5: LSTM-Based RNN Layer Model.

Figure 6-6 illustrates a layered LSTM-based Recurrent Neural Network (RNN) model that processes sequential data across four time steps. The model consists of three layers: the input LSTM layer, a hidden LSTM layer, and a SoftMax output layer. Each gray square labeled "LSTM" represents a layer containing n LSTM cells.

At the first time step, the input value x1 is fed to the LSTM cells in the input layer. Each LSTM cell computes its internal cell state (C), passes it through the output activation function, and produces a cell output (ht). This output is passed both to the LSTM cells in the next time step via recurrent connections and, as an input vector, to the LSTM cells in the hidden layer at the same time step.

The LSTM cells in the hidden layer repeat the process performed by the input layer LSTM cells. Their output (ht) is passed to the SoftMax layer, which computes probabilities for each possible output class, generating the model's predictions (y1). The cell output is also passed to the next time step on the same layer.

The figure also depicts the autoregressive mode, where the output of the SoftMax layer at the first time step (t1) is fed back as part of the input for the next time step (t2) in the input layer. This feedback loop enables the model to use its predictions from previous time steps to inform its processing of subsequent time steps. Autoregressive models are particularly useful in tasks such as sequence generation, where the output sequence depends on previously generated elements.
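The loop below sketches this autoregressive feedback over the four time steps of Figure 6-6, again assuming PyTorch. The previous SoftMax prediction is concatenated onto the next step’s external input; the layer sizes and the concatenation scheme are illustrative choices rather than the exact construction in the figure.

```python
import torch
import torch.nn as nn

num_classes, base_input, hidden = 3, 2, 4     # arbitrary illustration sizes
input_size = base_input + num_classes         # external input plus previous prediction

lstm = nn.LSTM(input_size, hidden, num_layers=2, batch_first=True)
readout = nn.Linear(hidden, num_classes)

h = torch.zeros(2, 1, hidden)                 # recurrent cell outputs, one per layer
c = torch.zeros(2, 1, hidden)                 # internal cell states (the CEC memory)
prev_pred = torch.zeros(1, 1, num_classes)    # no prediction exists before the first step

for t in range(4):                            # four time steps, as in Figure 6-6
    x_t = torch.randn(1, 1, base_input)       # external input for this time step
    step_input = torch.cat([x_t, prev_pred], dim=-1)    # autoregressive feedback
    out, (h, c) = lstm(step_input, (h, c))    # temporal recurrence via (h, c)
    prev_pred = torch.softmax(readout(out), dim=-1)     # prediction y for this step
    print(f"t{t+1} prediction:", prev_pred.squeeze().tolist())
```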

Key Features Depicted in Figure 6-6

Recurrent Data Flow: The outputs from each time step are recurrently fed into the next time step, capturing temporal dependencies.

Layered Structure: The vertical connections between layers allow the model to hierarchically process input data, with higher layers learning progressively abstract features.

Autoregressive Feedback: The use of SoftMax outputs as part of the next time step’s input highlights the autoregressive nature of the model, commonly used in sequence prediction and generation tasks.

Figure 6-6: LSTM-Based RNN Model with Layered Structure and Four Time Steps.

Conclusion


Figure 6-6 demonstrates the interplay between sequential and layered data flow in a multi-layered LSTM model, showcasing how information is processed both temporally (across time steps) and hierarchically (across layers). The autoregressive feedback loop further illustrates the model’s capability to adapt its predictions based on prior outputs, making it well-suited for tasks such as time series forecasting, natural language processing, and sequence generation.

