Long Short-Term Memory

By Alphonso Bradham

'''Note: This page is currently a work in progress.'''

Long Short-Term Memory (LSTM) refers to a type of recurrent neural network architecture useful for performing classification and regression tasks on long sequence or time-series data. LSTMs were developed to counter the [[vanishing gradient problem]], and the key feature of an LSTM network is the inclusion of a "cell state" vector that allows it to keep track of long-range relationships in data that other models would "forget".
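
The role of the cell state can be made concrete with a short sketch. Below is a minimal, illustrative single LSTM step written with NumPy, assuming the standard gate formulation; the weight names and sizes are placeholders, not values from this page. The cell state ''c'' is the "memory" vector described above: it is updated additively at each step, which is what lets information persist over long ranges.

<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: gates decide what to forget, what to add, and what to output.

    x_t    : input vector at time t
    h_prev : hidden state from time t-1
    c_prev : cell state from time t-1 (the long-term "memory")
    params : dict of weight matrices W_* with shape (hidden, input + hidden)
             and bias vectors b_* with shape (hidden,) -- illustrative names only
    """
    z = np.concatenate([x_t, h_prev])                 # current input joined with previous hidden state
    f = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate: what to erase from c_prev
    i = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate: what new information to store
    g = np.tanh(params["W_g"] @ z + params["b_g"])    # candidate values to write into the cell state
    o = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g                          # cell state update: mostly additive, so signals survive
    h_t = o * np.tanh(c_t)                            # new hidden state (the cell's output at time t)
    return h_t, c_t

# Tiny usage example with random, untrained weights:
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
params = {name: rng.normal(size=(n_hidden, n_in + n_hidden)) for name in ("W_f", "W_i", "W_g", "W_o")}
params.update({name: np.zeros(n_hidden) for name in ("b_f", "b_i", "b_g", "b_o")})
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_step(np.array([1.0, 0.0, -1.0]), h, c, params)
</syntaxhighlight>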

==Background on Sequence Data and RNNs==

For data where sequence order is important, traditional feed-forward neural networks often struggle to encode the temporal relationships between inputs. While well suited to simple regression and classification tasks on independent samples of data, the simple matrix-vector architecture of a feed-forward neural network retains no capacity for "memory" and cannot accurately handle sequence data, where the network's output for data point ''t'' depends on the value of data point ''t-1''.
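
This limitation can be demonstrated with a small, hypothetical sketch (the layer size and weights below are arbitrary): a single dense layer produces the same output whenever it sees the same input vector, no matter where in the sequence that vector appears, because nothing from earlier steps is carried forward.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)    # one dense layer: 3 inputs -> 4 outputs

def feedforward(x_t):
    """A plain dense layer: its output depends only on the current input x_t."""
    return np.tanh(W @ x_t + b)

x = np.array([1.0, 0.0, -1.0])
sequence = [x, np.array([0.5, 0.5, 0.5]), x]          # the same vector appears at steps 0 and 2

outputs = [feedforward(x_t) for x_t in sequence]
print(np.allclose(outputs[0], outputs[2]))            # True: no memory of what came before
</syntaxhighlight>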

It is in these situations (where a sequential ''memory'' is desired) that Recurrent Neural Networks (RNNs) are employed. Put simply, recurrent neural networks differ from traditional feed-forward neural networks in that neurons have connections that point "backwards" to reference earlier observations. Operationally, this translates to RNNs having two inputs at each layer: one that references the input data at the current time step ''t'', in the traditional manner of feed-forward neural networks, and another that references the network's output at the previous time step ''t-1''. In this way, the input representing the previous time step serves as the network's "memory", since that previous output itself depended on the outputs of the time steps before it, and so on back to the beginning of the sequence.
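
A minimal NumPy sketch of this recurrence is shown below (the tanh nonlinearity and weight shapes are standard, illustrative choices rather than details from this page). The hidden state ''h'' computed at step ''t-1'' is fed back in alongside the input at step ''t'', so the final state depends on the entire sequence.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
W_x = rng.normal(size=(n_hidden, n_in))       # weights applied to the current input x_t
W_h = rng.normal(size=(n_hidden, n_hidden))   # weights applied to the previous hidden state h_{t-1}
b = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One recurrent step: the new hidden state mixes the current input with the previous state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Run the cell over a short sequence; h carries information forward between steps.
h = np.zeros(n_hidden)
for x_t in [np.array([1.0, 0.0, -1.0]), np.array([0.5, 0.5, 0.5]), np.array([1.0, 0.0, -1.0])]:
    h = rnn_step(x_t, h)
print(h)   # depends on the whole sequence, not just the last input
</syntaxhighlight>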


==Vanishing Gradient Problem==

As mentioned above, although RNNs have a capacity for memory, their learning is still governed by the same back-propagation rule used in feed-forward neural networks. One side effect of the back-propagation algorithm is its tendency to "diminish" the amount of learning as the error signal travels backwards through the network: the gradient is repeatedly multiplied by the derivatives of each layer, and when those factors are smaller than one the gradient shrinks toward zero. In an RNN, back-propagation through time unrolls the network across every time step of the sequence, so for long sequences the gradient reaching the earliest steps becomes vanishingly small and the network effectively stops learning long-range relationships. This is the vanishing gradient problem that LSTMs were designed to counter.
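
A toy numeric sketch (with an arbitrary scalar recurrent weight chosen only for illustration) shows how quickly this shrinkage happens: back-propagating through ''T'' time steps multiplies the gradient by roughly the same small factor ''T'' times, so it decays geometrically.

<syntaxhighlight lang="python">
import numpy as np

# Toy illustration: a scalar "recurrent weight" below 1 and tanh derivatives bounded by 1.
# Each step of back-propagation through time multiplies the gradient by at most w.
w = 0.5                      # illustrative recurrent weight with magnitude below 1
gradient = 1.0               # gradient of the loss at the final time step
for t in range(50):          # propagate back through 50 time steps
    gradient *= w            # upper bound on the gradient surviving at each earlier step
    if t in (4, 9, 24, 49):
        print(f"after {t + 1:2d} steps: {gradient:.3e}")

# The printed values shrink geometrically (~3e-2, ~1e-3, ~3e-8, ~9e-16), so the earliest
# inputs in a long sequence contribute essentially nothing to the weight updates.
</syntaxhighlight>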