Recurrent Neural Network & LSTM

Neural networks are functions that map one kind of variable to another: in classification problems we map vectors to vectors (of class scores), and in regression problems we map vectors to scalars. Neural networks are multi-layer networks of neurons that we use to classify things, make predictions, and so on.

A Recurrent Neural Network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step. In traditional neural networks, all inputs and outputs are independent of each other, but in tasks such as predicting the next word of a sentence, the previous words are required, so the network needs a way to remember them. RNNs solve this with a hidden state, their main and most important feature, which carries information about the sequence seen so far.
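To make the hidden state concrete, here is a minimal NumPy sketch of one RNN step; the weight names (W_xh, W_hh, b_h) and all sizes are illustrative, not taken from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3                      # illustrative sizes

W_xh = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One RNN step: the new hidden state mixes the current input with the
    previous hidden state, so it carries memory of the sequence so far."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)                           # initial hidden state
for x_t in rng.normal(size=(5, input_size)):        # a sequence of 5 inputs
    h = rnn_step(x_t, h)                            # previous step feeds the current one
print(h)                                            # final state summarizing the sequence
```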

Fig. Recurrent Neural Network

Recurrent neural nets simply add sequences into the mix, and several RNN architectures serve different applications: the vector-to-sequence model, the sequence-to-vector model, and the sequence-to-sequence model.

  • Vector to Sequence Model:- The input is a single vector and the output is a sequence of a desired length. Example: image captioning.
  • Sequence to Vector Model:- The input is a sequence (e.g. of words) and the output is a single vector. Example: sentiment analysis (see the sketch after this list).
  • Sequence to Sequence Model:- Both the input and the output are sequences, but in the basic form their lengths must be the same. Most real applications, such as language translation, do not have equal input and output lengths; to handle them we need another architecture, the encoder-decoder architecture.
  • Encoder Decoder Model:- The encoder-decoder model is a refinement of the sequence-to-sequence model. The encoder compresses the input sequence into a vector, and the decoder expands that vector into the output sequence.
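As an example of the sequence-to-vector case above, here is a minimal Keras sketch of a sentiment-style model; the vocabulary size, embedding width, unit count, and batch shapes are all hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=32),  # token ids -> 32-d vectors
    layers.SimpleRNN(64),                              # whole sequence -> one 64-d vector
    layers.Dense(1, activation="sigmoid"),             # vector -> sentiment probability
])

x = np.random.randint(0, 10000, size=(8, 20))          # batch of 8 sequences of length 20
print(model(x).shape)                                  # (8, 1): one output per input sequence
```

Swapping SimpleRNN for an LSTM layer turns this into the kind of model discussed next.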

LSTM was introduced by Hochreiter & Schmidhuber in 1997. The main reason for introducing LSTM was to deal with the problem of vanishing and exploding gradients, illustrated in the toy demonstration below.
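Backpropagation through time multiplies the gradient by roughly the same recurrent factor at every step, so over a long sequence it shrinks or grows exponentially. The factors below are made up purely for illustration.

```python
# Repeated multiplication by a recurrent factor w makes gradients
# vanish (|w| < 1) or explode (|w| > 1) over many time steps.
for w in (0.5, 1.0, 1.5):
    grad = 1.0
    for _ in range(50):     # 50 time steps of backpropagation
        grad *= w           # one chain-rule factor per step
    print(f"w={w}: gradient after 50 steps = {grad:.3e}")
```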

LSTM is a kind of recurrent neural network (RNN) suited to processing and predicting important events with relatively long intervals and delays in a time series. LSTM differs from a plain RNN mainly in that it adds a processor to the algorithm to judge whether information is useful or not. The structure that houses this processor is called a cell.

A cell contains three gates, called the input gate, the forget gate, and the output gate. When a piece of information enters the LSTM network, these gate functions determine whether it is useful. Only information that passes the check is kept; inconsistent information is discarded through the forget gate. This mechanism solves the long-term dependency problem that plain RNNs face under repeated operation.

Fig. Structure of LSTM

The gate and state update equations are shown below, where σ is the sigmoid function, ⊙ denotes element-wise multiplication, and C̃t is the candidate cell state:

ft = σ(Wf · [ht-1, xt] + bf) …..(1)

it = σ(Wi · [ht-1, xt] + bi) …..(2)

C̃t = tanh(Wc · [ht-1, xt] + bc) …..(3)

Ct = ft ⊙ Ct-1 + it ⊙ C̃t …..(4)

ot = σ(Wo · [ht-1, xt] + bo) …..(5)

ht = ot ⊙ tanh(Ct) …..(6)
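The equations map directly onto code. Below is a NumPy sketch of one LSTM cell step following equations (1) to (6); the dimensions and random weight initialization are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 4, 3                    # illustrative sizes
concat = hidden_size + input_size                 # size of [h_{t-1}, x_t]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix and bias per gate / candidate (illustrative initialization).
Wf, Wi, Wc, Wo = [rng.normal(size=(hidden_size, concat)) for _ in range(4)]
bf, bi, bc, bo = [np.zeros(hidden_size) for _ in range(4)]

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)                    # (1) forget gate
    i_t = sigmoid(Wi @ z + bi)                    # (2) input gate
    C_tilde = np.tanh(Wc @ z + bc)                # (3) candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde            # (4) cell state update
    o_t = sigmoid(Wo @ z + bo)                    # (5) output gate
    h_t = o_t * np.tanh(C_t)                      # (6) hidden state
    return h_t, C_t

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):      # a sequence of 5 inputs
    h, C = lstm_step(x_t, h, C)
print(h)                                          # final hidden state
```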

Thanks to this gating mechanism, an LSTM models long-range dependencies and temporal structure more accurately than a conventional RNN.
