RNNs are popularly used to train on sequential data. The sequence may be a sequence of words, i.e., sentences; time series data; or a sequence of pixels.

Problem with MLPs for sequential data:

Suppose we have a dataset whose vocabulary contains 1000 words in total, and let's pick the first data point, i.e., "This article looks good."

Now this sentence must be featurized. Let's featurize it with one-hot encoding: since the total number of words is 1000, each word becomes a vector of size 1000.

[Image: one-hot encoded word vectors]
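To make the featurization concrete, here is a minimal sketch of one-hot encoding under the article's assumptions (the toy vocabulary and helper names below are made up for illustration):

```python
import numpy as np

# Toy vocabulary of size 1000 (the article's dataset has 1000 words in total).
vocab = ["this", "article", "looks", "good"] + [f"word_{i}" for i in range(996)]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, vocab_size=1000):
    """Return a one-hot vector of length vocab_size for a single word."""
    vec = np.zeros(vocab_size)
    vec[word_to_index[word]] = 1.0
    return vec

sentence = "this article looks good".split()
encoded = [one_hot(w) for w in sentence]   # four vectors, each of size 1000
print(len(encoded), encoded[0].shape)      # 4 (1000,)
```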

We shall pass these vectors to an MLP. Note that our first input (our example) has four vectors, each of size 1k, and only these four vectors will be the input to the MLP, so the total size of the input layer will be 1k x 4 = 4k.

[Image: MLP over the one-hot word vectors]
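Continuing the sketch, this is roughly what the MLP's input layer sees for this one sentence (the dummy word positions below are illustrative only):

```python
import numpy as np

# Pretend these are the four one-hot vectors for "This article looks good".
encoded = [np.zeros(1000) for _ in range(4)]
for i, vec in enumerate(encoded):
    vec[i] = 1.0  # dummy word indices, just for illustration

# The MLP sees the whole sentence at once: the four vectors are
# concatenated into a single fixed-size input vector.
x = np.concatenate(encoded)
print(x.shape)  # (4000,)  i.e. an input layer of size 1k x 4 = 4k
```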

OK, we have built an MLP for our example sentence, but will the size of the input be the same for all sentences? No, right? The lengths of the input sentences vary, so how can we fix this? One way is padding, i.e., fixing the size of the input layer to the maximum sentence length in our dataset. Suppose the maximum sentence length is 10; then our input layer size will be 10 x 1k, i.e., 10k. For each input, this 10k-sized input is filled with the word vectors of that sentence and the remaining positions are filled with zeros.

But assume the next layer has 20 activation units; our trainable weights will then be 10k (input layer) x 20, i.e., 200k, which is a huge number of parameters. In real-world problems it is common to have datasets with millions of words, and training such an MLP becomes impractical, as the sketch below illustrates.
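A minimal sketch of the padding idea and the resulting weight count, assuming a maximum length of 10 and a 20-unit hidden layer as in the text (everything else below is made up for illustration):

```python
import numpy as np

VOCAB_SIZE = 1000
MAX_LEN = 10        # longest sentence in the (hypothetical) dataset
HIDDEN_UNITS = 20

def pad_sentence(one_hot_vectors):
    """Zero-pad a sentence's one-hot vectors up to MAX_LEN words and flatten."""
    padded = np.zeros((MAX_LEN, VOCAB_SIZE))
    padded[:len(one_hot_vectors)] = one_hot_vectors
    return padded.reshape(-1)                 # fixed input size: 10 * 1000 = 10,000

sentence = [np.eye(VOCAB_SIZE)[i] for i in range(4)]  # four dummy one-hot word vectors
x = pad_sentence(sentence)

# Weights from the 10k input layer to a 20-unit hidden layer.
W = np.zeros((MAX_LEN * VOCAB_SIZE, HIDDEN_UNITS))
print(x.shape, W.size)  # (10000,) 200000 trainable weights for a single layer
```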

This is where the concept of the RNN was developed. Below is a simple architecture for an RNN model.

[Image: folded RNN architecture]

The above image is the regular folded representation of an RNN model. You may find different representations of RNNs in other blogs or articles, and you may have your own way of representing the RNN architecture; that is completely okay, as long as you understand the principle behind it.

Let's now unfold this RNN architecture and try to understand how the RNN works. We shall first look at the different components of an RNN cell and then see how the RNN cell solves the problem of sequence data with different lengths.

Each RNN cell has three trainable weight matrices, W_in, W_h, and W_out, and a hidden state h_t. These weights are updated on every backprop step, and h_t is the information carried over from the previous time step. For a clear understanding, let us walk through a simple example.


[Image: unfolded RNN example]
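To make the cell's components concrete, here is a minimal sketch of an unrolled forward pass, assuming the common tanh recurrence h_t = tanh(W_in · x_t + W_h · h_(t-1)) and output y_t = W_out · h_t (the sizes and names below are illustrative, not the exact cell from the figure):

```python
import numpy as np

VOCAB_SIZE = 1000   # input size (one-hot word vectors)
HIDDEN_SIZE = 32    # size of the hidden state h_t
OUTPUT_SIZE = 2     # e.g. positive / negative for sentiment

rng = np.random.default_rng(0)

# The three trainable weight matrices, shared across every time step.
W_in  = rng.normal(scale=0.01, size=(HIDDEN_SIZE, VOCAB_SIZE))
W_h   = rng.normal(scale=0.01, size=(HIDDEN_SIZE, HIDDEN_SIZE))
W_out = rng.normal(scale=0.01, size=(OUTPUT_SIZE, HIDDEN_SIZE))

def rnn_forward(inputs):
    """Run the RNN over a sequence of one-hot vectors of ANY length."""
    h = np.zeros(HIDDEN_SIZE)                 # initial hidden state
    for x_t in inputs:                        # one word per time step
        h = np.tanh(W_in @ x_t + W_h @ h)     # h_t carries info from previous steps
    return W_out @ h                          # output from the final hidden state

# Works for sentences of different lengths, no padding required.
short = [np.eye(VOCAB_SIZE)[i] for i in (0, 1)]
long_ = [np.eye(VOCAB_SIZE)[i] for i in (0, 1, 2, 3)]
print(rnn_forward(short).shape, rnn_forward(long_).shape)   # (2,) (2,)
```

Because W_in, W_h, and W_out are reused at every time step, the number of parameters does not grow with the sentence length, which is how the RNN sidesteps the padding and weight blow-up described earlier.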

Applications:

  1. Sentiment analysis: Given a sequence of words, the model predicts whether the sentence is positive or negative.
  2. Time Series Analysis: Given the data at each time stamp, the model predicts the value at the next time stamp.
  3. Language Translation: Given a sequence of information in language A, the model predicts the corresponding sequence in language B.
  4. Image Captioning: Given an image, the model predicts a sequence of words that forms a caption for that image.
  5. Speech Recognition: Given an audio sequence, the model predicts the sequence of words, transforming the audio into text.
