Gated recurrent unit
In artificial neural networks, the gated recurrent unit (GRU) is a gating mechanism used in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.[1] The GRU resembles a long short-term memory (LSTM) unit in that it uses gates to control which features are taken in or forgotten,[2] but it lacks a context vector and an output gate, and therefore has fewer parameters than an LSTM.[3] On certain polyphonic music modeling, speech signal modeling and natural language processing tasks, the GRU was found to perform similarly to the LSTM.[4][5] These comparisons showed that gating is helpful in general, but Bengio's team reached no firm conclusion on which of the two gating units was better.[6][7]
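The parameter saving can be made concrete with a small count. The sketch below assumes that every gate or candidate block consists of one input matrix, one recurrent matrix and one bias vector, that a GRU has three such blocks, and that a standard LSTM without peephole connections has four. The function name and the example sizes are illustrative and not taken from the cited papers.

```python
def gated_rnn_param_count(n_inputs, n_hidden, n_blocks):
    """Count weights for n_blocks blocks, each with an input matrix
    (n_hidden x n_inputs), a recurrent matrix (n_hidden x n_hidden)
    and a bias vector (n_hidden)."""
    per_block = n_hidden * n_inputs + n_hidden * n_hidden + n_hidden
    return n_blocks * per_block

# Illustrative sizes (assumption): 10 input features, 20 hidden units.
gru_params = gated_rnn_param_count(10, 20, n_blocks=3)   # update, reset, candidate
lstm_params = gated_rnn_param_count(10, 20, n_blocks=4)  # input, forget, output, candidate
print(gru_params, lstm_params)
```

Under these assumptions the GRU needs three quarters of the parameters of an LSTM with the same input and hidden sizes.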
Architecture
There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, and a simplified form called the minimal gated unit.[8]
In the following, the operator $\odot$ denotes the Hadamard product.
Fully gated unit
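Initially, for $t = 0$, the output vector is $h_0 = 0$. For each subsequent time step $t$:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \phi(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
$$

Variables ($d$ denotes the number of input features and $e$ the number of output features):
- $x_t \in \mathbb{R}^{d}$: input vector
- $h_t \in \mathbb{R}^{e}$: output vector
- $\hat{h}_t \in \mathbb{R}^{e}$: candidate activation vector
- $z_t \in (0, 1)^{e}$: update gate vector
- $r_t \in (0, 1)^{e}$: reset gate vector
- $W \in \mathbb{R}^{e \times d}$, $U \in \mathbb{R}^{e \times e}$ and $b \in \mathbb{R}^{e}$: parameter matrices and vector which need to be learned during training
- $\sigma$: gate activation function. The original is a logistic function.
- $\phi$: candidate activation function. The original is a hyperbolic tangent.
Alternative activation functions are possible, provided that $\sigma(x) \in [0, 1]$.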
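One time step of the fully gated unit can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes the convention in which the update gate weights the candidate activation, uses lists instead of an array library to stay dependency-free, and the parameter dictionary layout (`Wz`, `Uz`, `bz`, and so on) and function names are hypothetical.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def vec_add(*vs):
    """Element-wise sum of any number of equal-length vectors."""
    return [sum(parts) for parts in zip(*vs)]

def hadamard(a, b):
    """Element-wise (Hadamard) product."""
    return [x * y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def gru_step(x, h_prev, p):
    """One step of a fully gated GRU; p maps names to parameters."""
    # Update and reset gates see the input, the previous state and a bias.
    z = sigmoid(vec_add(matvec(p["Wz"], x), matvec(p["Uz"], h_prev), p["bz"]))
    r = sigmoid(vec_add(matvec(p["Wr"], x), matvec(p["Ur"], h_prev), p["br"]))
    # The reset gate scales the previous state before the recurrent product.
    pre = vec_add(matvec(p["Wh"], x),
                  matvec(p["Uh"], hadamard(r, h_prev)),
                  p["bh"])
    h_hat = [math.tanh(a) for a in pre]
    # Blend the previous state and the candidate using the update gate.
    return [(1.0 - zi) * hp + zi * hh for zi, hp, hh in zip(z, h_prev, h_hat)]

def gru_forward(xs, p, n_hidden):
    """Run a sequence through the unit, starting from the zero state."""
    h = [0.0] * n_hidden
    for x in xs:
        h = gru_step(x, h, p)
    return h
```

With all parameters set to zero, both gates evaluate to 0.5 and the candidate to 0, so each step simply halves the previous state. This makes a convenient sanity check for an implementation.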



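Alternate forms can be created by changing $z_t$ and $r_t$:[9]
- Type 1: each gate depends only on the previous hidden state and the bias: $z_t = \sigma(U_z h_{t-1} + b_z)$, $r_t = \sigma(U_r h_{t-1} + b_r)$.
- Type 2: each gate depends only on the previous hidden state: $z_t = \sigma(U_z h_{t-1})$, $r_t = \sigma(U_r h_{t-1})$.
- Type 3: each gate is computed using only the bias: $z_t = \sigma(b_z)$, $r_t = \sigma(b_r)$.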
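The three reduced gate types differ only in which terms enter the gate pre-activation. The following sketch makes this explicit with a single function. The function name, the `variant` strings and the example values are illustrative assumptions, and the logistic function is used as the gate activation.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def gate(x, h_prev, W, U, b, variant="full"):
    """Compute a gate vector for the full unit or one of the reduced types."""
    if variant not in ("full", "type1", "type2", "type3"):
        raise ValueError("unknown variant: " + variant)
    pre = [0.0] * len(b)
    if variant == "full":                        # input term W x
        pre = [p + a for p, a in zip(pre, matvec(W, x))]
    if variant in ("full", "type1", "type2"):    # recurrent term U h
        pre = [p + a for p, a in zip(pre, matvec(U, h_prev))]
    if variant in ("full", "type1", "type3"):    # bias term b
        pre = [p + a for p, a in zip(pre, b)]
    return [1.0 / (1.0 + math.exp(-a)) for a in pre]

# Illustrative one-dimensional example (values are arbitrary).
W, U, b = [[1.0]], [[2.0]], [0.5]
for name in ("full", "type1", "type2", "type3"):
    print(name, gate([1.0], [1.0], W, U, b, variant=name))
```

In the type 3 case the gate no longer depends on the data at all, so it acts as a learned constant per hidden unit.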
Minimal gated unit
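The minimal gated unit (MGU) is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed:[10]

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\hat{h}_t &= \phi(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \hat{h}_t
\end{aligned}
$$

Variables
- $x_t$: input vector
- $h_t$: output vector
- $\hat{h}_t$: candidate activation vector
- $f_t$: forget vector
- $W$, $U$ and $b$: parameter matrices and vector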
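A single MGU step can be sketched in the same style. The forget gate plays both roles here: it scales the previous state inside the candidate and blends the previous state with the candidate in the output. The parameter names (`Wf`, `Uf`, `bf`, `Wh`, `Uh`, `bh`) and the function name are hypothetical, and the logistic function and hyperbolic tangent are assumed as activations.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def mgu_step(x, h_prev, p):
    """One step of a minimal gated unit with a single forget gate."""
    # Forget gate: logistic function of input, previous state and bias.
    f_pre = [a + c + d for a, c, d in
             zip(matvec(p["Wf"], x), matvec(p["Uf"], h_prev), p["bf"])]
    f = [1.0 / (1.0 + math.exp(-a)) for a in f_pre]
    # Candidate: the forget gate scales the previous state element-wise.
    gated_h = [fi * hp for fi, hp in zip(f, h_prev)]
    c_pre = [a + c + d for a, c, d in
             zip(matvec(p["Wh"], x), matvec(p["Uh"], gated_h), p["bh"])]
    h_hat = [math.tanh(a) for a in c_pre]
    # Output: the same gate blends the previous state and the candidate.
    return [(1.0 - fi) * hp + fi * hh for fi, hp, hh in zip(f, h_prev, h_hat)]
```

Compared with the fully gated unit, this uses two parameter blocks instead of three, which is where the reduction in parameters comes from.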
Light gated recurrent unit
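The light gated recurrent unit (LiGRU)[4] removes the reset gate altogether, replaces tanh with the ReLU activation, and applies batch normalization (BN):

$$
\begin{aligned}
z_t &= \sigma(\operatorname{BN}(W_z x_t) + U_z h_{t-1}) \\
\tilde{h}_t &= \operatorname{ReLU}(\operatorname{BN}(W_h x_t) + U_h h_{t-1}) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$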
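The LiGRU update can be sketched for a mini-batch as follows. This is a simplified illustration under several assumptions: batch normalization is applied only to the input projections, it uses the statistics of the current batch with no learned scale or shift and no running averages, and the update gate weights the previous state. The function and parameter names are hypothetical.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def batch_norm(rows, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over the batch.
    Simplification: no learned scale/shift and no running statistics."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    variances = [sum((a - m) ** 2 for a in c) / n for c, m in zip(cols, means)]
    return [[(a - m) / math.sqrt(v + eps)
             for a, m, v in zip(row, means, variances)]
            for row in rows]

def ligru_step(X, H_prev, p):
    """One LiGRU step for a batch: X and H_prev are lists of vectors."""
    bn_z = batch_norm([matvec(p["Wz"], x) for x in X])
    bn_h = batch_norm([matvec(p["Wh"], x) for x in X])
    H = []
    for a_z, a_h, h_prev in zip(bn_z, bn_h, H_prev):
        uz = matvec(p["Uz"], h_prev)
        uh = matvec(p["Uh"], h_prev)
        # Update gate (logistic) and ReLU candidate; there is no reset gate.
        z = [1.0 / (1.0 + math.exp(-(a + u))) for a, u in zip(a_z, uz)]
        h_tilde = [max(0.0, a + u) for a, u in zip(a_h, uh)]
        H.append([zi * hp + (1.0 - zi) * ht
                  for zi, hp, ht in zip(z, h_prev, h_tilde)])
    return H
```

Because the statistics are computed across the batch, the result for one sequence depends on the other sequences in the same batch, which is a property of batch normalization in general rather than of this sketch.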
LiGRU has been studied from a Bayesian perspective.[11] This analysis yielded a variant called the light Bayesian recurrent unit (LiBRU), which showed slight improvements over the LiGRU on speech recognition tasks.