# gru

Gated recurrent unit

Since R2020a

## Syntax

```
Y = gru(X,H0,weights,recurrentWeights,bias)
[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias)
___ = gru(X,H0,weights,recurrentWeights,bias,DataFormat=FMT)
___ = gru(X,H0,weights,recurrentWeights,bias,Name=Value)
```

## Description

The gated recurrent unit (GRU) operation allows a network to learn dependencies between time steps in time series and sequence data.

Note

This function applies the deep learning GRU operation to `dlarray` data. If you want to apply a GRU operation within a `layerGraph` object or `Layer` array, use `gruLayer`.

`Y = gru(X,H0,weights,recurrentWeights,bias)` applies a gated recurrent unit (GRU) calculation to the input `X` using the initial hidden state `H0` and the parameters `weights`, `recurrentWeights`, and `bias`. The input `X` must be a formatted `dlarray`. The output `Y` is a formatted `dlarray` with the same dimension format as `X`, except for any `"S"` dimensions.

The `gru` function updates the hidden state using the hyperbolic tangent function (tanh) as the state activation function, and uses the sigmoid function given by $\sigma \left(x\right)={\left(1+{e}^{-x}\right)}^{-1}$ as the gate activation function.

`[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias)` also returns the hidden state after the GRU operation.

`___ = gru(X,H0,weights,recurrentWeights,bias,DataFormat=FMT)` also specifies the dimension format `FMT` when `X` is not a formatted `dlarray`. The output `Y` is an unformatted `dlarray` with the same dimension order as `X`, except for any `"S"` dimensions.

`___ = gru(X,H0,weights,recurrentWeights,bias,Name=Value)` specifies additional options using one or more name-value arguments.
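Concretely, at each time step $t$ the operation computes a reset gate $r_t$, an update gate $z_t$, a candidate state $\tilde{h}_t$, and the hidden state $h_t$. The following is the standard GRU formulation with the reset gate applied after the recurrent matrix multiplication (the cuDNN-compatible default; see `ResetGateMode` below for the other orderings); here $W$, $R$, and $b$ denote the gate-wise slices of `weights`, `recurrentWeights`, and `bias`, and $\odot$ is elementwise multiplication:

$$
\begin{aligned}
r_t &= \sigma\left(W_r x_t + R_r h_{t-1} + b_r\right)\\
z_t &= \sigma\left(W_z x_t + R_z h_{t-1} + b_z\right)\\
\tilde{h}_t &= \tanh\left(W_{\tilde{h}} x_t + r_t \odot \left(R_{\tilde{h}} h_{t-1}\right) + b_{\tilde{h}}\right)\\
h_t &= \left(1 - z_t\right) \odot \tilde{h}_t + z_t \odot h_{t-1}
\end{aligned}
$$

For the authoritative definition, see the Gated Recurrent Unit Layer section of the `gruLayer` reference page.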

## Examples


Perform a GRU operation using 100 hidden units.

Create the input sequence data as 32 observations with ten channels and a sequence length of 64.

```
numFeatures = 10;
numObservations = 32;
sequenceLength = 64;

X = randn(numFeatures,numObservations,sequenceLength);
X = dlarray(X,"CBT");
```

Create the initial hidden state with 100 hidden units. Use the same initial hidden state for all observations.

```
numHiddenUnits = 100;
H0 = zeros(numHiddenUnits,1);
```

Create the learnable parameters for the GRU operation.

```
weights = dlarray(randn(3*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(randn(3*numHiddenUnits,1));
```

Perform the GRU calculation.

`[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias);`

View the size and dimension format of the output.

`size(Y)`
```
ans = 1×3

   100    32    64
```
`dims(Y)`
```
ans = 'CBT'
```

View the size of the hidden state.

`size(hiddenState)`
```
ans = 1×2

   100    32
```

You can use the hidden state to keep track of the state of the GRU operation and input further sequential data.
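For example, to continue processing the same 32 sequences, pass the returned hidden state as the initial state of the next call. A minimal sketch, assuming a hypothetical next chunk `X2` with the same `"C"` and `"B"` sizes as `X`:

```matlab
% Hypothetical next chunk of the same 32 sequences (same channels and batch)
X2 = dlarray(randn(numFeatures,numObservations,sequenceLength),"CBT");

% Continue the recurrence from where the previous call left off
[Y2,hiddenState] = gru(X2,hiddenState,weights,recurrentWeights,bias);
```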

## Input Arguments


### X — Input data

Input data, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array. When `X` is not a formatted `dlarray`, you must specify the dimension label format using the `DataFormat` name-value argument. If `X` is a numeric array, at least one of `H0`, `weights`, `recurrentWeights`, or `bias` must be a `dlarray`.

`X` must contain a sequence dimension labeled `"T"`. If `X` has any spatial dimensions labeled `"S"`, they are flattened into the `"C"` channel dimension. If `X` does not have a channel dimension, then one is added. If `X` has any unspecified dimensions labeled `"U"`, they must be singleton.

Data Types: `single` | `double`
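For example, any `"S"` dimensions of an image-sequence input fold into the `"C"` dimension, so the weights must be sized for the flattened channel count. A minimal sketch (the sizes here are illustrative assumptions):

```matlab
% 28-by-28 images with 3 channels, 16 observations, 20 time steps
X = dlarray(randn(28,28,3,16,20),"SSCBT");

numHiddenUnits = 100;
H0 = zeros(numHiddenUnits,1);

% InputSize is 28*28*3 because the "S" dimensions flatten into "C"
weights = dlarray(randn(3*numHiddenUnits,28*28*3));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(randn(3*numHiddenUnits,1));

Y = gru(X,H0,weights,recurrentWeights,bias);  % Y has format "CBT"
```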

### H0 — Initial hidden state vector

Initial hidden state vector, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

If `H0` is a formatted `dlarray`, it must contain a channel dimension labeled `"C"` and optionally a batch dimension labeled `"B"` with the same size as the `"B"` dimension of `X`. If `H0` does not have a `"B"` dimension, the function uses the same hidden state vector for each observation in `X`.

If `H0` is a formatted `dlarray`, then the size of the `"C"` dimension determines the number of hidden units. Otherwise, the size of the first dimension determines the number of hidden units.

Data Types: `single` | `double`

### weights — Weights

Weights, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

Specify `weights` as a matrix of size `3*NumHiddenUnits`-by-`InputSize`, where `NumHiddenUnits` is the size of the `"C"` dimension of `H0`, and `InputSize` is the size of the `"C"` dimension of `X` multiplied by the size of each `"S"` dimension of `X`, where present.

If `weights` is a formatted `dlarray`, it must contain a `"C"` dimension of size `3*NumHiddenUnits` and a `"U"` dimension of size `InputSize`.

Data Types: `single` | `double`

### recurrentWeights — Recurrent weights

Recurrent weights, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

Specify `recurrentWeights` as a matrix of size `3*NumHiddenUnits`-by-`NumHiddenUnits`, where `NumHiddenUnits` is the size of the `"C"` dimension of `H0`.

If `recurrentWeights` is a formatted `dlarray`, it must contain a `"C"` dimension of size `3*NumHiddenUnits` and a `"U"` dimension of size `NumHiddenUnits`.

Data Types: `single` | `double`

### bias — Bias

Bias, specified as a formatted `dlarray`, an unformatted `dlarray`, or a numeric array.

Specify `bias` as a vector of length `3*NumHiddenUnits`, where `NumHiddenUnits` is the size of the `"C"` dimension of `H0`.

If `bias` is a formatted `dlarray`, the nonsingleton dimension must be labeled with `"C"`.

Data Types: `single` | `double`

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `Y = gru(X,H0,weights,recurrentWeights,bias,DataFormat="CTB")` applies the GRU operation and specifies that the data has format `"CTB"` (channel, time, batch).

### DataFormat — Dimension order of unformatted data

Dimension order of unformatted input data, specified as a character vector or string scalar `FMT` that provides a label for each dimension of the data.

When you specify the format of a `dlarray` object, each character provides a label for each dimension of the data and must be one of these options:

• `"S"` — Spatial

• `"C"` — Channel

• `"B"` — Batch (for example, samples and observations)

• `"T"` — Time (for example, time steps of sequences)

• `"U"` — Unspecified

You can specify multiple dimensions labeled `"S"` or `"U"`. You can use the labels `"C"`, `"B"`, and `"T"` at most once.

You must specify `DataFormat` when the input data is not a formatted `dlarray`.

Data Types: `char` | `string`
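When the input is a plain numeric array, the format can instead be supplied at the call site. A minimal sketch (sizes are illustrative; at least one input must still be a `dlarray`):

```matlab
X = randn(10,32,64);                        % numeric, no dimension labels
H0 = dlarray(zeros(100,1));                 % dlarray, so gradients can flow
weights = dlarray(randn(300,10));           % 3*numHiddenUnits-by-InputSize
recurrentWeights = dlarray(randn(300,100));
bias = dlarray(randn(300,1));

Y = gru(X,H0,weights,recurrentWeights,bias,DataFormat="CBT");
```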

### ResetGateMode — Reset gate mode

Since R2023a

Reset gate mode, specified as one of the following:

• `"after-multiplication"` — Apply reset gate after matrix multiplication. This option is cuDNN compatible.

• `"before-multiplication"` — Apply reset gate before matrix multiplication.

• `"recurrent-bias-after-multiplication"` — Apply reset gate after matrix multiplication and use an additional set of bias terms for the recurrent weights.

For more information about the reset gate calculations, see the Gated Recurrent Unit Layer definition on the `gruLayer` reference page.
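The three modes differ only in the candidate-state calculation. As a hedged sketch of the standard formulations (with $W$, $R$, $b_W$, and $b_R$ denoting the relevant parameter slices; see the `gruLayer` page for the authoritative definitions):

• `"after-multiplication"` — $\tilde{h}_t = \tanh\left(W x_t + r_t \odot \left(R h_{t-1}\right) + b\right)$

• `"before-multiplication"` — $\tilde{h}_t = \tanh\left(W x_t + R \left(r_t \odot h_{t-1}\right) + b\right)$

• `"recurrent-bias-after-multiplication"` — $\tilde{h}_t = \tanh\left(W x_t + b_W + r_t \odot \left(R h_{t-1} + b_R\right)\right)$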

## Output Arguments


### Y — GRU output

GRU output, returned as a `dlarray`. The output `Y` has the same underlying data type as the input `X`.

If the input data `X` is a formatted `dlarray`, `Y` has the same dimension format as `X`, except for any `"S"` dimensions. If the input data is not a formatted `dlarray`, `Y` is an unformatted `dlarray` with the same dimension order as the input data.

The size of the `"C"` dimension of `Y` is the same as the number of hidden units, specified by the size of the `"C"` dimension of `H0`.

### hiddenState — Hidden state vector

Hidden state vector for each observation, returned as a `dlarray` or a numeric array with the same data type as `H0`.

If the input `H0` is a formatted `dlarray`, then the output `hiddenState` is a formatted `dlarray` with the format `"CB"`.

## More About

### Gated Recurrent Unit

The GRU operation allows a network to learn dependencies between time steps in time series and sequence data. For more information, see the Gated Recurrent Unit Layer definition on the `gruLayer` reference page.

## Version History

Introduced in R2020a
