
## Use Automatic Differentiation In Deep Learning Toolbox

### Custom Training and Calculations Using Automatic Differentiation

Automatic differentiation makes it easier to create custom training loops, custom layers, and other deep learning customizations.

Generally, the simplest way to customize deep learning training is to create a `dlnetwork`. Include the layers you want in the network. Then perform training in a custom loop by using some sort of gradient descent, where the gradient is the gradient of the objective function. The objective function can be classification error, cross-entropy, or any other relevant scalar function of the network weights. See List of Functions with dlarray Support.

This example is a high-level version of a custom training loop. Here, `f` is the objective function, such as loss, and `g` is the gradient of the objective function with respect to the weights in the network `net`. The `update` function represents some type of gradient descent.

```
% High-level training loop
n = 1;
while (n < nmax)
    [f,g] = dlfeval(@model,net,dlX,t);
    net = update(net,g);
    n = n + 1;
end
```

You call `dlfeval` to compute the numeric value of the objective and gradient. To enable the automatic computation of the gradient, the data `dlX` must be a `dlarray`.

```
dlX = dlarray(X);
```

The objective function has a `dlgradient` call to calculate the gradient. The `dlgradient` call must be inside of the function that `dlfeval` evaluates.

```
function [f,g] = model(net,dlX,T)
% Calculate objective using supported functions for dlarray
    y = forward(net,dlX);
    f = fcnvalue(y,T); % crossentropy or similar
    g = dlgradient(f,net.Learnables); % Automatic gradient
end
```

For an example using a `dlnetwork` with a simple `dlfeval`-`dlgradient`-`dlarray` syntax, see Grad-CAM Reveals the Why Behind Deep Learning Decisions. For a more complex example using a custom training loop, see Train Generative Adversarial Network (GAN). For further details on custom training using automatic differentiation, see Define Custom Training Loops, Loss Functions, and Networks.

### Use `dlgradient` and `dlfeval` Together for Automatic Differentiation

To use automatic differentiation, you must call `dlgradient` inside a function and evaluate the function using `dlfeval`. Represent the point where you take a derivative as a `dlarray` object, which manages the data structures and enables tracing of evaluation. For example, the Rosenbrock function is a common test function for optimization.

```
function [f,grad] = rosenbrock(x)
    f = 100*(x(2) - x(1).^2).^2 + (1 - x(1)).^2;
    grad = dlgradient(f,x);
end
```

Calculate the value and gradient of the Rosenbrock function at the point `x0` = [–1,2]. To enable automatic differentiation in the Rosenbrock function, pass `x0` as a `dlarray`.

```
x0 = dlarray([-1,2]);
[fval,gradval] = dlfeval(@rosenbrock,x0)
```
```
fval =
  1x1 dlarray
   104

gradval =
  1x2 dlarray
   396   200
```

For an example using automatic differentiation, see Grad-CAM Reveals the Why Behind Deep Learning Decisions.

### Derivative Trace

To evaluate a gradient numerically, a `dlarray` constructs a data structure for reverse mode differentiation, as described in Automatic Differentiation Background. This data structure is the trace of the derivative computation. Keep in mind these guidelines when using automatic differentiation and the derivative trace:

• Do not introduce a new `dlarray` inside of an objective function calculation and attempt to differentiate with respect to that object. For example:

```
function [dy,dy1] = fun(x1)
    x2 = dlarray(0);
    y = x1 + x2;
    dy = dlgradient(y,x2);  % Error: x2 is untraced
    dy1 = dlgradient(y,x1); % No error even though y has an untraced portion
end
```

• Do not use `extractdata` with a traced argument. Doing so breaks the tracing. For example:

```
fun = @(x)dlgradient(x + atan(extractdata(x)),x);
% Gradient for any point is 1 due to the leading 'x' term in fun.
dlfeval(fun,dlarray(2.5))
```
```
ans =
  1x1 dlarray
     1
```

However, you can use `extractdata` to introduce a new independent variable from a dependent one.
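As a minimal sketch of that idea (the function name `cutTrace` and the test value are hypothetical), wrapping the extracted data in a new `dlarray` yields a variable that the gradient computation treats as a constant:

```
function dy = cutTrace(x)
    xConst = dlarray(extractdata(x)); % new variable, independent of the trace of x
    y = x.*xConst;                    % xConst acts as a constant in this product
    dy = dlgradient(y,x);             % equals the value of xConst, not 2*x
end
```

Evaluating `dlfeval(@cutTrace,dlarray(3))` would return 3, because the derivative is taken with `xConst` held fixed.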

• When working in parallel, moving traced `dlarray` objects between the client and workers breaks the tracing. The traced `dlarray` object is saved on the worker and loaded in the client as an untraced `dlarray` object. To avoid breaking tracing when working in parallel, compute all required gradients on the worker and then combine the gradients on the client. For an example, see Train Network in Parallel with Custom Training Loop.

• Use only supported functions. For a list of supported functions, see List of Functions with dlarray Support. To use an unsupported function `f`, try to implement `f` using supported functions.
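As a sketch of that workaround (assuming, purely for illustration, that `sinc` is not on the support list while `sin`, elementwise division, power, and `sum` are), you can rebuild the function from supported primitives so the derivative trace stays intact:

```
mySinc = @(x) sin(pi*x)./(pi*x);     % supported primitives only; undefined at x == 0
objective = @(x) sum(mySinc(x).^2);  % scalar objective suitable for dlgradient
grad = dlfeval(@(t) dlgradient(objective(t),t),dlarray([0.5 1.5]))
```

Because every operation in `mySinc` supports `dlarray` inputs, `dlgradient` can differentiate through it just like a built-in supported function.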

### Characteristics of Automatic Derivatives

• You can evaluate gradients using automatic differentiation only for scalar-valued functions. Intermediate calculations can have any number of variables, but the final function value must be scalar. If you need to take derivatives of a vector-valued function, take derivatives of one component at a time. In this case, consider setting the `dlgradient` `'RetainData'` name-value pair argument to `true`.
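A minimal sketch of the component-at-a-time approach (the function name `vectorJac` and the test values are hypothetical): setting `'RetainData'` to `true` on the first `dlgradient` call keeps the trace available for the second call within the same `dlfeval` evaluation.

```
function [g1,g2] = vectorJac(x)
    y = [x(1).^2 + x(2); x(1).*x(2)];          % vector-valued intermediate result
    g1 = dlgradient(y(1),x,'RetainData',true); % gradient of the first component
    g2 = dlgradient(y(2),x);                   % gradient of the second component
end
```

For example, `[g1,g2] = dlfeval(@vectorJac,dlarray([2 3]))` would return `g1 = [4 1]` and `g2 = [3 2]`.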

• A call to `dlgradient` evaluates derivatives at a particular point. The software generally makes an arbitrary choice for the value of a derivative when there is no theoretical value. For example, the `relu` function, `relu(x) = max(x,0)`, is not differentiable at `x = 0`. However, `dlgradient` returns a value for the derivative.

```
x = dlarray(0);
y = dlfeval(@(t)dlgradient(relu(t),t),x)
```
```
y =
  1x1 dlarray
     0
```

The value at the nearby point `eps` is different.

```
x = dlarray(eps);
y = dlfeval(@(t)dlgradient(relu(t),t),x)
```
```
y =
  1x1 dlarray
     1
```
