BFGS quasi-Newton backpropagation
net.trainFcn = 'trainbfg' sets the network trainFcn property.
trainbfg is a network training function that updates weight and bias
values according to the BFGS quasi-Newton method.
Training occurs according to trainbfg training parameters, shown here with their default values:
net.trainParam.epochs— Maximum number of epochs to train. The default value is 1000.
net.trainParam.showWindow— Show training GUI. The default value is
net.trainParam.show— Epochs between displays (
NaN for no displays). The default value is 25.
net.trainParam.showCommandLine— Generate command-line output. The default value is
net.trainParam.goal— Performance goal. The default value is 0.
net.trainParam.time— Maximum time to train in seconds. The default value is
net.trainParam.min_grad— Minimum performance gradient. The default value is
net.trainParam.max_fail— Maximum validation failures. The default value is
net.trainParam.searchFcn— Name of line search routine to use. The default value is
Parameters related to line search methods (not all used for all methods):
net.trainParam.scal_tol— Divide into delta to determine tolerance for linear search. The default value is 20.
net.trainParam.alpha— Scale factor that determines sufficient reduction in perf. The default value is
net.trainParam.beta— Scale factor that determines sufficiently large step size. The default value is
net.trainParam.delta— Initial step size in interval location step. The default value is
net.trainParam.gamma— Parameter to avoid small reductions in performance, usually set to 0.1 (see
srch_cha). The default value is
net.trainParam.low_lim— Lower limit on change in step size. The default value is
net.trainParam.up_lim— Upper limit on change in step size. The default value is
net.trainParam.maxstep— Maximum step length. The default value is
net.trainParam.minstep— Minimum step length. The default value is
net.trainParam.bmax— Maximum step size. The default value is
net.trainParam.batch_frag— In case of multiple batches, they are considered independent. Any nonzero value implies a fragmented batch, so the final layer’s conditions of a previous trained epoch are used as initial conditions for the next epoch. The default value is
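The current value of every parameter in the lists above can be read directly from a network object. The following is a minimal sketch, assuming Deep Learning Toolbox is installed; the value 500 is an arbitrary illustration, not a recommendation.

```matlab
% Create a network that uses trainbfg; its trainParam starts at the defaults.
net = feedforwardnet(10, 'trainbfg');

% Display all training parameters with their current (default) values.
disp(net.trainParam)

% Read a single default, then override it.
defaultEpochs = net.trainParam.epochs;   % documented default: 1000
net.trainParam.epochs = 500;             % arbitrary example value
```

Because the parameters are ordinary properties, any of them can be changed this way before calling train.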
Train Neural Network Using trainbfg Train Function
This example shows how to train a neural network using the trainbfg train function.
Here a neural network is trained to predict body fat percentages.
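[x, t] = bodyfat_dataset;                % load inputs x and targets t
net = feedforwardnet(10, 'trainbfg');    % 10 hidden neurons, trained with trainbfg
net = train(net, x, t);                  % train the network
y = net(x);                              % network predictions for the inputs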
trainedNet — Trained network
Trained network, returned as a
tr — Training record
Training record (perf), returned as a structure whose fields depend on the network training function (net.NET.trainFcn). It can include fields such as:
Training, data division, and performance functions and parameters
Data division indices for training, validation and test sets
Data division masks for training validation and test sets
Number of epochs (
num_epochs) and the best epoch (
A list of training state names (
Fields for each state name recording its value throughout training
Performances of the best network (
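The training record can be inspected after training to see which of these fields are present. The sketch below assumes Deep Learning Toolbox is installed. The field num_epochs is named above; best_epoch is an assumed field name, so the code checks for it before reading it.

```matlab
[x, t] = bodyfat_dataset;
net = feedforwardnet(10, 'trainbfg');
net.trainParam.showWindow = false;       % suppress the training GUI
[net, tr] = train(net, x, t);            % second output is the training record

% List every field the record actually contains for this training function.
disp(fieldnames(tr))

fprintf('Epochs run: %d\n', tr.num_epochs);
if isfield(tr, 'best_epoch')             % assumed field name, hence the guard
    fprintf('Best epoch: %d\n', tr.best_epoch);
end
```

Calling fieldnames(tr) is the reliable way to discover the record layout, since the fields depend on the training function.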
You can create a standard network that uses trainbfg with feedforwardnet or cascadeforwardnet. To prepare a custom network to be trained with trainbfg:
Set NET.trainFcn to 'trainbfg'. This sets NET.trainParam to trainbfg’s default parameters.
Set NET.trainParam properties to desired values.
In either case, calling train with the resulting network trains the network with trainbfg.
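The following is a minimal sketch of switching an existing network to trainbfg, assuming Deep Learning Toolbox is installed. The dataset and the parameter values are illustrative choices only.

```matlab
[x, t] = bodyfat_dataset;

% Start from a standard network and select trainbfg as its training function.
net = cascadeforwardnet(10);
net.trainFcn = 'trainbfg';               % trainParam now holds trainbfg parameters

% Adjust parameters after setting trainFcn, so the values are not overwritten.
net.trainParam.epochs = 200;             % arbitrary example value
net.trainParam.showWindow = false;       % suppress the training GUI

net = train(net, x, t);
y = net(x);
```

The order matters in this sketch: parameters are set after trainFcn, because assigning trainFcn replaces the parameter set.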
BFGS Quasi-Newton Backpropagation
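Newton’s method is an alternative to the conjugate gradient methods for fast optimization. The basic step of Newton’s method is
$$x_{k+1} = x_k - A_k^{-1} g_k$$
where $x_k$ is the vector of weights and biases, $g_k$ is the gradient, and $A_k$ is the Hessian matrix (second derivatives) of the performance index at the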
current values of the weights and biases. Newton’s method often converges faster than
conjugate gradient methods. Unfortunately, it is complex and expensive to compute the
Hessian matrix for feedforward neural networks. There is a class of algorithms that is based
on Newton’s method, but which does not require calculation of second derivatives. These are
called quasi-Newton (or secant) methods. They update an approximate Hessian matrix at each
iteration of the algorithm. The update is computed as a function of the gradient. The
quasi-Newton method that has been most successful in published studies is the Broyden,
Fletcher, Goldfarb, and Shanno (BFGS) update. This algorithm is implemented in the trainbfg routine.
The BFGS algorithm is described in [DeSc83]. This algorithm requires more computation in each
iteration and more storage than the conjugate gradient methods, although it generally
converges in fewer iterations. The approximate Hessian must be stored, and its dimension is
n x n, where n is equal to the number of weights and
biases in the network. For very large networks it might be better to use Rprop or one of the
conjugate gradient algorithms. For smaller networks, however, trainbfg
can be an efficient training function.
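To make the storage cost concrete, the arithmetic can be sketched as follows. The layer sizes are hypothetical (13 inputs, 10 hidden neurons, 1 output), and the estimate assumes the approximate Hessian is stored as a dense matrix of 8-byte doubles.

```matlab
% Hypothetical layer sizes, chosen only for illustration.
nIn = 13; nHid = 10; nOut = 1;

% Number of weights and biases in a single-hidden-layer network.
n = (nIn*nHid + nHid) + (nHid*nOut + nOut);

% Dense n x n matrix of doubles (8 bytes each), an assumed storage layout.
hessBytes = 8 * n^2;
fprintf('n = %d, approximate Hessian: %d bytes\n', n, hessBytes);

% The same estimate for a much larger hypothetical network.
nLarge = 1e5;
fprintf('n = %d, approximate Hessian: %.0f GB\n', nLarge, 8*nLarge^2/1e9);
```

For the small network, n is 151 and the matrix needs under 200 KB. At 100,000 weights the same matrix needs about 80 GB, which is why the quadratic growth rules out this method for very large networks.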
trainbfg can train any network as long as its weight, net input, and
transfer functions have derivative functions.
Backpropagation is used to calculate derivatives of performance
with respect to the weight and bias variables
X. Each variable is adjusted
according to the following:
X = X + a*dX;
where dX is the search direction. The parameter a is
selected to minimize the performance along the search direction. The line search function
searchFcn is used to locate the minimum point. The first search direction
is the negative of the gradient of performance. In succeeding iterations the search direction
is computed according to the following formula:
dX = -H\gX;
where gX is the gradient and H is an approximate
Hessian matrix. See page 119 of Gill, Murray, and Wright (Practical
Optimization, 1981) for a more detailed discussion of the BFGS quasi-Newton
method.
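The iteration described above can be sketched on a small quadratic problem. This is an illustration under stated assumptions, not the code of trainbfg. The test problem (Q, b), the tolerances, and the simple backtracking loop standing in for searchFcn are all choices made for this example. The Hessian update shown is the standard BFGS formula.

```matlab
% Assumed test problem: perf(X) = 0.5*X'*Q*X - b'*X, gradient gX = Q*X - b.
Q = [3 1; 1 2];                          % symmetric positive definite
b = [1; 1];
perf = @(X) 0.5*X'*Q*X - b'*X;
grad = @(X) Q*X - b;

X  = [0; 0];                             % initial variables
H  = eye(2);                             % initial Hessian approximation
gX = grad(X);
for k = 1:50
    if norm(gX) < 1e-6, break; end       % gradient small enough: stop
    dX = -H\gX;                          % equals -gX on the first iteration
    a = 1;                               % backtracking stand-in for searchFcn
    while perf(X + a*dX) > perf(X) + 1e-4*a*(gX'*dX) && a > 1e-10
        a = a/2;
    end
    Xnew = X + a*dX;                     % X = X + a*dX
    gNew = grad(Xnew);
    s = Xnew - X;                        % change in variables
    y = gNew - gX;                       % change in gradient
    if y'*s > 1e-12                      % keep H positive definite
        H = H - (H*s)*(H*s)'/(s'*H*s) + (y*y')/(y'*s);
    end
    X = Xnew;  gX = gNew;
end
Xstar = Q\b;                             % exact minimizer, for comparison
```

Starting H at the identity makes the first search direction the negative gradient, matching the description above. Later directions use curvature information accumulated from the gradient differences.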
Training stops when any of these conditions occurs:
The maximum number of epochs (repetitions) is reached.
The maximum amount of time is exceeded.
Performance is minimized to the goal.
The performance gradient falls below min_grad.
Validation performance (validation error) has increased more than max_fail times since the last time it decreased (when using validation).
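Each stopping condition corresponds to one training parameter. The sketch below sets them explicitly, assuming Deep Learning Toolbox is installed. All numeric values are arbitrary examples. The field tr.stop, used to report why training ended, is an assumed field name, so it is read only if present.

```matlab
[x, t] = bodyfat_dataset;
net = feedforwardnet(10, 'trainbfg');
net.trainParam.showWindow = false;       % suppress the training GUI

% One parameter per stopping condition (example values only).
net.trainParam.epochs   = 300;           % maximum number of epochs
net.trainParam.time     = 60;            % maximum training time in seconds
net.trainParam.goal     = 1e-3;          % performance goal
net.trainParam.min_grad = 1e-7;          % minimum performance gradient
net.trainParam.max_fail = 10;            % maximum validation failures

[net, tr] = train(net, x, t);

if isfield(tr, 'stop')                   % assumed field name, hence the guard
    disp(tr.stop)                        % reason training stopped
end
```

Training ends as soon as any one of these limits is hit, so tightening a single parameter is enough to change when training stops.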
 Gill, Murray, & Wright, Practical Optimization, 1981