templateGP

Gaussian process template

Since R2023b

Syntax

t = templateGP

t = templateGP(Name=Value)

Description

t = templateGP returns a Gaussian process (GP) template suitable for training regression models. After you create the template t, you can specify it as a learner during training.

example

t = templateGP(Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the basis function and method for estimating the parameters of the Gaussian process regression (GPR) model.

If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value arguments. During training, the training function uses default values for empty options.

Examples

collapse all

Create Default Gaussian Process Template

Open Live Script

Create a default Gaussian process template using the templateGP function.

t = templateGP

t = 
Fit template for regression GP.

                     KernelFunction: []
                   KernelParameters: []
                      BasisFunction: []
                               Beta: []
                              Sigma: []
                          FitMethod: []
                      PredictMethod: []
                          ActiveSet: []
                      ActiveSetSize: []
                    ActiveSetMethod: []
                        Standardize: []
                            Verbose: []
                          CacheSize: []
                            Options: [1x1 struct]
                          Optimizer: []
                   OptimizerOptions: []
           ConstantKernelParameters: []
                      ConstantSigma: []
                    InitialStepSize: []
    InitialSigmaLowerBoundTolerance: []
                            Version: 1
                             Method: 'GP'
                               Type: 'regression'

t is a template object for a Gaussian process learner. All properties of the template object are empty except Version, Method, and Type. When you pass t to the training function, the function fills in the empty properties with their respective default values. For example, the directforecaster function sets the ConstantSigma property to false when you specify t as a learner. For details on other default values, see Name-Value Arguments.

Create Nondefault Gaussian Process Template

Open Live Script

Create a Gaussian process template that specifies a linear basis for the GPR model, and exact methods for fitting and prediction.

t = templateGP(BasisFunction="linear",FitMethod="exact",PredictMethod="exact")

t = 
Fit template for regression GP.

                     KernelFunction: []
                   KernelParameters: []
                      BasisFunction: 'Linear'
                               Beta: []
                              Sigma: []
                          FitMethod: 'Exact'
                      PredictMethod: 'Exact'
                          ActiveSet: []
                      ActiveSetSize: []
                    ActiveSetMethod: []
                        Standardize: []
                            Verbose: []
                          CacheSize: []
                            Options: [1x1 struct]
                          Optimizer: []
                   OptimizerOptions: []
           ConstantKernelParameters: []
                      ConstantSigma: []
                    InitialStepSize: []
    InitialSigmaLowerBoundTolerance: []
                            Version: 1
                             Method: 'GP'
                               Type: 'regression'

t is a template object for a Gaussian process learner. The object display shows the specified properties of the template object. By default, Method and Type are specified as GP and regression, respectively. When you pass t to a training function, the software sets the empty properties to their respective default values.

Input Arguments

collapse all

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: templateGP(BasisFunction="linear",Standardize=true) specifies a linear basis function and to standardize the predictors.

Fitting

collapse all

`FitMethod` — Method to estimate parameters of GPR model
`"none"` | `"exact"` | `"sd"` | `"sr"` | `"fic"`

Method to estimate the parameters of the GPR model, specified as one of the following.

Fit Method	Description
`"none"`	No estimation. Use the initial parameter values as the known parameter values.
`"exact"`	Exact Gaussian process regression. This value is the default if n ≤ 2000, where n is the number of observations.
`"sd"`	Subset of data points approximation. This value is the default if n > 2000, where n is the number of observations. `"sd"` is a sparse method.
`"sr"`	Subset of regressors approximation. `"sr"` is a sparse method.
`"fic"`	Fully independent conditional approximation. `"fic"` is a sparse method.

Example: FitMethod="fic"

`BasisFunction` — Explicit basis in GPR model
`"constant"` (default) | `"none"` | `"linear"` | `"pureQuadratic"` | function handle

Explicit basis in the GPR model, specified as "constant", "none", "linear", "pureQuadratic", or a function handle. If n is the number of observations, the basis function adds the term H*β to the model, where H is the basis matrix and β is a p-by-1 vector of basis coefficients.

Explicit Basis	Basis Matrix
`"none"`	Empty matrix
`"constant"`	$H = 1$ H is an n-by-1 vector of 1s, where n is the number of observations.
`"linear"`	$H = [1, X]$ X is the expanded predictor data after the software creates dummy variables for the categorical variables. For details about creating dummy variables, see `CategoricalPredictors`.
`"pureQuadratic"`	$H = [1, X, X_{2}],$ where $X_{2} = [\begin{matrix} x_{11}^{2} & x_{12}^{2} & \dots & x_{1 d}^{2} \\ x_{21}^{2} & x_{22}^{2} & \dots & x_{2 d}^{2} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ x_{n 1}^{2} & x_{n 2}^{2} & \dots & x_{n d}^{2} \end{matrix}] .$ For this basis option, the software does not support X with categorical predictors.
Function handle	Function handle `hfcn`, which `fitrgp` calls as $H = h f c n (X),$ where X is an n-by-d matrix of predictors, d is the number of predictors after the software creates dummy variables for the categorical variables, and H is an n-by-p matrix of basis functions.

Example: BasisFunction="pureQuadratic"

Data Types: char | string | function_handle

`Beta` — Initial value of coefficients
p-by-1 vector

Initial value of the coefficients for the explicit basis, specified as a p-by-1 vector, where p is the number of columns in the basis matrix H.

The basis matrix depends on the specified basis function. For more information, see BasisFunction.

The training function uses the coefficient initial values as the known coefficient values only when FitMethod is "none".

Data Types: double

`Sigma` — Initial value for noise standard deviation
`std`(`y`)/`sqrt(2)` (default) | positive scalar value

Initial value for the noise standard deviation of the Gaussian process model, specified as a positive scalar value.

The training function parameterizes the noise standard deviation as the sum of SigmaLowerBound and exp(η), where η is an unconstrained value. Therefore, Sigma must be larger than SigmaLowerBound by a small tolerance so that the function can initialize η to a finite value. Otherwise, the function resets Sigma to a compatible value.

The tolerance is 1e-3 when ConstantSigma is false (default) and 1e-6 otherwise. If the tolerance is not small enough relative to the scale of the response variable, you can scale up the response variable so that the tolerance value can be considered small for the response variable.

Example: Sigma=2

Data Types: double

`ConstantSigma` — Constant value of `Sigma` for noise standard deviation
`false` or `0` (default) | `true` or `1`

Constant value of Sigma for the noise standard deviation of the Gaussian process model, specified as a numeric or logical 0 (false) or 1 (true). When ConstantSigma is true, the training function does not optimize the value of Sigma, but instead uses the initial value throughout its computations.

Example: ConstantSigma=true

Data Types: logical

`SigmaLowerBound` — Lower bound on noise standard deviation
`1e-2*std`(`y`) (default) | positive scalar value

Lower bound on the noise standard deviation (Sigma), specified as a positive scalar value.

Sigma must be larger than SigmaLowerBound by a small tolerance.

Example: SigmaLowerBound=0.02

Data Types: double

`Standardize` — Indicator to standardize data
`false` or `0` (default) | `true` or `1`

Indicator to standardize data, specified as a numeric or logical 0 (false) or 1 (true).

If you set Standardize=1, then the software centers and scales each column of the predictor data by the column mean and standard deviation. The software does not standardize the data contained in the dummy variable columns generated for categorical predictors.

Example: Standardize=1

Example: Standardize=true

Data Types: logical

`Regularization` — Regularization standard deviation
`1e-2*std`(`y`) (default) | positive scalar value

Regularization standard deviation for the subset of regressors ("sr") and fully independent conditional ("fic") approximation methods, specified as a positive scalar value. For more information, see FitMethod.

Example: Regularization=0.2

Data Types: double

`ComputationMethod` — Method for computing loglikelihood and gradient
`"qr"` (default) | `"v"`

Method for computing the loglikelihood and gradient for parameter estimation, specified as "qr" or "v". This argument is valid when FitMethod is "sr" or "fic".

"qr" — Use the QR-factorization-based approach, which provides better accuracy.
"v" — Use the V-method-based approach, which provides faster computation.

For more information about these approaches, see Foster, et. al. [7].

Example: ComputationMethod="v"

Kernel (Covariance) Function

collapse all

`KernelFunction` — Form of covariance function
`"squaredexponential"` (default) | `"exponential"` | `"matern32"` | `"matern52"` | `"rationalquadratic"` | `"ardsquaredexponential"` | `"ardexponential"` | `"ardmatern32"` | `"ardmatern52"` | `"ardrationalquadratic"` | function handle

Form of the covariance function, specified as one of the following.

Value	Description
`"exponential"`	Exponential kernel
`"squaredexponential"`	Squared exponential kernel
`"matern32"`	Matern kernel with parameter 3/2
`"matern52"`	Matern kernel with parameter 5/2
`"rationalquadratic"`	Rational quadratic kernel
`"ardexponential"`	Exponential kernel with a separate length scale per predictor
`"ardsquaredexponential"`	Squared exponential kernel with a separate length scale per predictor
`"ardmatern32"`	Matern kernel with parameter 3/2 and a separate length scale per predictor
`"ardmatern52"`	Matern kernel with parameter 5/2 and a separate length scale per predictor
`"ardrationalquadratic"`	Rational quadratic kernel with a separate length scale per predictor
Function handle	Function handle in the form: `Kmn = kfcn(Xm,Xn,theta)`, where `Xm` is an m-by-d matrix, `Xn` is an n-by-d matrix, and `Kmn` is an m-by-n matrix of kernel products such that `Kmn`(i,j) is the kernel product between `Xm`(i,:) and `Xn`(j,:). d is the number of predictor variables after the software creates dummy variables for the categorical variables. For details about creating dummy variables, see `CategoricalPredictors`. `theta` is the r-by-1 unconstrained parameter vector for `kfcn`.

For more information on the kernel functions, see Kernel (Covariance) Function Options.

Example: KernelFunction="matern32"

Data Types: char | string | function_handle

`KernelParameters` — Initial values for kernel parameters
numeric vector

Initial values for the kernel parameters, specified as a numeric vector. The size of the vector and the values depend on the form of the covariance function, specified by the KernelFunction name-value argument.

`KernelFunction Value`	`KernelParameters Value`
`"exponential"`, `"squaredexponential"`, `"matern32"`, or `"matern52"`	2-by-1 vector `phi`, where `phi(1)` contains the length scale and `phi(2)` contains the signal standard deviation. The default initial value of the length scale parameter is the mean of the standard deviations of the predictors. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. That is, `phi = [mean(std(X));std(y)/sqrt(2)]`.
`"rationalquadratic"`	3-by-1 vector `phi`, where `phi(1)` contains the length scale, `phi(2)` contains the scale-mixture parameter, and `phi(3)` contains the signal standard deviation. The default initial value of the length scale parameter is the mean of the standard deviations of the predictors. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. The default initial value for the scale-mixture parameter is 1. That is, `phi = [mean(std(X));1;std(y)/sqrt(2)]`.
`"ardexponential"`, `"ardsquaredexponential"`, `"ardmatern32"`, or `"ardmatern52"`	(d+1)-by-1 vector `phi`, where `phi(i)` contains the length scale for predictor i, and `phi(d+1)` contains the signal standard deviation. d is the number of predictor variables after the software creates dummy variables for the categorical variables. For details about creating dummy variables, see `CategoricalPredictors`. The default initial values of the length scale parameters are the standard deviations of the predictors. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. That is, `phi = [std(X)';std(y)/sqrt(2)]`.
`"ardrationalquadratic"`	(d+2)-by-1 vector `phi`, where `phi(i)` contains the length scale for predictor i, `phi(d+1)` contains the scale-mixture parameter, and `phi(d+2)` contains the signal standard deviation. d is the number of predictor variables after the software creates dummy variables for the categorical variables. For details about creating dummy variables, see `CategoricalPredictors`. The default initial values of the length scale parameters are the standard deviations of the predictors. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. The default initial value of the scale-mixture parameter is 1. That is, `phi = [std(X)';1;std(y)/sqrt(2)]`.
Function handle	r-by-1 vector for the initial value of the unconstrained parameter vector `phi` for the custom kernel function `kfcn`. When `KernelFunction` is a function handle, you must supply initial values for the kernel parameters.

For more information on the kernel functions, see Kernel (Covariance) Function Options.

Example: KernelParameters=phi

Data Types: double | single

`DistanceMethod` — Method for computing inter-point distances
`"fast"` (default) | `"accurate"`

Method for computing inter-point distances to evaluate built-in kernel functions, specified as "fast" or "accurate". When you specify "fast", the training function computes ${(x - y)}^{2}$ as $x^{2} + y^{2} - 2 * x * y$ . When you specify "accurate", the training function computes ${(x - y)}^{2}$ .

Example: DistanceMethod="accurate"

Active Set Selection

collapse all

`ActiveSetSize` — Size of active set
integer m (1 ≤ m ≤ n)

Size of the active set, specified as an integer m, 1 ≤ m ≤ n, where n is the number of observations. This argument is valid when FitMethod is "sd", "sr", or "fic".

The default value is min(1000,n) when FitMethod is "sr" or "fic", and min(2000,n) otherwise.

Example: ActiveSetSize=100

Data Types: double

`ActiveSetMethod` — Active set selection method
`"random"` (default) | `"sgma"` | `"entropy"` | `"likelihood"`

Active set selection method, specified as one of the following values.

Value	Description
`"random"`	Random selection
`"sgma"`	Sparse greedy matrix approximation
`"entropy"`	Differential entropy-based selection
`"likelihood"`	Subset of regressors loglikelihood-based selection

All active set selection methods (except "random") require the storage of an n-by-m matrix, where m is the size of the active set and n is the number of observations.

Example: ActiveSetMethod="entropy"

`RandomSearchSetSize` — Random search set size
59 (default) | integer value

Random search set size per greedy inclusion for active set selection, specified as an integer value.

Example: RandomSearchSetSize=30

Data Types: double

`ToleranceActiveSet` — Relative tolerance for terminating active set selection
1e-06 (default) | positive scalar

Relative tolerance for terminating active set selection, specified as a positive scalar.

Example: ToleranceActiveset=0.0002

Data Types: double

`NumActiveSetRepeats` — Number of repetitions
3 (default) | integer value

Number of repetitions for interleaved active set selection and parameter estimation when ActiveSetMethod is not "random", specified as an integer value.

Example: NumActiveSetRepeats=5

Data Types: double

Prediction

collapse all

`PredictMethod` — Method used to make predictions
`"exact"` | `"bcd"` | `"sd"` | `"sr"` | `"fic"`

Method used to make predictions from a Gaussian process model given the parameters, specified as one of the following values.

Value	Description
`"exact"`	Exact Gaussian process regression method. This value is the default if n ≤ 10,000.
`"bcd"`	Block coordinate descent (BCD). This value is the default if n > 10,000.
`"sd"`	Subset of data points approximation
`"sr"`	Subset of regressors approximation
`"fic"`	Fully independent conditional approximation

Example: PredictMethod="bcd"

`BlockSizeBCD` — Block size for BCD method
minimum of 1000 or n (default) | integer in the range 1 to n

Block size for the block coordinate descent method ("bcd"), specified as an integer in the range 1 to n, where n is the number of observations.

Example: BlockSizeBCD=1500

Data Types: double

`NumGreedyBCD` — Number of greedy selections for BCD method
minimum of 100 and `BlockSizeBCD` (default) | integer value in the range 1 to `BlockSizeBCD`

Number of greedy selections for the block coordinate descent method ("bcd"), specified as an integer in the range 1 to BlockSizeBCD.

Example: NumGreedyBCD=150

Data Types: double

`ToleranceBCD` — Relative tolerance on gradient norm
`1e-3` (default) | positive scalar

Relative tolerance on the gradient norm for terminating the block coordinate descent method ("bcd") iterations, specified as a positive scalar.

Example: ToleranceBCD=0.002

Data Types: double

`StepToleranceBCD` — Absolute tolerance on step size
`1e-3` (default) | positive scalar

Absolute tolerance on the step size for terminating the block coordinate descent method ("bcd") iterations, specified as a positive scalar.

Example: StepToleranceBCD=0.002

Data Types: double

`IterationLimitBCD` — Maximum number of BCD iterations
`1000000` (default) | positive integer

Maximum number of block coordinate descent method ("bcd") iterations, specified as a positive integer.

Example: IterationLimitBCD=10000

Data Types: double

Optimization

collapse all

`Optimizer` — Optimizer to use for parameter estimation
`"quasinewton"` (default) | `"lbfgs"` | `"fminsearch"` | `"fminunc"` | `"fmincon"`

Optimizer to use for parameter estimation, specified as one of the values in this table.

Value	Description
`"quasinewton"`	Dense, symmetric rank-1-based, quasi-Newton approximation to the Hessian
`"lbfgs"`	LBFGS-based quasi-Newton approximation to the Hessian
`"fminsearch"`	Unconstrained nonlinear optimization using the simplex search method of Lagarias et al. [5]
`"fminunc"`	Unconstrained nonlinear optimization (requires an Optimization Toolbox™ license)
`"fmincon"`	Constrained nonlinear optimization (requires an Optimization Toolbox license)

For more information on the optimizers, see Algorithms.

Example: Optimizer="fmincon"

`OptimizerOptions` — Options for optimizer
structure | object

Options for the optimizer set by the Optimizer name-value argument, specified as a structure or object created by optimset, statset("fitrgp"), or optimoptions.

Optimizer	Function for Creating Optimizer Options
`"fminsearch"`	`optimset` (structure)
`"quasinewton"` or `"lbfgs"`	`statset("fitrgp")` (structure)
`"fminunc"` or `"fmincon"`	`optimoptions` (object)

The default options depend on the specified optimizer.

Example: OptimizerOptions=opt

`InitialStepSize` — Initial step size
`[]` (default) | real positive scalar | `"auto"`

Initial step size, specified as a real positive scalar or "auto".

InitialStepSize is the approximate maximum absolute value of the first optimization step when the optimizer is "quasinewton" or "lbfgs". The initial step size can determine the initial Hessian approximation during optimization.

By default, the training function does not use the initial step size to determine the initial Hessian approximation. To use the initial step size, set a value for the InitialStepSize name-value argument, or specify InitialStepSize="auto" to have the software determine a value automatically. For more information on "auto", see Algorithms.

Example: InitialStepSize="auto"

Other

collapse all

`Verbose` — Verbosity level
`0` (default) | `1`

Verbosity level, specified as 0 or 1.

0 — The training function suppresses diagnostic messages related to active set selection and block coordinate descent, but displays the messages related to parameter estimation, depending on the value of Display in OptimizerOptions.
1 — The training function displays the iterative diagnostic messages related to parameter estimation, active set selection, and block coordinate descent.

Example: Verbose=1

`CacheSize` — Cache size in megabytes
`1000` (default) | positive scalar

Cache size in megabytes (MB), specified as a positive scalar. Cache size is the extra memory available in addition to the memory required for fitting and active set selection. The training function uses CacheSize to:

Decide whether inter-point distances are cached when estimating parameters.
Decide how matrix vector products are computed for the block coordinate descent method and for making predictions.

Example: CacheSize=2000

Data Types: double

Output Arguments

collapse all

`t` — Gaussian process learner template
template object

Gaussian process learner template suitable for training regression models, returned as a template object. During training, the training function (such as directforecaster) uses default values for empty options.

More About

collapse all

Active Set Selection and Parameter Estimation

For subset of data, subset of regressors, or fully independent conditional approximation fitting methods (FitMethod equal to "sd", "sr", or "fic"), the software selects the active set and computes the parameter estimates in a series of iterations.

In the first iteration, the software uses the initial parameter values in vector η₀ = [β₀,σ₀,θ₀] to select an active set A₁. The software maximizes the GPR marginal loglikelihood or its approximation using η₀ as the initial values and A₁ to compute the new parameter estimates η₁. Next, the software computes the new loglikelihood L₁ using η₁ and A₁.

In the second iteration, the software selects the active set A₂ using the parameter values in η₁. Then, using η₁ as the initial values and A₂, the software maximizes the GPR marginal loglikelihood or its approximation and estimates the new parameter values η₂. Then, using η₂ and A₂, the software computes the new loglikelihood value L₂.

The following table summarizes the iterations and the computations at each iteration.

Iteration Number	Active Set	Parameter Vector	Loglikelihood
1	A₁	η₁	L₁
2	A₂	η₂	L₂
3	A₃	η₃	L₃
…	…	…	…

The software iterates similarly for a specified number of repetitions. You can specify the number of replications for active set selection using the NumActiveSetRepeats name-value argument.

Algorithms

Fitting a GPR model involves estimating the following model parameters from the data:
- Covariance function $k (x_{i}, x_{j} | θ)$ parameterized in terms of kernel parameters in vector $θ$ (see Kernel (Covariance) Function Options)
- Noise variance $σ^{2}$
- Coefficient vector of fixed-basis functions $β$
The value of the KernelParameters name-value argument is a vector that consists of initial values for the signal standard deviation $σ_{f}$ and the characteristic length scales $σ_{l}$ . The software uses these values to determine the kernel parameters. Similarly, the Sigma name-value argument contains the initial value for the noise standard deviation $σ$ .
During optimization, the software creates a vector of unconstrained initial parameter values $η_{0}$ by using the initial values for the noise standard deviation and the kernel parameters.
The software analytically determines the explicit basis coefficients $β$ , specified by the Beta name-value argument, from estimated values of $θ$ and $σ^{2}$ . Therefore, $β$ does not appear in the $η_{0}$ vector when the software initializes numerical optimization.

Note
If you do not specify the estimation of parameters for the GPR model, the software uses the value of the Beta name-value argument and other initial parameter values as the known GPR parameter values (see Beta). In all other cases, the value of Beta is optimized analytically from the objective function.
The quasi-Newton optimizer uses a trust-region method with a dense, symmetric rank-1-based (SR1), quasi-Newton approximation to the Hessian. The LBFGS optimizer uses a standard line-search method with a limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) quasi-Newton approximation to the Hessian. See Nocedal and Wright [6].
If you set the InitialStepSize name-value argument to "auto" the software determines the initial step size ${‖ s_{0} ‖}_{\infty}$ by using ${‖ s_{0} ‖}_{\infty} = 0.5 {‖ η_{0} ‖}_{\infty} + 0.1$ .
$s_{0}$ is the initial step vector, and $η_{0}$ is the vector of unconstrained initial parameter values.
During optimization, the software uses the initial step size ${‖ s_{0} ‖}_{\infty}$ as follows:
If you specify Optimizer="quasinewton" with the initial step size, then the initial Hessian approximation is $\frac{{‖ g_{0} ‖}_{\infty}}{{‖ s_{0} ‖}_{\infty}} I$ .
If you specify Optimizer="lbfgs" with the initial step size, then the initial inverse-Hessian approximation is $\frac{{‖ s_{0} ‖}_{\infty}}{{‖ g_{0} ‖}_{\infty}} I$ .
$g_{0}$ is the initial gradient vector, and $I$ is the identity matrix.

References

[1] Nash, W.J., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48, 1994.

[2] Waugh, S. "Extending and Benchmarking Cascade-Correlation: Extensions to the Cascade-Correlation Architecture and Benchmarking of Feed-forward Supervised Artificial Neural Networks." University of Tasmania Department of Computer Science thesis, 1995.

[3] Lichman, M. UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.

[4] Rasmussen, C. E. and C. K. I. Williams. "Gaussian Processes for Machine Learning". MIT Press. Cambridge, Massachusetts, 2006.

[5] Lagarias, J. C., J. A. Reeds, M. H. Wright, and P. E. Wright. "Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions." SIAM Journal of Optimization. vol. 9, no. 1, January 1998, pp. 112–147.

[6] Nocedal, J. and S. J. Wright. Numerical Optimization, Second Edition. Springer Series in Operations Research, Springer Verlag, 2006.

[7] Foster, L., et. al. "Stable and Efficient Gaussian Process Calculations", Journal of Machine Learning Research. vol. 10, no. 31, April 2009, pp. 857–882.

Version History

Introduced in R2023b

templateGP

Syntax

Description

Examples

Create Default Gaussian Process Template

Create Nondefault Gaussian Process Template

Input Arguments

Name-Value Arguments

FitMethod — Method to estimate parameters of GPR model "none" | "exact" | "sd" | "sr" | "fic"

BasisFunction — Explicit basis in GPR model "constant" (default) | "none" | "linear" | "pureQuadratic" | function handle

Beta — Initial value of coefficients p-by-1 vector

Sigma — Initial value for noise standard deviation std(y)/sqrt(2) (default) | positive scalar value

ConstantSigma — Constant value of Sigma for noise standard deviation false or 0 (default) | true or 1

SigmaLowerBound — Lower bound on noise standard deviation 1e-2*std(y) (default) | positive scalar value

Standardize — Indicator to standardize data false or 0 (default) | true or 1

Regularization — Regularization standard deviation 1e-2*std(y) (default) | positive scalar value

ComputationMethod — Method for computing loglikelihood and gradient "qr" (default) | "v"

KernelFunction — Form of covariance function "squaredexponential" (default) | "exponential" | "matern32" | "matern52" | "rationalquadratic" | "ardsquaredexponential" | "ardexponential" | "ardmatern32" | "ardmatern52" | "ardrationalquadratic" | function handle

KernelParameters — Initial values for kernel parameters numeric vector

DistanceMethod — Method for computing inter-point distances "fast" (default) | "accurate"

ActiveSetSize — Size of active set integer m (1 ≤ m ≤ n)

ActiveSetMethod — Active set selection method "random" (default) | "sgma" | "entropy" | "likelihood"

RandomSearchSetSize — Random search set size 59 (default) | integer value

ToleranceActiveSet — Relative tolerance for terminating active set selection 1e-06 (default) | positive scalar

NumActiveSetRepeats — Number of repetitions 3 (default) | integer value

PredictMethod — Method used to make predictions "exact" | "bcd" | "sd" | "sr" | "fic"

BlockSizeBCD — Block size for BCD method minimum of 1000 or n (default) | integer in the range 1 to n

NumGreedyBCD — Number of greedy selections for BCD method minimum of 100 and BlockSizeBCD (default) | integer value in the range 1 to BlockSizeBCD

ToleranceBCD — Relative tolerance on gradient norm 1e-3 (default) | positive scalar

StepToleranceBCD — Absolute tolerance on step size 1e-3 (default) | positive scalar

IterationLimitBCD — Maximum number of BCD iterations 1000000 (default) | positive integer

Optimizer — Optimizer to use for parameter estimation "quasinewton" (default) | "lbfgs" | "fminsearch" | "fminunc" | "fmincon"

OptimizerOptions — Options for optimizer structure | object

InitialStepSize — Initial step size [] (default) | real positive scalar | "auto"

Verbose — Verbosity level 0 (default) | 1

CacheSize — Cache size in megabytes 1000 (default) | positive scalar

Output Arguments

t — Gaussian process learner template template object

More About

Active Set Selection and Parameter Estimation

Algorithms

References

Version History

See Also

Topics

`FitMethod` — Method to estimate parameters of GPR model
`"none"` | `"exact"` | `"sd"` | `"sr"` | `"fic"`

`BasisFunction` — Explicit basis in GPR model
`"constant"` (default) | `"none"` | `"linear"` | `"pureQuadratic"` | function handle

`Beta` — Initial value of coefficients
p-by-1 vector

`Sigma` — Initial value for noise standard deviation
`std`(`y`)/`sqrt(2)` (default) | positive scalar value

`ConstantSigma` — Constant value of `Sigma` for noise standard deviation
`false` or `0` (default) | `true` or `1`

`SigmaLowerBound` — Lower bound on noise standard deviation
`1e-2*std`(`y`) (default) | positive scalar value

`Standardize` — Indicator to standardize data
`false` or `0` (default) | `true` or `1`

`Regularization` — Regularization standard deviation
`1e-2*std`(`y`) (default) | positive scalar value

`ComputationMethod` — Method for computing loglikelihood and gradient
`"qr"` (default) | `"v"`

`KernelFunction` — Form of covariance function
`"squaredexponential"` (default) | `"exponential"` | `"matern32"` | `"matern52"` | `"rationalquadratic"` | `"ardsquaredexponential"` | `"ardexponential"` | `"ardmatern32"` | `"ardmatern52"` | `"ardrationalquadratic"` | function handle

`KernelParameters` — Initial values for kernel parameters
numeric vector

`DistanceMethod` — Method for computing inter-point distances
`"fast"` (default) | `"accurate"`

`ActiveSetSize` — Size of active set
integer m (1 ≤ m ≤ n)

`ActiveSetMethod` — Active set selection method
`"random"` (default) | `"sgma"` | `"entropy"` | `"likelihood"`

`RandomSearchSetSize` — Random search set size
59 (default) | integer value

`ToleranceActiveSet` — Relative tolerance for terminating active set selection
1e-06 (default) | positive scalar

`NumActiveSetRepeats` — Number of repetitions
3 (default) | integer value

`PredictMethod` — Method used to make predictions
`"exact"` | `"bcd"` | `"sd"` | `"sr"` | `"fic"`

`BlockSizeBCD` — Block size for BCD method
minimum of 1000 or n (default) | integer in the range 1 to n

`NumGreedyBCD` — Number of greedy selections for BCD method
minimum of 100 and `BlockSizeBCD` (default) | integer value in the range 1 to `BlockSizeBCD`

`ToleranceBCD` — Relative tolerance on gradient norm
`1e-3` (default) | positive scalar

`StepToleranceBCD` — Absolute tolerance on step size
`1e-3` (default) | positive scalar

`IterationLimitBCD` — Maximum number of BCD iterations
`1000000` (default) | positive integer

`Optimizer` — Optimizer to use for parameter estimation
`"quasinewton"` (default) | `"lbfgs"` | `"fminsearch"` | `"fminunc"` | `"fmincon"`

`OptimizerOptions` — Options for optimizer
structure | object

`InitialStepSize` — Initial step size
`[]` (default) | real positive scalar | `"auto"`

`Verbose` — Verbosity level
`0` (default) | `1`

`CacheSize` — Cache size in megabytes
`1000` (default) | positive scalar

`t` — Gaussian process learner template
template object