How to specify a portion of dataset for cross-validation with fitrgp?

5 views (last 30 days)
I am using fitrgp and would like to do cross-validation using a predetermined dataset as the validation data (I have one dataset for training and another for validation). I've read the documentation below and similar questions on this forum, but I haven't found a way to do this. Alternatively, is there a way to specify which indices of one dataset form the training portion and which form the validation portion?
Any help is appreciated, thanks!

Accepted Answer

Katy
Katy on 29 Sep 2023
It turns out custom cross-validation partitioning is a feature available in R2023b. I was able to specify the test indices similar to this example.
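For anyone finding this later, a minimal sketch of what this can look like, assuming the R2023b custom-partition support in cvpartition (check doc cvpartition for the exact signature; X, Y, and the index range are placeholders, not from the original post):

```matlab
% Sketch, assuming R2023b's "CustomPartition" syntax for cvpartition.
% X and Y are placeholder predictors/response; testIdx marks the
% predetermined validation rows by hand rather than randomly.
n = size(X,1);
testIdx = false(n,1);
testIdx(51:65) = true;                        % hand-picked validation rows
c = cvpartition("CustomPartition", testIdx);  % custom holdout partition
cvMdl = fitrgp(X, Y, "CVPartition", c);       % cross-validated GP regression
cvLoss = kfoldLoss(cvMdl);                    % validation loss (MSE)
```

Because the partition is built from an explicit logical vector, the train/validation split is fully deterministic, which is exactly what a predetermined validation set requires.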
Thanks to the MathWorks Technical Support team as well for the help!

More Answers (1)

Maneet Kaur Bagga
Maneet Kaur Bagga on 26 Sep 2023
Hi Katy,
  • As per my understanding, to perform cross-validation with "fitrgp" using a predetermined dataset as the validation data, the "cvpartition" function can be used to create a custom partition object. This lets you specify the indices of the training and validation portions.
  • For instance, "cvpartition" can create a hold-out validation partition object: pass the number of observations in the training dataset, use the "HoldOut" method, and specify the size of the validation dataset (X_val).
  • The "training" and "test" methods of the partition object then return the indices of the training and validation portions, respectively. These indices are used to select the corresponding rows of the training dataset (X_train and Y_train).
  • Finally, the "fitrgp" function trains the GP model on the training data, and the "predict" function produces predictions on the validation data (X_val_cv). Performance metrics, such as mean squared error or R-squared, can then be computed from the predicted values (Y_val_pred) and the actual validation targets (Y_val_cv).
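The steps above can be sketched as follows (X_train, Y_train, and X_val are placeholder variables matching the names used in the bullets, not an actual dataset):

```matlab
% Sketch of the hold-out workflow described above.
rng(0)                                           % for reproducibility
numObservations = size(X_train,1);
c = cvpartition(numObservations, 'HoldOut', size(X_val,1));
idxTrain = training(c);                          % logical training indices
idxVal   = test(c);                              % logical validation indices
X_train_cv = X_train(idxTrain,:);
Y_train_cv = Y_train(idxTrain);
X_val_cv   = X_train(idxVal,:);
Y_val_cv   = Y_train(idxVal);
gpMdl = fitrgp(X_train_cv, Y_train_cv);          % train the GP model
Y_val_pred = predict(gpMdl, X_val_cv);           % predict on validation data
mse = mean((Y_val_cv - Y_val_pred).^2);          % mean squared error
```

Note that 'HoldOut' still picks which rows go into the hold-out set at random; only the size of that set is controlled here.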
Please refer to the following documentation for a better understanding of these functions:
fitrgp
cvpartition
predict
Hope this helps!
Thank You
Maneet Bagga
  1 comment
Katy
Katy on 27 Sep 2023
Hi Maneet,
Thank you for this really detailed response! Just to follow up on this point:
  • The training and test methods of the partition object can then be used to obtain the indices for the training and validation portions, respectively. These indices are used to select the corresponding data from the training dataset (X_train and Y_train).
Based on my understanding, with this cvpartition holdout method the indices are still selected randomly by the cvpartition object, even when the holdout size is given as a number of observations rather than a fraction.
I referred to this example:
openExample('stats/EstimateNewDataClassificationUsingCrossValidationErrorExample')
and experimented with changing this line:
hpartition = cvpartition(n,'Holdout',0.3)
to use an integer instead (5 in the example below):
hpartition = cvpartition(n,'Holdout',5)
From this it seems that the indices in the 'idxTrain' and 'idxNew' variables are randomly selected.
I'm hoping to find a way to manually indicate which indices to select as the training set and which as the validation set (i.e. idxTrain = tbl(1:50, :) and idxTest = tbl(1:15, :), for example).
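A fully manual split like the one described here needs no cvpartition object at all; the rows can simply be indexed directly (the table name, index ranges, and use of the last variable as the response are illustrative assumptions, not from the original post):

```matlab
% Sketch of a fully manual, non-random split on a table tbl whose last
% variable is assumed to be the response.
trainTbl = tbl(1:50, :);                         % rows chosen by hand for training
valTbl   = tbl(51:65, :);                        % rows chosen by hand for validation
responseName = tbl.Properties.VariableNames{end};
gpMdl = fitrgp(trainTbl, responseName);          % train on the manual subset
Y_val_pred = predict(gpMdl, valTbl);             % evaluate on the manual holdout
```

The trade-off is that this bypasses the cross-validation machinery (kfoldLoss, kfoldPredict), which is what the R2023b custom-partition feature mentioned in the accepted answer restores.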
Thank you again for your response!