Neural network: is it better to use all the data for training and ignore the test sample?
Valkmi
on 18 May 2017
Answered: Greg Heath
on 19 May 2017
Hi!
Quick question about the Neural Net Fitting app. I have two large datasets. I can use one to train the model and the other to test how well the model fits, after generating the MATLAB function from the Neural Net Fitting app. Since I test the generated function myself on new data, should I just skip the app's test sample so that as much data as possible goes to training? Both datasets have about 17x700000 inputs that determine a 1x700000 output. What do you think would be the best training method (LM, SCG, etc.)? And why is there a test sample at all, when it only limits the amount of data available for training the model?
Accepted Answer
Greg Heath
on 19 May 2017
Typically, there are 3 data subsets, not 2, containing a total of N points: training, validation and test. The MATLAB default subset sizes are
Ntrn ~ 0.7*N, Nval ~ 0.15*N and Ntst ~ 0.15*N
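As a rough illustration of those ratios, the sketch below splits N sample indices at random into the three subsets. It is written in Python with only the standard library rather than in MATLAB; the function name `split_indices` and the fixed seed are assumptions made for the example, and the rounding is not guaranteed to match what MATLAB's own data-division function does.

```python
import random

def split_indices(n, train_ratio=0.70, val_ratio=0.15, seed=0):
    """Randomly partition range(n) into training, validation and test index lists.

    Hypothetical helper for illustration only; the test subset receives
    whatever is left after the training and validation subsets are taken.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # reproducible random order
    n_trn = int(round(train_ratio * n))
    n_val = int(round(val_ratio * n))
    trn = indices[:n_trn]
    val = indices[n_trn:n_trn + n_val]
    tst = indices[n_trn + n_val:]
    return trn, val, tst

trn, val, tst = split_indices(700_000)
print(len(trn), len(val), len(tst))   # 490000 105000 105000
```

With N = 700,000 this leaves 105,000 points each for validation and testing, so even the smallest subsets are large.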
GOAL: Design a net which yields a sufficiently low error on NONTRAINING data.
TRAINING SUBSET: Ntrn points are used to directly determine the weight values that reduce
error. However, training subset error estimates tend to be very optimistically biased (because the
same data are used for both error reduction and error estimation).
VALIDATION SUBSET: Nval points are used to stop training when the validation subset
error increases for a specified number (MATLAB default is 6) of consecutive epochs.
Validation subset error estimates are much less biased than those of the training subset.
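The stopping rule can be sketched in a few lines. This is a simplified reading of it in plain Python, not MATLAB's implementation: the function name `early_stop` is hypothetical, the error values are made up, and a "failure" is counted here whenever the validation error is not lower than the best value seen so far.

```python
def early_stop(val_errors, max_fail=6):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve on its
    best value for max_fail consecutive epochs. Epochs are counted from 0.
    """
    best_err = float("inf")
    best_epoch = None
    fails = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, fails = err, epoch, 0   # new best: reset counter
        else:
            fails += 1
            if fails >= max_fail:
                return epoch, best_epoch
    # Ran out of epochs before the failure limit was reached.
    return len(val_errors) - 1, best_epoch

# Made-up validation errors: improving until epoch 3, then getting worse.
errors = [1.0, 0.8, 0.6, 0.5, 0.55, 0.56, 0.6, 0.61, 0.7, 0.72, 0.9]
print(early_stop(errors))   # (9, 3)
```

In this example the error bottoms out at epoch 3 and the sixth consecutive failure occurs at epoch 9, so training would stop there and the weights from epoch 3 would be the ones worth keeping.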
TEST SUBSET: Ntst points are used to obtain an UNBIASED error estimate of performance
on unseen data.
With N = 700,000 you probably have ~ 100 times as much data as you need.
So my suggestion is to first plot the target data vs each of the 17 inputs to see how much and what data is needed to adequately characterize the output. If you are extremely lucky maybe the output can be satisfactorily determined by only one or a few inputs.
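A numeric companion to those plots is to rank the inputs by how strongly each one correlates with the target on a small random subsample. The sketch below does that in plain Python on synthetic data; the helper names `pearson` and `screen_inputs`, the subsample size of 1000 and the data itself are all assumptions for illustration. Pearson correlation only detects linear relationships, so it complements the plots rather than replacing them.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0   # a constant sequence carries no linear information
    return sxy / math.sqrt(sxx * syy)

def screen_inputs(inputs, target, n_sample=1000, seed=0):
    """Correlate each input row with the target on a random subsample.

    inputs is a list of rows, one per input variable (like a 17xN matrix);
    target is a list of N values.
    """
    n = len(target)
    idx = random.Random(seed).sample(range(n), min(n_sample, n))
    t = [target[i] for i in idx]
    return [pearson([row[i] for i in idx], t) for row in inputs]

# Synthetic data: 3 inputs, but only the first one drives the target.
rng = random.Random(1)
n_points = 5000
inputs = [[rng.uniform(-1, 1) for _ in range(n_points)] for _ in range(3)]
target = [2 * inputs[0][i] + 0.1 * rng.gauss(0, 1) for i in range(n_points)]

r = screen_inputs(inputs, target)
for k, value in enumerate(r):
    print(f"input {k}: r = {value:+.3f}")
```

An input with a correlation near zero may still matter through a nonlinear or interaction effect, which is exactly what the scatter plots would reveal.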
Hope this helps.
Thank you for formally accepting my answer
Greg