Difficulties in training ANNs with multiple outputs: always constant outputs

Question

Clemens H. on 27 Dec 2023


Direct link to this question:

https://es.mathworks.com/matlabcentral/answers/2064426-difficulties-in-training-anns-with-multiple-outputs-always-constant-outputs

Commented: Clemens H. on 9 Jan 2024

Does anyone have experience with defining a neural network that has multiple outputs? I want to input a vector and output a vector as well as a matrix. Accordingly, I need a DAG network.

I realize that I need a custom training loop for this; see: https://de.mathworks.com/help/deeplearning/ug/train-network-with-multiple-outputs.html

The good news is that the training code basically runs.

I have chosen the following network architecture:

numNeurons = 10;
% Input
layers1 = [
    featureInputLayer(size(XData,2),'Name','Param_Input',"Normalization","rescale-symmetric");
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer('Name','tanh_middle')           
    ];
lgraph = layerGraph(layers1);
% Output 1
filterSize = dimOutput{1};
numFilters = 20;
strideSize = [1,1];
projectionSize = [1,1,size(XData,2)];
layers2 = [
    fullyConnectedLayer(numNeurons,'Name','fcEF')
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    projectAndReshapeLayerNew(projectionSize)
    transposedConv2dLayer(filterSize,numFilters,'Stride',strideSize,Cropping="same")
    batchNormalizationLayer
    tanhLayer
    transposedConv2dLayer(filterSize,1,'Stride',strideSize,'Name','Output1')
    ];
% Output 2
layers3 = [
    fullyConnectedLayer(numNeurons,'Name','fcFreq')
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(dimOutput{2},'Name','Output2')
    ];
lgraph = addLayers(lgraph,layers2);    
lgraph = addLayers(lgraph,layers3);  
lgraph = connectLayers(lgraph,"tanh_middle","fcEF");   
lgraph = connectLayers(lgraph,"tanh_middle","fcFreq");
end % closes the enclosing network-definition function (header not shown)
% [...] Training
% "Assemble Multiple-Output Network for Prediction"
lgraphNew = layerGraph(trainedNet);
layerReg1 = regressionLayer(Name="regOutput1");
layerReg2 = regressionLayer(Name="regOutput2");
lgraphNew = addLayers(lgraphNew,layerReg1);
lgraphNew = addLayers(lgraphNew,layerReg2);
lgraphNew = connectLayers(lgraphNew,"Output1","regOutput1");
lgraphNew = connectLayers(lgraphNew,"Output2","regOutput2"); 
figure
plot(lgraphNew)

However, the problem is that the outputs (the coefficients of both the vector and the matrix) are identical for every input sample. Apparently the network learns some average values rather than the sample-specific targets in the training data:

[Figure: Output 1 (all matrices are the same)]

[Figure: Output 2 (all "vectors"/lines are the same)]

Is the network architecture poorly suited to this task? What could be the reason? I would rule out the training data as the cause, because training succeeds when I train separate single-output ANNs.
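One way to put a number on the "constant output" symptom is to compare the spread of the predictions across samples with the spread of the targets, and to check how close the predictions sit to the per-coefficient target mean. The following is only a minimal sketch in base MATLAB: `T` and `Y` are hypothetical stand-in arrays (samples in rows), not data from this thread, and `Y` is constructed to imitate a collapsed prediction.

```matlab
% Hypothetical stand-in data: 50 samples (rows) x 5 output coefficients.
rng(0);
T = rand(50,5);                                 % targets
Y = repmat(mean(T,1),50,1) + 1e-6*randn(50,5);  % imitates collapsed predictions

% Spread across samples, computed per output coefficient.
stdPred   = std(Y,0,1);
stdTarget = std(T,0,1);

% A ratio near 0 means the prediction barely depends on the input.
spreadRatio = stdPred ./ stdTarget;

% Largest gap between the mean prediction and the mean target.
gapToMean = max(abs(mean(Y,1) - mean(T,1)));

fprintf("max spread ratio: %.3g\n",max(spreadRatio));
fprintf("max gap to target mean: %.3g\n",gapToMean);
```

With real data, a spread ratio close to zero together with a small gap to the target mean would support the suspicion that the network outputs an average instead of input-dependent values.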

Thank you and best regards.


Answer 1

Venu on 8 Jan 2024


Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/2064426-difficulties-in-training-anns-with-multiple-outputs-always-constant-outputs#answer_1385066

Hi @Clemens H.,

In your case, I suspect the configuration of your custom layer 'ProjectAndReshapeLayer'. Check how the layer is initialized and how the projection matrix is learned, consider applying regularization, and verify that the reshaping operation is appropriate for the specific output type. It's important to verify that this layer is not inadvertently causing the network to learn average values rather than distinct representations for each output.

Try adding another fully connected (FC) layer at the end of layers2 to increase the model capacity. The extra capacity may help the network capture more nuanced representations for the first output, especially if the existing layers are not expressive enough.
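A possible way to check where the sample-to-sample variation is lost is to request intermediate activations with the `Outputs` option of `predict` and to measure their spread across the batch. This is a sketch under assumptions: the small network below is a hypothetical stand-in that only reuses some layer names from the question (it does not contain the custom layer), and the input is random data. With the real trained `dlnetwork`, the list of layer names would be adapted accordingly.

```matlab
% Hypothetical stand-in network; replace with the trained dlnetwork.
layers = [
    featureInputLayer(3,Name="Param_Input")
    fullyConnectedLayer(10,Name="fc1")
    batchNormalizationLayer(Name="bn1")
    tanhLayer(Name="tanh_middle")
    fullyConnectedLayer(5,Name="Output2")];
net = dlnetwork(layers);

% Random stand-in input: 3 features x 32 samples, channel x batch format.
X = dlarray(rand(3,32,"single"),"CB");

% Request the activations of several layers in one call.
layerNames = ["fc1","bn1","tanh_middle","Output2"];
acts = cell(1,numel(layerNames));
[acts{:}] = predict(net,X,Outputs=layerNames);

% For each layer, report the standard deviation across samples,
% averaged over channels. A value near zero marks the layer at
% which all samples start to look alike.
for k = 1:numel(layerNames)
    A = extractdata(acts{k});     % channels x batch
    spread = mean(std(A,0,2));
    fprintf("%-12s mean std across samples: %.4g\n",layerNames(k),spread);
end
```

If the spread is already close to zero in the shared trunk, the cause is more likely upstream (input formatting, normalization, saturated activations) than in the output branch that contains the custom layer.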

2 comments

Udit06 on 9 Jan 2024

I would like to add one more point to the above answer. In a multi-output scenario, the total loss for the network is often a combination of the individual losses for each output. The model's training objective is to minimize this total loss. However, if one loss dominates the total loss, the network may focus on optimizing for that particular output at the expense of the others, leading to poor performance on the less weighted tasks. To handle this, you can assign weights to each loss component to balance their contributions to the total loss.
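As a rough illustration of the weighting idea in a custom training loop, the sketch below combines two mean-squared-error terms with weights. Everything specific here is an assumption for illustration: the tiny two-output network, the random data, the weight vector `w`, and the helper name `modelLossWeighted` are hypothetical and not taken from the question. The loop also assigns the returned state back to the network, which keeps the batch normalization statistics current.

```matlab
% Tiny two-output stand-in network (hypothetical, not the one from the question).
layers = [
    featureInputLayer(3,Name="in")
    fullyConnectedLayer(8,Name="fc_shared")
    batchNormalizationLayer(Name="bn_shared")
    tanhLayer(Name="tanh_shared")];
lgraph = layerGraph(layers);
lgraph = addLayers(lgraph,fullyConnectedLayer(4,Name="out1"));
lgraph = addLayers(lgraph,fullyConnectedLayer(2,Name="out2"));
lgraph = connectLayers(lgraph,"tanh_shared","out1");
lgraph = connectLayers(lgraph,"tanh_shared","out2");
net = dlnetwork(lgraph);

% Random stand-in data: 16 samples, channel x batch format.
X  = dlarray(rand(3,16,"single"),"CB");
T1 = dlarray(rand(4,16,"single"),"CB");
T2 = dlarray(rand(2,16,"single"),"CB");

w = [0.5 0.5];                 % assumed loss weights, to be tuned
avgGrad = [];
avgSqGrad = [];
numIterations = 20;
lossHistory = zeros(1,numIterations);

for iteration = 1:numIterations
    [loss,gradients,state] = dlfeval(@modelLossWeighted,net,X,T1,T2,w);
    net.State = state;         % carry over batch normalization statistics
    [net,avgGrad,avgSqGrad] = adamupdate(net,gradients,avgGrad,avgSqGrad,iteration);
    lossHistory(iteration) = double(extractdata(loss));
end

function [loss,gradients,state] = modelLossWeighted(net,X,T1,T2,w)
    % Request both heads by name so the output order is explicit.
    [Y1,Y2,state] = forward(net,X,Outputs=["out1","out2"]);
    % Weighted sum of the two per-output losses.
    loss = w(1)*mse(Y1,T1) + w(2)*mse(Y2,T2);
    gradients = dlgradient(loss,net.Learnables);
end
```

Logging the two loss terms separately before weighting them can show whether one of them is larger by orders of magnitude, which is the situation in which rebalancing is most likely to matter.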

I hope this helps.

Clemens H. on 9 Jan 2024

First of all, thank you very much for your answers! Unfortunately, the problem persists and I would like to address a few points:

1) The problem does not occur with single-output NNs: each output can be learned separately just fine. I therefore assumed that the structure of the two NN "arms" was basically OK. Is something wrong with it after all?

2) @Venu I took the "ProjectAndReshapeLayer" from the MATLAB help topic "Train Generative Adversarial Network (GAN)" (open the corresponding live script): https://de.mathworks.com/help/deeplearning/ug/train-generative-adversarial-network.html

When I output only the matrix (single-output variant), it worked, so I don't know what I should fundamentally change.

3) @Venu Unfortunately I can't add an FC layer at the end of "layers2": the FC layer automatically flattens its input, so the output would no longer be a matrix and an error regarding the output dimensions would occur. Adding further FC layers in other places has not helped so far.

4) @Udit06 Thanks also for this hint. Even in the limiting cases where the loss is weighted 100% toward Output1 or 100% toward Output2, the result does not change: the NN output is still constant for all samples, as described in the question. How can that be?
