Retraining YAMNet for audio classification returns channel mismatch error in "deep.internal.train.Trainer/train"

I am retraining YAMNet for a binary classification task, operating on spectrograms of audio signals. My training audio has two classes, positive and negative. The audio is preprocessed and features are extracted using yamnetPreprocess(). When training the network, trainnet() produces the following error:
Error using deep.internal.train.Trainer/train (line 74)
Number of channels in predictions (2) must match the number of
channels in the targets (3).
Error in deep.internal.train.ParallelTrainer>iTrainWithSplitCommunicator (line 227)
remoteNetwork = train(remoteTrainer, remoteNetwork, workerMbq);
Error in deep.internal.train.ParallelTrainer/computeTraining (line 127)
spmd
Error in deep.internal.train.Trainer/train (line 59)
net = computeTraining(trainer, net, mbq);
Error in deep.internal.train.trainnet (line 54)
net = train(trainer, net, mbq);
Error in trainnet (line 42)
[net,info] = deep.internal.train.trainnet(mbq, net, loss, options, ...
Error in train_DenseNet_detector_from_semi_synthetic_dataset (line 192)
[trained_network, train_info] = trainnet(trainFeatures, trainLabels', net, "crossentropy", options);
My understanding of this error is that it indicates a mismatch between the number of classes the network expects, and the number of classes in the dataset. I do not see how this can be possible, considering the number of classes in the network is explicitly set by the number of classes in the datastore:
classNames = unique(ads.Labels);
numClasses = numel(classNames);
net = audioPretrainedNetwork("yamnet", NumClasses=numClasses);
My script is based on this MATLAB tutorial: audioPretrainedNetwork and there are no functional differences in the way I'm building datastores or preprocessing the data. The training options and the call to trainnet() are configured as follows:
options = trainingOptions('adam', ...
InitialLearnRate = initial_learn_rate, ...
MaxEpochs = max_epochs, ...
MiniBatchSize = mini_batch_size, ...
Shuffle = "every-epoch", ...
Plots = "training-progress", ...
Metrics = "accuracy", ...
Verbose = 1, ...
ValidationData = {single(validationFeatures), validationLabels'}, ...
ValidationFrequency = validationFrequency,...
ExecutionEnvironment="parallel-auto");
[trained_network, train_info] = trainnet(trainFeatures, trainLabels', net, "crossentropy", options);
Relevant variable dimensions are as follows:
>> unique(ads.Labels)
ans =
2×1 categorical array
negative
positiveNoisy
>> size(trainLabels)
ans =
1 16240
>> size(trainFeatures)
ans =
96 64 1 16240
>> size(validationLabels)
ans =
1 6960
>> size(validationFeatures)
ans =
96 64 1 6960
The only real differences between my script and the MATLAB tutorial are that I'm using parallel execution in the training solver, and the datastore OutputEnvironment is set to "gpu". If I set ExecutionEnvironment = "auto" instead of "parallel-auto" and set ads.OutputEnvironment = 'cpu', the error stack is shorter, but the problem is the same:
Error using trainnet (line 46)
Number of channels in predictions (2) must match the number of channels in
the targets (3).
Error in train_DenseNet_detector_from_semi_synthetic_dataset (line 189)
[trained_network, train_info] = trainnet(trainFeatures, trainLabels', net, "crossentropy", options);
Please could someone give me some advice? The root cause of this is buried in the deep learning toolbox, and it's a little beyond me right now.
Thanks,
Ben

Accepted Answer

Joss Knight on 3 Oct 2024
I think the issue will be that your label data is a categorical type with three categories. Run
categories(trainLabels)
to confirm. You might need to delete the unused category using removecats.
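As a rough illustration of the check suggested above, the sketch below builds a small hypothetical label vector (the class names mirror the ones in this thread, but the data is made up) and compares the values actually present with the categories defined on the array. It assumes only base MATLAB categorical functions.

```matlab
% Hypothetical labels: three categories are defined, but no element
% uses 'positiveClean' (as if those files had already been filtered out).
labels = categorical(["negative" "positiveNoisy" "negative"], ...
    ["negative" "positiveClean" "positiveNoisy"]);

nPresent = numel(unique(labels));      % values that occur in the data
nDefined = numel(categories(labels));  % categories defined on the array
counts   = countcats(labels);          % per-category counts; unused ones are 0

% Called with no second argument, removecats drops every unused category.
labels = removecats(labels);
remaining = categories(labels);
```

If nPresent and nDefined differ, the label array still carries a category that no element uses, which is the situation described in the answer above.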

More Answers (3)

Joss Knight on 3 Oct 2024
It looks like your network is returning output with three channels instead of two. Could you try running analyzeNetwork(net) to see what it is outputting?

Ben on 3 Oct 2024
Hi Joss, thanks for your speedy reply.
The last layer is a softmax layer with activations 2(C) x 1(B) and zero learnables. The complete dlnetwork analysis is attached.
It all looks correct, no?
  4 Comments
Ben on 3 Oct 2024
>> Zpredict = predict(net, trainFeatures(:,:,:,1:miniBatchSize));
Error using dlnetwork/predict (line 658)
Uninitialized dlnetwork object. Use the initialize function to initialize
network before calling predict.
I have been trying to modify the YAMNet manually, rather than relying on audioPretrainedNetwork(), and I think I've found a bug in subset()...
My audioDatastore is created from a folder which has three sub-directories called "negative", "positiveClean" and "positiveNoisy", and uses those folder names as labels. The files in each folder also contain these labels in their file names (except they are all lower case, with an underscore in the middle).
This script has two modes, one of which does not use the data in the folder "positiveClean". Currently I am removing that data with the subset() function as per the example here subset:
% Build audioDataStore object containing training datasets
ads = audioDatastore(trainingDataPath, "IncludeSubfolders", true,...
"FileExtensions",".wav", "LabelSource", "foldernames", ...
OutputDataType="single");
% Choose which audio files to include based on the script mode:
if mode == 1
% Get logical index for all filenames that do NOT contain 'positive_clean'
NotPositiveClean = cellfun(@(c) ~contains(c, 'positive_clean'), ads.Files);
% Create a subset of the datastore containing all files that are not 'positiveClean'
ads = subset(ads, NotPositiveClean);
disp("Training for detection tasks only. Excluding 'positiveClean' folder.")
elseif mode == 2
disp("Training for both detection and denoising tasks...")
end
% Split the full dataset into training data and validation data.
split = [trainPercentage/100, (100-trainPercentage)/100];
[ads_train, ads_validation] = splitEachLabel(ads, split(1), split(2));
I am then preprocessing, then modifying the network as follows:
% Load the pretrained network
[net, ~] = audioPretrainedNetwork("yamnet");
% Convert the network to a layer graph
lgraph = layerGraph(net);
% Remove the last layers
layersToRemove = {'dense', 'softmax'};
lgraph = removeLayers(lgraph, layersToRemove);
% Create a classification layer
newClassLayer = classificationLayer('Name', 'new_classoutput', 'Classes', classNames)
The "newClassLayer" contains 3 labels, despite being built directly from "classNames", which appears to contain only two classes:
>> newClassLayer
newClassLayer =
ClassificationOutputLayer with properties:
Name: 'new_classoutput'
Classes: [negative positiveClean positiveNoisy]
ClassWeights: 'none'
OutputSize: 3
Hyperparameters
LossFunction: 'crossentropyex'
>> classNames
classNames =
2×1 categorical array
negative
positiveNoisy
Remembering here that classNames is built from ads_train.
classNames = unique(ads_train.Labels);
numClasses = numel(classNames);
Something is very fishy here...
Joss Knight on 3 Oct 2024
Edited: Joss Knight on 3 Oct 2024
Yes I see. It's not enough just to remove instances of one of the classes from the data, because that class is still one of the label categories. You are going to need to remove that category from your target using removecats, see my other Answer.
To simplify:
% Create label data with 3 classes
randomLabels = categorical(randi(3, 1, 100));
mycats = categories(randomLabels) % 3 categories, '1', '2' and '3'
mycats = 3x1 cell array
{'1'} {'2'} {'3'}
% Remove all the '2's
randomLabels(randomLabels==mycats(2)) = [];
mycats = categories(randomLabels) % Still 3 categories!
mycats = 3x1 cell array
{'1'} {'2'} {'3'}
% Remove the '2' category from the data
randomLabels = removecats(randomLabels, mycats(2));
mycats = categories(randomLabels) % Now there's only '1' and '3'
mycats = 2x1 cell array
{'1'} {'3'}
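To connect this to the "channels in the targets (3)" message, here is a minimal sketch using onehotencode from Deep Learning Toolbox. It assumes that trainnet encodes categorical targets in a comparable way, with one channel per category; that is an assumption about the internals rather than something confirmed in this thread.

```matlab
% Requires Deep Learning Toolbox (onehotencode).
% Category '2' is defined on the array but never used by any element.
labels = categorical([1 3 3 1], 1:3);

% Encode along dimension 1: one row (channel) per category.
targetsBefore = onehotencode(labels, 1);
sizeBefore = size(targetsBefore);   % rows follow categories, not unique values

% Drop the unused category and encode again.
labels = removecats(labels);
targetsAfter = onehotencode(labels, 1);
sizeAfter = size(targetsAfter);
```

Under that assumption, the row count of the encoded targets is what has to match the network's output size, so it drops from three to two only once the unused category is removed.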



Ben on 4 Oct 2024
Edited: Ben on 4 Oct 2024
Ok great, thank you Joss.
I have resolved my issue by using removecats() inside the conditional statement that removes my unused data from the dataset.
% Get logical index for all the files we want to keep
keepIdx = cellfun(@(c)...
~contains(c, 'filename_substring_indicating_unwanted_file'), ads.Files);
% Create a subset of the datastore containing all files shown as "true" in keepIdx
ads_subset = subset(ads, keepIdx);
ads_subset.Labels = removecats(ads_subset.Labels, ...
"label_associated_with_removed_files");
This really does seem like a bug, or at the very least, an undocumented quirk of subset(). The example in the subset function's official documentation does not show this additional step being necessary, and removecats() is not referenced anywhere on that page.
The order of relevant operations in my code is as follows:
  • build "ads"
  • "ads" contains 600 files, 600 labels, 3 unique
  • categories(ads.Labels) = 3x1 cell array {'negative'}{'positiveClean'}{'positiveNoisy'}
  • Get indices "keepIdx" of files in "ads" with label {'negative'} or {'positiveNoisy'}
  • Create new datastore "ads_subset" using subset() and "keepIdx"
  • "ads_subset" contains 400 files, 400 labels, 2 unique
  • categories(ads_subset.Labels) = 3x1 cell array {'negative'}{'positiveClean'}{'positiveNoisy'}
and I think most would agree this is illogical and unexpected behaviour. Is it possible to log a bug report for this?
Additionally, to make this kind of issue easier to troubleshoot, could I suggest that the tutorial for YAMNet Transfer Learning on this page set the number of network classes as:
numClasses = numel(categories(adsTrain.Labels));
net = audioPretrainedNetwork("yamnet",NumClasses=numClasses);
where currently, it shows:
classNames = unique(adsTrain.Labels);
numClasses = numel(classNames);
net = audioPretrainedNetwork("yamnet",NumClasses=numClasses);
Many thanks again for your help :)
  2 Comments
Joss Knight on 4 Oct 2024
I'll pass on your comments.
I don't think this is quite as clear-cut as you make out. You have asked your underlying datastore to use the folder names as the label source; this information is gathered on construction of the original audioDatastore. subset() shouldn't be making any assumptions about your choice of labels subsequently. You may have removed all the data from one class because you want to fine-tune your model to favour other classes, or for many other reasons. Or, to put it another way, if you had a model that accepted data from a datastore, it should also support data from a subset of that datastore; but if you pruned any missing classes, it wouldn't.
Nevertheless you raise some interesting points, in particular your point about using numel(categories(...)) instead of unique is a very good one.
Thanks.
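For readers who want to catch this before training starts, one possible guard is sketched below. The variable names are hypothetical stand-ins for the ones in this thread, and the check only makes sense when the model should cover just the classes present in the data; as noted above, unused categories can also be deliberate.

```matlab
% Hypothetical stand-in for the thread's trainLabels, after removecats.
trainLabels = removecats(categorical(["negative" "positiveNoisy"], ...
    ["negative" "positiveClean" "positiveNoisy"]));

% Size the network from the defined categories, not from unique values.
numClasses = numel(categories(trainLabels));

% Fail early, with a readable message, if any category has no examples.
cats = categories(trainLabels);
unusedMask = countcats(trainLabels) == 0;
if any(unusedMask)
    unusedNames = reshape(cats(unusedMask), 1, []);
    error("Labels define unused categories: %s", strjoin(unusedNames, ", "));
end
```

Running a check like this just before trainnet would report the offending category by name instead of a channel-count mismatch.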
