Volatile GPU-Util is 0% during Neural network training

Hello.
I would like to train my neural network on 4 GPUs (on a remote server).
To utilize the GPUs, I set ExecutionEnvironment in the training options to 'multi-gpu'.
However, the Volatile GPU-Util remains at 0% during training.
It seems that the data does load into GPU memory.
I would appreciate your help.

8 comments

Looks like you have 14 MATLABs running per device. That's never going to work. What code are you running? Hard to help without knowing what you were actually doing. Are they all your processes or are you using a shared machine?
486MiB is how much memory gets reserved by a process on the device when it is selected; it's probably not your data.
기태 김 on 9 Sep 2023
Edited: Walter Roberson on 9 Sep 2023
I replaced the datastore in the example with my data path, which contains 123,174 RGB images, and I wrote an autoencoder for image regression.
% Define Input Size
inputSize = [224, 224, 3];
%Data loading
imds = imageDatastore('my path', ...
'IncludeSubfolders', true, 'LabelSource', 'foldernames');
imds.ReadSize = 500;
% Split the data first
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.7, 0.3);
dsTrain_p = combine(imdsTrain,imdsTrain);
dsVal_p = combine(imdsVal,imdsVal);
dsTrain = transform(dsTrain_p,@commonPreprocessing);
dsVal = transform(dsVal_p,@commonPreprocessing);
exampleData = preview(dsTrain);
inputs = exampleData(:,1);
responses = exampleData(:,2);
minibatch = cat(2,inputs,responses);
montage(minibatch',Size=[8 2])
title("Inputs (Left) and Responses (Right)")
layers = [
imageInputLayer(inputSize, 'Name', 'input')
% Encoder
convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv_1_1')
batchNormalizationLayer('Name', 'bn_1_1')
reluLayer('Name', 'relu_1_1')
convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv_1_2')
batchNormalizationLayer('Name', 'bn_1_2')
reluLayer('Name', 'relu_1_2')
maxPooling2dLayer(2, 'Stride', 2, 'Name', 'maxpool_1')
convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv_2_1')
batchNormalizationLayer('Name', 'bn_2_1')
reluLayer('Name', 'relu_2_1')
convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv_2_2')
batchNormalizationLayer('Name', 'bn_2_2')
reluLayer('Name', 'relu_2_2')
maxPooling2dLayer(2, 'Stride', 2, 'Name', 'maxpool_2')
% Decoder
transposedConv2dLayer(2, 64, 'Stride', 2, 'Cropping', 'same', 'Name', 'trans_conv_1')
reluLayer('Name', 'relu_3_1')
convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv_3_1')
batchNormalizationLayer('Name', 'bn_3_1')
reluLayer('Name', 'relu_3_2')
convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv_3_2')
batchNormalizationLayer('Name', 'bn_3_2')
reluLayer('Name', 'relu_3_3')
transposedConv2dLayer(2, 32, 'Stride', 2, 'Cropping', 'same', 'Name', 'trans_conv_2')
reluLayer('Name', 'relu_4_1')
convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv_4_1')
batchNormalizationLayer('Name', 'bn_4_1')
reluLayer('Name', 'relu_4_2')
convolution2dLayer(3, 3, 'Padding', 'same', 'Name', 'conv_output')
regressionLayer('Name', 'regression_output')
];
% Training options
options = trainingOptions('adam', ...
'InitialLearnRate', 0.0001, ...
'MaxEpochs', 100, ...
'Shuffle', 'every-epoch', ...
'ValidationData', dsVal, ...
'ValidationFrequency', 5, ... % Depending on your preference
'Verbose', true, ...
'Plots', 'training-progress', ...
'ExecutionEnvironment','multi-gpu',...
'MiniBatchSize', 1024);
% Train the network
[net, info] = trainNetwork(dsTrain, layers, options);
function dataOut = commonPreprocessing(data)
dataOut = cell(size(data));
for col = 1:size(data,2)
for idx = 1:size(data,1)
temp = single(data{idx,col});
temp = imresize(temp,[32,32]);
temp = rescale(temp);
dataOut{idx,col} = temp;
end
end
end
Quick clarification question. You mentioned that you are using "a remote server". Are you logging into that remote server manually and running MATLAB or are you running MATLAB Parallel Server to access those remote GPUs?
I am logging into the remote server manually and running MATLAB
Right, but is the server shared with other people? We need to explain why there are at least 25 MATLAB processes running on the machine. There should just be one for the client MATLAB you launched, plus three for the parallel pool, which should have opened automatically with three workers. Have you opened MATLAB lots of times?
기태 김 on 10 Sep 2023
Edited: 기태 김 on 11 Sep 2023
Sorry, the commonPreprocessing function I provided in the previous comment differs from the one I actually used.
I used parfor to speed up the preprocessing.
function dataOut = commonPreprocessing(data)
dataOut = cell(size(data));
parfor col = 1:size(data,2)
for idx = 1:size(data,1)
temp = single(data{idx,col});
temp = imresize(temp,[32,32]);
temp = rescale(temp);
dataOut{idx,col} = temp;
end
end
end
When I use the function above, I see more than 25 MATLAB processes running.
However, even after replacing it with the function from the example https://mathworks.com/help/deeplearning/ug/image-to-image-regression-using-deep-learning.html, the Volatile GPU-Util of all three GPUs still remained at 0% for most of the training time.
Sometimes it shows 20% to 30%, but only for extremely short periods. I am unsure whether the neural network is training on the GPUs.
Here is my code:
digitDatasetPath = fullfile("my path");
imds = imageDatastore(digitDatasetPath, ...
IncludeSubfolders=true,LabelSource="foldernames");
imds.ReadSize = 1024 ;
imds = shuffle(imds);
[imdsTrain,imdsVal,imdsTest] = splitEachLabel(imds,0.95,0.025);
dsTrainNoisy = transform(imdsTrain,@addNoise);
dsValNoisy = transform(imdsVal,@addNoise);
dsTestNoisy = transform(imdsTest,@addNoise);
dsTrain = combine(dsTrainNoisy,imdsTrain);
dsVal = combine(dsValNoisy,imdsVal);
dsTest = combine(dsTestNoisy,imdsTest);
dsTrain = transform(dsTrain,@commonPreprocessing);
dsVal = transform(dsVal,@commonPreprocessing);
dsTest = transform(dsTest,@commonPreprocessing);
dsTrain = transform(dsTrain,@augmentImages);
exampleData = preview(dsTrain);
inputs = exampleData(:,1);
responses = exampleData(:,2);
minibatch = cat(2,inputs,responses);
montage(minibatch',Size=[8 2])
title("Inputs (Left) and Responses (Right)")
imageLayer = imageInputLayer([224,224,3]);
encodingLayers = [ ...
convolution2dLayer(3,16,Padding="same"), ...
batchNormalizationLayer, ...
reluLayer, ...
maxPooling2dLayer(2,Padding="same",Stride=2), ...
convolution2dLayer(3,32,Padding="same"), ...
batchNormalizationLayer, ...
reluLayer, ...
maxPooling2dLayer(2,Padding="same",Stride=2), ...
convolution2dLayer(3,64,Padding="same"), ...
batchNormalizationLayer, ...
reluLayer, ...
maxPooling2dLayer(2,Padding="same",Stride=2), ...
dropoutLayer(0.5)];
decodingLayers = [ ...
transposedConv2dLayer(2,64,Stride=2), ...
batchNormalizationLayer, ...
reluLayer, ...
transposedConv2dLayer(2,32,Stride=2), ...
batchNormalizationLayer, ...
reluLayer, ...
transposedConv2dLayer(2,16,Stride=2), ...
batchNormalizationLayer, ...
reluLayer, ...
convolution2dLayer(1,3,Padding="same"), ...
clippedReluLayer(1.0), ...
regressionLayer];
layers = [imageLayer,encodingLayers,decodingLayers];
options = trainingOptions("adam", ...
MaxEpochs=50, ...
MiniBatchSize=imds.ReadSize, ...
ValidationData=dsVal, ...
ValidationPatience=5, ...
Plots="training-progress", ...
OutputNetwork="best-validation-loss", ...
ExecutionEnvironment="gpu", ...
Verbose=true);
net = trainNetwork(dsTrain,layers,options);
modelDateTime = string(datetime("now",Format="yyyy-MM-dd-HH-mm-ss"));
save("trainedImageToImageRegressionNet-"+modelDateTime+".mat","net");
ypred = predict(net,dsTest);
testBatch = preview(dsTest);
idx = 1;
y = ypred(:,:,:,idx);
x = testBatch{idx,1};
ref = testBatch{idx,2};
montage({x,y});
function dataOut = addNoise(data)
dataOut = data;
for idx = 1:size(data,1)
dataOut{idx} = imnoise(data{idx},"salt & pepper");
end
end
function dataOut = commonPreprocessing(data)
dataOut = cell(size(data));
for col = 1:size(data,2)
for idx = 1:size(data,1)
temp = single(data{idx,col});
temp = imresize(temp,[224,224]);
temp = rescale(temp);
dataOut{idx,col} = temp;
end
end
end
function dataOut = augmentImages(data)
dataOut = cell(size(data));
for idx = 1:size(data,1)
rot90Val = randi(4,1,1)-1;
dataOut(idx,:) = {rot90(data{idx,1},rot90Val), ...
rot90(data{idx,2},rot90Val)};
end
end
Joss Knight on 12 Sep 2023
Edited: Joss Knight on 12 Sep 2023
Right, so the parfor is opening a pool with a lot of workers (presumably you have a large number of CPU cores), but unfortunately these are not then used for your preprocessing during training. You need to enable DispatchInBackground as well. Try that. You should have received a warning on the first run telling you that most of your workers were not going to be used for training.
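As a minimal sketch of enabling background dispatch, reusing the variable names (dsTrain, dsVal, layers) from the code above — the MiniBatchSize value here is illustrative, not a recommendation:

```matlab
% Sketch: same training setup as above, with background dispatch enabled
% so the parallel pool workers preprocess data while the GPUs train.
options = trainingOptions("adam", ...
    MaxEpochs=50, ...
    MiniBatchSize=256, ...              % illustrative value
    ValidationData=dsVal, ...
    Plots="training-progress", ...
    ExecutionEnvironment="multi-gpu", ...
    DispatchInBackground=true, ...      % preprocess on pool workers
    Verbose=true);
net = trainNetwork(dsTrain, layers, options);
```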
It does look as though the general problem is that your data preprocessing dominates the training time, meaning only a small proportion of each second is spent computing gradients, and that is what the utilization figure measures. If DispatchInBackground doesn't help, we can explore further how to vectorize your transform functions; you might also consider using augmentedImageDatastore, which provides most of what you need. Or you could preprocess the data on the GPU.
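As a sketch of the vectorization idea: the nested loops in commonPreprocessing can be collapsed into a single cellfun call with the same behavior, assuming data arrives as a cell array of images:

```matlab
function dataOut = commonPreprocessing(data)
% Vectorized sketch: apply resize + rescale to every cell at once
% instead of looping over rows and columns.
resizeRescale = @(img) rescale(imresize(single(img), [224 224]));
dataOut = cellfun(resizeRescale, data, UniformOutput=false);
end
```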
Thank you for your response, Joss. Using augmentedImageDatastore with DispatchInBackground improved the situation with a single GPU, but the results were not entirely satisfactory. Consequently, I attempted to use multiple GPUs with DispatchInBackground, but it was unsuccessful. I am aware that I used the combine function, whose output cannot be partitioned. I also found this link: Cannot utilize fully all GPUs during network training - MATLAB Answers - MATLAB Central (mathworks.com). It seems that I could resolve the problem by creating my own custom datastore. However, after building my custom datastore, I still encounter an error message: "Input datastore does not support DispatchInBackground with parallel or multi-gpu ExecutionEnvironment."
Here is my custom datastore.
I am not sure what the problem with it is.
Thank you very much for your help.
classdef CustomImageDatastore < matlab.io.Datastore & ...
matlab.io.datastore.Shuffleable & matlab.io.datastore.Partitionable
properties
Datastore % Image Datastore
NumObservations
CurrentFileIndex
ReadSize = 1; % Default value is 1
end
methods
function ds = CustomImageDatastore(folder)
% ds = CustomImageDatastore(folder) creates a datastore
% from the images in folder using the specified read function.
% Create image datastore
imds = imageDatastore(folder, ...
'IncludeSubfolders', true, ...
'LabelSource', 'none', ...
'ReadFcn', @customReadFcn);
ds.Datastore = imds;
ds.NumObservations = numel(imds.Files);
ds.CurrentFileIndex = 1;
end
function tf = hasdata(ds)
tf = ds.CurrentFileIndex <= ds.NumObservations;
end
function [data,info] = read(ds)
% [data,info] = read(ds) reads one mini-batch of data.
miniBatchSize = ds.ReadSize;
info = struct;
predictors = cell(miniBatchSize,1);   % preallocate instead of growing
responses  = cell(miniBatchSize,1);
for i = 1:miniBatchSize
img = read(ds.Datastore);
predictors{i,1} = img;
responses{i,1} = img;   % autoencoder: response equals input
ds.CurrentFileIndex = ds.CurrentFileIndex + 1;
end
data = table(predictors, responses);
end
function reset(ds)
reset(ds.Datastore);
ds.CurrentFileIndex = 1;
end
function subds = partition(ds, numPartitions, index)
% Create a copy of datastore
subds = copy(ds);
subds.Datastore = partition(ds.Datastore, numPartitions, index);
subds.NumObservations = numel(subds.Datastore.Files);
subds.reset();
end
function dsNew = shuffle(ds)
% dsNew = shuffle(ds) shuffles the files in the datastore.
% Create a copy of datastore
dsNew = copy(ds);
dsNew.Datastore = copy(ds.Datastore);
imds = dsNew.Datastore;
% Shuffle files
numObservations = dsNew.NumObservations;
idx = randperm(numObservations);
imds.Files = imds.Files(idx);
end
end
methods(Access = protected)
function n = maxpartitions(ds)
n = ds.NumObservations;
end
end
end
function img = customReadFcn(filename)
% customReadFcn Read and process an image.
%
% img = customReadFcn(filename) reads the image from the specified
% filename, resizes it to 224x224x3, and normalizes it.
% Read image
img = imread(filename);
% Resize image to 224x224x3
img = imresize(img, [224, 224]);
% If the image is grayscale, replicate it to create a 3-channel image
if size(img, 3) == 1
img = repmat(img, [1, 1, 3]);
end
% Normalize image to [0, 1]
img = double(img) / 255;
end


Accepted Answer

aditi bagora on 25 Sep 2023
The error message indicates that the data cannot be dispatched in parallel in the background. To fix this, the class "CustomImageDatastore" also needs to inherit from the mixin class "matlab.io.datastore.Subsettable", which adds support for parallel and multi-GPU execution environments.
For further details, refer to the matlab.io.datastore.Subsettable documentation.
Hope this helps you in solving the error.
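As a sketch of that fix (assuming imageDatastore's subset method, available in recent releases), it amounts to adding the mixin to the class declaration and implementing a subset method that delegates to the underlying imageDatastore:

```matlab
classdef CustomImageDatastore < matlab.io.Datastore & ...
        matlab.io.datastore.Shuffleable & ...
        matlab.io.datastore.Partitionable & ...
        matlab.io.datastore.Subsettable
    % ... properties and existing methods unchanged ...
    methods
        function subds = subset(ds, indices)
            % Return a datastore containing only the observations
            % selected by indices (a numeric or logical vector).
            subds = copy(ds);
            subds.Datastore = subset(ds.Datastore, indices);
            subds.NumObservations = numel(subds.Datastore.Files);
            subds.reset();
        end
    end
end
```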

More Answers (0)

Version: R2023a

Asked: 8 Sep 2023

Last commented: 26 Sep 2023
