Balance pixel labels by oversampling block locations in big images
blockLocations = balancePixelLabels(bigLabeledImages,levels,blockSize,numObservations)
creates a list of the block locations, blockLocations, in the big labeled images, bigLabeledImages, that result in a class-balanced dataset. The function returns numObservations block locations of size blockSize, drawn at the resolution levels specified by levels.
The balancePixelLabels function balances the big image dataset by
oversampling image regions that contain less-common labels. A balanced dataset can produce
better results when used for training workflows such as semantic segmentation in deep
learning.
Load a labeled image dataset.
dataDir = fullfile(toolboxdir('vision'),'visiondata');
imDir = fullfile(dataDir,'building');
labelDir = fullfile(dataDir,'buildingPixelLabels');
imageFileList = dir(imDir);
labelFileList = dir(labelDir);
imageFileList = imageFileList(3:end);
labelFileList = labelFileList(3:end);
pixelLabelID = [1 2 3 4];
classNames = ["sky" "grass" "building" "sidewalk"];
Read each image and its pixel label image into vectors of bigimage objects.
for idx = 1:numel(labelFileList)
    bigImages(idx) = bigimage(imread([imDir filesep imageFileList(idx).name]));
    bigLabeledImages(idx) = bigimage(imread([labelDir filesep labelFileList(idx).name]), ...
        'Classes',classNames,'PixelLabelIDs',pixelLabelID);
end
Set the resolution level and block size of the images.
bigimageLevel = 1;
blockSize = [20 15];
Create a bigimageDatastore from the labeled image dataset.
blabelds = bigimageDatastore(bigLabeledImages,bigimageLevel,'BlockSize',blockSize);
Examine the pixel label occurrences of each class. The classes in the pixel label images are not balanced.
labelCounts = countEachLabel(blabelds);
Specify the number of block locations.
numObservations = 2000;
Select block locations from the labeled images to achieve class balancing.
locationSet = balancePixelLabels(bigLabeledImages,bigimageLevel,blockSize,numObservations);
Create a bigimageDatastore using the block locations.
bimdsBalanced = bigimageDatastore(bigLabeledImages,'BlockLocationSet',locationSet);
Recalculate the pixel label occurrences of each class.
labelCountsBalanced = countEachLabel(bimdsBalanced);
Compare the original unbalanced labels and the newly balanced labels.
figure;
h1 = histogram('Categories',labelCounts.Name,'BinCounts',labelCounts.PixelCount);
title(h1.Parent,'Original Dataset Labels');
figure;
h2 = histogram('Categories',labelCountsBalanced.Name,'BinCounts',labelCountsBalanced.PixelCount);
title(h2.Parent,'Sampled Block Set Labels');
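Besides comparing histograms, the imbalance can be summarized numerically from the Name and PixelCount columns of a label count table. The following is a minimal sketch that uses a hand-built table with hypothetical pixel counts (not output from this example) so that it runs without the toolbox data. The variable names classFraction and imbalanceRatio are illustrative only.

```matlab
% Hypothetical pixel counts, for illustration only. In the example above,
% the table would come from countEachLabel instead.
Name = ["sky"; "grass"; "building"; "sidewalk"];
PixelCount = [8000; 1500; 400; 100];
labelCounts = table(Name,PixelCount);

% Fraction of all labeled pixels that belong to each class.
classFraction = labelCounts.PixelCount / sum(labelCounts.PixelCount);

% Ratio of the most common class to the least common class.
% A value near 1 indicates a balanced set of labels.
imbalanceRatio = max(classFraction) / min(classFraction);

disp(table(labelCounts.Name,classFraction,'VariableNames',{'Name','Fraction'}))
fprintf('Imbalance ratio: %.1f\n',imbalanceRatio)
```

Computing the same ratio for labelCounts and labelCountsBalanced gives a single number to compare before and after balancing.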
levels— Resolution levels
Resolution levels of blocks from each big image in the
bigLabeledImages input, specified as a positive integer or a
vector of positive integers whose length equals the length of the
bigLabeledImages input vector. If you specify a scalar value,
then all big labeled images supply blocks at the same resolution level.
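The scalar behavior can be pictured as a simple expansion to one level per image. This is a sketch of the idea only, not the function's internal code, and numImages is a hypothetical image count.

```matlab
numImages = 3;      % hypothetical number of big labeled images
levels = 1;         % scalar: the same resolution level for every image

% Expand a scalar to one resolution level per image.
if isscalar(levels)
    levels = repmat(levels,1,numImages);
end

% A vector input must supply exactly one level per image.
assert(numel(levels) == numImages);
```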
blockSize— Block size
Block size of read data, specified as a two-element row vector of positive integers, [numrows,numcols]. The first element specifies the number of rows in the block. The second element specifies the number of columns.
numObservations— Number of block locations
Number of block locations to return, specified as a positive integer.
logical— Use new or existing pool
Use new or existing pool, specified as a numeric or logical
1 (true) or 0 (false). If you do not specify this input, the function uses
false. When the value is true and no parallel pool is active, a new pool is opened based on
the default parallel settings. The
DataSource property of all input
bigimage objects must be valid paths
on each of the parallel workers.
To balance pixel labels, the function oversamples the minority classes in the input images. The minority class is determined by calculating the overall pixel label counts for the complete dataset. The algorithm follows these steps.
The images in the input image array are divided into macro blocks, whose size is a
multiple of the
blockSize input value.
The function counts pixel labels for all classes in each macro block. Then, it selects the macro block with the greatest occurrences of minority classes using weighted random selection.
The algorithm uses a random block location within the selected macro block to perform oversampling. The origin of the block location must always be fully within the limits of the macro block.
The function updates the overall label counts based on the pixel label counts of the classes found for the selected macro block.
The function includes the newly oversampled classes when it computes the new minority class.
This process repeats until the number of block locations processed equals the
numObservations input value.
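The steps above can be sketched on a small synthetic label matrix. This is an illustrative approximation, not the toolbox implementation: the macro block multiple (2), the weighting rule (weights proportional to the minority class count in each macro block), and all variable names are assumptions. For simplicity, the sketch also keeps the whole block inside the macro block, which is stricter than the requirement on the block origin.

```matlab
% Illustrative sketch only. This is NOT the toolbox implementation.
rng(0);                                  % reproducible random draws

% Synthetic label matrix: class 1 dominates, class 2 is rare.
labels = ones(120,120);
labels(81:120,81:120) = 2;

numClasses      = 2;
blockSize       = [20 15];               % [numrows numcols]
macroSize       = 2*blockSize;           % assumed multiple of blockSize
numObservations = 200;

% Divide the image into macro blocks and count labels in each one.
[rr,cc] = ndgrid(1:macroSize(1):size(labels,1),1:macroSize(2):size(labels,2));
origins  = [rr(:) cc(:)];                % top-left corner of each macro block
numMacro = size(origins,1);
macroCounts = zeros(numMacro,numClasses);
for k = 1:numMacro
    blk = labels(origins(k,1)+(0:macroSize(1)-1),origins(k,2)+(0:macroSize(2)-1));
    macroCounts(k,:) = accumarray(blk(:),1,[numClasses 1])';
end

totalCounts  = zeros(1,numClasses);      % running label counts
blockOrigins = zeros(numObservations,2); % [row col] of each selected block
for n = 1:numObservations
    % Current minority class, then weighted random macro block selection.
    [~,minorityClass] = min(totalCounts);
    w = macroCounts(:,minorityClass);
    if ~any(w)
        w = ones(numMacro,1);            % fall back to uniform weights
    end
    k = find(rand*sum(w) <= cumsum(w),1,'first');

    % Random block location within the selected macro block.
    r0 = origins(k,1) + randi(macroSize(1)-blockSize(1)+1) - 1;
    c0 = origins(k,2) + randi(macroSize(2)-blockSize(2)+1) - 1;
    blockOrigins(n,:) = [r0 c0];

    % Update the overall counts using the selected macro block's counts.
    totalCounts = totalCounts + macroCounts(k,:);
end

% Compare class fractions in the full image and in the sampled blocks.
sampledCounts = zeros(1,numClasses);
for n = 1:numObservations
    blk = labels(blockOrigins(n,1)+(0:blockSize(1)-1),blockOrigins(n,2)+(0:blockSize(2)-1));
    sampledCounts = sampledCounts + accumarray(blk(:),1,[numClasses 1])';
end
originalFraction = accumarray(labels(:),1,[numClasses 1])'/numel(labels);
sampledFraction  = sampledCounts/sum(sampledCounts);
disp([originalFraction; sampledFraction])
```

In this sketch, the rare class occupies about 11% of the original label matrix, and the sampled blocks contain a noticeably larger share of it because macro blocks rich in the current minority class are chosen more often.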