# Splitting a matrix according to its labels

NotA_Programmer on 10 May 2022
Commented: Jon on 11 May 2022
I have a 1900 x 4 matrix of doubles whose fourth column contains the labels 3, 2 and 1. I want to split this data into two sets, A and B, in a 20:80 ratio, where A contains 20% of the rows for each label and B contains the remaining 80% of each label, i.e. 80% of label 3, 80% of label 2 and 80% of label 1. How can this be achieved?
dpb on 10 May 2022
Add splitapply or, if using a table, rowfun to the above...
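A rough sketch of how the splitapply suggestion might look for this problem. The stand-in data, the fixed seed, and the names X, rows, pickA, idxA and inA are assumptions made for illustration; they do not come from the thread.

```matlab
rng(0);                                  % fixed seed so the sketch is repeatable
X = [rand(1900,3), randi(3,[1900,1])];   % stand-in data, labels in column 4

g    = findgroups(X(:,4));               % group number for every row
rows = (1:size(X,1)).';                  % row indices, to be split per group

% pickA returns (wrapped in a cell) a random 20% of the row indices it is given
pickA = @(r) {r(randperm(numel(r), round(0.2*numel(r))))};
idxA  = splitapply(pickA, rows, g);      % one cell of row indices per label

inA = false(size(X,1),1);
inA(vertcat(idxA{:})) = true;            % logical mask of the rows chosen for A
A = X(inA,:);                            % about 20% of each label
B = X(~inA,:);                           % the remaining rows of each label
```

The function handle returns a cell because splitapply expects one scalar output per group, and a 1-by-1 cell is one way to carry a variable-length index vector.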

Jon on 10 May 2022
Edited: Jon on 10 May 2022
This is one way to do it
% make an example data matrix with last column having either a "label" of 1,
% 2, or 3
data = [rand(1900,3),randi(3,[1900,1])];
% loop through labels building the A (20%) and B (80%) parts
Aparts = cell(3,1);
Bparts = cell(3,1);
for k = 1:3
    % get the indices of the rows with kth label
    idx = find(data(:,4)==k);
    numWithLabel = numel(idx);
    idxrand = idx(randperm(numWithLabel)); % randomize the selection
    % randomly put (within rounding) 20% in A and the remaining 80% in B,
    % as the question asks
    numA = round(0.2*numWithLabel);
    Aparts{k} = data(idxrand(1:numA),:);
    Bparts{k} = data(idxrand(numA+1:end),:); % the rest go to B
end
% put all of the parts in one matrix of doubles
A = cell2mat(Aparts);
B = cell2mat(Bparts);
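To confirm that a split is balanced per label, the label counts in A and B can be tabulated. A small sketch, in which the tiny hand-made A and B and the names countsA, countsB and fracA are placeholders assumed for illustration; in practice the A and B from the loop above would be used instead.

```matlab
% tiny stand-in matrices: one row per label in A, four rows per label in B
A = [0.1 0.2 0.3 1; 0.4 0.5 0.6 2; 0.7 0.8 0.9 3];
B = [rand(4,3), 1*ones(4,1); rand(4,3), 2*ones(4,1); rand(4,3), 3*ones(4,1)];

countsA = accumarray(A(:,4), 1, [3 1]);   % number of rows per label in A
countsB = accumarray(B(:,4), 1, [3 1]);   % number of rows per label in B
fracA   = countsA ./ (countsA + countsB); % share of each label that landed in A

disp([(1:3).', countsA, countsB, fracA])  % columns: label, nA, nB, fraction in A
```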
Jon on 11 May 2022
@dpb Thanks. I realize I need to get more familiar with categorical variables. From your example, and I think another one I saw recently, I see that they provide some powerful capabilities.

dpb on 10 May 2022
Edited: dpb on 10 May 2022
[ix,idx]=findgroups(X(:,4)); % ix: group number for each row; idx: the distinct labels in column 4 of X
for i=1:numel(idx) % for each group number
    I=find(ix==i); % the row indices into X for the group
    N=numel(I); % how many in this group
    I=I(randperm(N)); % rearrange randomly the elements of index vector
    nA=floor(0.2*N); % how many to pick for A (maybe round() instead???)
    iA{i}=I(1:nA); % the randomized selection for A
    iB{i}=I(nA+1:end); % rest for B
end
A=X(vertcat(iA{:}),:); % gather the selected rows of every group into A
B=X(vertcat(iB{:}),:); % and the remaining rows into B
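If the Statistics and Machine Learning Toolbox is available, cvpartition with the 'HoldOut' option may be a shorter route, since it stratifies by the grouping variable it is given. A minimal sketch under that assumption; the stand-in data X and the fixed seed are illustrative and not from the thread, and the per-label fractions are only approximately 20% and 80%.

```matlab
rng(0);                                    % fixed seed so the sketch is repeatable
X = [rand(1900,3), randi(3,[1900,1])];     % stand-in data, labels in column 4

c = cvpartition(X(:,4), 'HoldOut', 0.2);   % hold out about 20% of each label
A = X(test(c),:);                          % held-out part, about 20% per label
B = X(training(c),:);                      % remaining part, about 80% per label
```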
NotA_Programmer on 10 May 2022
-cleanest would be to copy and paste the actual code instead of retyping; then you also get indenting and comments and all... :)
Yeah, I should have done it that way.

R2021a
