Splitting a matrix according to its labels
NotA_Programmer
on 10 May 2022
Commented: Jon
on 11 May 2022
I have a 1900 x 4 double matrix whose fourth column contains the labels 3, 2 and 1. I want to split this data in a 20:80 ratio into A and B, where A contains 20% of each of the labels 3, 2 and 1, and B contains the remaining 80% of each label, i.e. 80% of label 3, 80% of label 2 and 80% of label 1. How can this be achieved?
6 comments
Accepted Answer
Jon
on 10 May 2022
Edited: Jon
on 10 May 2022
This is one way to do it:
% make an example data file with last column having either a "label" of 1,
% 2, or 3
data = [rand(1900,3),randi(3,[1900,1])];
% loop through labels making training and validation data sets
Aparts = cell(3,1);
Bparts = cell(3,1);
for k = 1:3
    % get the indices of the rows with kth label
    idx = find(data(:,4)==k);
    numWithLabel = numel(idx);
    idxrand = idx(randperm(numWithLabel)); % randomize the selection
    % randomly put (within rounding) 80% in A (training), 20% in B (validation)
    % (use 0.2 here instead for the 20:80 split asked for in the question)
    numTrain = round(0.8*numWithLabel);
    Aparts{k} = data(idxrand(1:numTrain),:);
    Bparts{k} = data(idxrand(numTrain+1:end),:); % the rest go to validation
end
% put all of the parts in one matrix of doubles
A = cell2mat(Aparts);
B = cell2mat(Bparts);
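For the split exactly as worded in the question (A holding about 20% of each label, B the remaining 80%), a minimal sketch could look like the following. The synthetic data, the fixed seed and the names inA and fracA are assumptions made for illustration; it uses a logical mask rather than cell arrays, so rows in A and B keep their original order instead of being grouped by label.

```matlab
% Stratified 20:80 split: A gets ~20% of each label, B gets the remaining ~80%.
% The data is synthetic (an assumption), standing in for the real 1900x4 matrix.
rng(0);                                   % fixed seed so the split is repeatable
data = [rand(1900,3), randi(3,[1900,1])];

labels = unique(data(:,4));               % labels actually present in column 4
inA = false(size(data,1),1);              % logical mask of the rows assigned to A
for k = labels.'                          % row vector, so one label per pass
    idx = find(data(:,4) == k);           % row numbers carrying label k
    idx = idx(randperm(numel(idx)));      % shuffle those row numbers
    nA  = round(0.2*numel(idx));          % about 20% of this label goes to A
    inA(idx(1:nA)) = true;
end
A = data(inA,:);                          % ~20% of each label
B = data(~inA,:);                         % remaining ~80% of each label

% Per-label fraction that ended up in A (each entry should be close to 0.2)
fracA = arrayfun(@(k) sum(A(:,4)==k) / sum(data(:,4)==k), labels);
disp(fracA.')
```

Because of the rounding, the fraction per label is only approximately 0.2; with a few hundred rows per label the deviation stays well below one percent.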
13 comments
dpb
on 11 May 2022
Edited: dpb
on 11 May 2022
Oh, if you want categorical labels, then use categorical variables -- that's what they're for...
labels=randi(3,10,1); % dummy dataset for show...
labels=categorical(labels,[1:3],{'Good','Average','Bad'},'ordinal',1); % convert to categorical
labels =
10×1 categorical array
Bad
Good
Bad
Average
Average
Bad
Bad
Good
Bad
Good
>>
Plots are aware of categorical variables so you get the labels automagically; you may have to use
>> categories(labels)
ans =
3×1 cell array
{'Good'   }
{'Average'}
{'Bad'    }
>>
or string or cellstr occasionally to get a string representation if you need it specifically.
But manipulating table data as categorical instead of as string is far easier, and more efficient besides.
While I showed as a standalone new variable called labels, what you really want to do is convert the actual variable to categorical and use it instead of the original...then the labels come along for free.
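As a rough illustration of converting the actual variable rather than keeping a separate one, the sketch below replaces a numeric label column of a table in place. The table T, its variable names x1, x2, x3 and label, and the random data are all hypothetical; the category names and their order are the ones used in the example above.

```matlab
% Hypothetical data: three numeric features plus a numeric label column (1, 2 or 3)
data = [rand(10,3), randi(3,[10,1])];
T = array2table(data, 'VariableNames', {'x1','x2','x3','label'});

% Replace the numeric label column by an ordinal categorical, in place
T.label = categorical(T.label, 1:3, {'Good','Average','Bad'}, 'Ordinal', true);

% Rows can now be selected by name instead of by numeric code
goodRows = T(T.label == 'Good', :);

% Number of rows in each category, in the order Good, Average, Bad
counts = countcats(T.label);
disp(counts.')
```

Once the column itself is categorical, anything that reads T.label (selection, grouping, plotting) sees the names directly, so there is no second variable to keep in sync.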
Jon
on 11 May 2022
@dpb Thanks, I realize I need to get more familiar with categorical variables. From your example, and I think another one I saw recently, I see that they provide some powerful capabilities.
More Answers (1)
dpb
on 10 May 2022
Edited: dpb
on 10 May 2022
[ix,idx]=findgroups(X(:,4));   % get grouping variable on fourth column of X
for i=idx.'                    % for each group ID (must be numeric 1:n as here, since it is compared with ix)
  I=find(ix==i);               % the indices into X for the group
  N=numel(I);                  % how many in this group
  I=I(randperm(N));            % rearrange randomly the elements of index vector
  nA=floor(0.8*N);             % how many to pick for A (maybe round() instead???)
  iA{i}=I(1:nA);               % the randomized selection for A
  iB{i}=I(nA+1:end);           % rest for B
end
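The loop leaves the selections as cell arrays of row indices, not as matrices. One possible way to finish the job is sketched below under a few assumptions: X is synthetic stand-in data, the loop runs over group numbers 1:n rather than over the label values (so the labels need not be 1, 2, 3), and the 0.8 fraction for A is kept from the code above; use 0.2 for the split described in the question.

```matlab
X = [rand(1900,3), randi(3,[1900,1])];    % synthetic stand-in for the real matrix
[ix, ids] = findgroups(X(:,4));           % ix: group number per row, ids: label of each group
nG = numel(ids);
iA = cell(nG,1);                          % preallocate the per-group index lists
iB = cell(nG,1);
for g = 1:nG                              % loop over group numbers, not label values
    I  = find(ix == g);                   % row numbers of X that belong to group g
    I  = I(randperm(numel(I)));           % shuffle the row numbers themselves
    nA = floor(0.8*numel(I));             % share of this group that goes to A
    iA{g} = I(1:nA);
    iB{g} = I(nA+1:end);
end
A = X(vertcat(iA{:}), :);                 % rows selected for A, grouped by label
B = X(vertcat(iB{:}), :);                 % all remaining rows
```

Stacking the index lists with vertcat and indexing X once keeps every row in exactly one of the two outputs.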
5 comments
dpb
on 10 May 2022
Edited: dpb
on 10 May 2022
You've got a missing ".'" transpose operator on the for loop iterator -- it must be a row vector; passing a column vector will result in the problem that all three indices are passed at once. I could have made the code more robust by writing
for i=idx(:).'
instead, in which (:) forces a column vector and ".'" turns it into a row.
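A small self-contained check of that behaviour, with a made-up three-element idx and counter names chosen only for this illustration:

```matlab
idx = [1; 2; 3];                 % column vector, the shape findgroups returns
nColPasses = 0;
for i = idx                      % for iterates over columns: i is the whole 3x1 vector
    nColPasses = nColPasses + 1;
end
nRowPasses = 0;
for i = idx(:).'                 % row vector: i takes the values 1, 2, 3 in turn
    nRowPasses = nRowPasses + 1;
end
fprintf('column vector: %d pass, row vector: %d passes\n', nColPasses, nRowPasses);
```

The column-vector loop body runs once with all three IDs in i, which is why the comparison ix==i inside the loop then misbehaves.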
However, I see I missed an important step in the cleanup from the anonymous function version -- the line
I=randperm(N);
needs to be
I=I(randperm(N));
to rearrange the subset indices to the grouped variables; the randperm(N) call simply generates the right length of vector subscripts in a random order; still need the actual subscripts from the matching operation of finding the ones in the given group.
With those corrections, it should work as is...cleanest would be to copy and paste the actual code instead of retyping; then you also get indenting and comments and all... :)
I did make the above correction in the Answer code...sorry I missed that first time; glad there was another issue that you reposted so had the chance to see it! :)