# Removing Rows Based on Repeated Values in a Single Column of Matlab Table

40 views (last 30 days)
Sarvesh Kumar on 17 Jul 2018
Answered: Peter Perkins on 3 Aug 2018
Hi. I have some Tables Containing 62x5300(RxC) data points which i am trying to analyze using matlab. I have a column named as b which has repeated values. i want matlab to read the values in b, and if it is repeated then matlab should find the mean of all the rows corresponding to that value of b.
for example my table is as follows:
a b c d e...bj
1 2 3 4 5...6
3 2 5 3 2...7
3 4 6 7 3...4
4 2 6 7 9...3
after i run the table through matlab, I should get the table below:
a b c d e...bj
4 2 7 7 8...8 ---[Mean Values of R1,2,4]
3 4 6 7 3...4 ---[R3 (4 was not repeated anywhere in Col 2)]
As you can see that col b had 2 repeated 3 times in Row1,2,4, after running through matlab, the results are in table 2 where the mean values of each repeated column has been found and placed along with the unique row(Row3).
Note that i want to work with tables and not matrix as i find it easier to refer to the columns having heading.
i would appreciate if someone can help me with this.
Thanks.

Diwakar Ravichandran on 17 Jul 2018
Hi Sarvesh,
As I understand, you want to search for the value of element in column 'b' and make the second row of your table as the mean of all the values in which the rows are being repeated. This process is easier to do if you are using matrices rather than tables. Never the less. I would suggest you to look into the find function in matlab which can help you look for the repetitions in column 'b'. The documentation for find function is as follows:
Also a point worth noting, in the example you mentioned, in the 2nd row, the values are sum of R1,2,4 divided by 2, and not the mean value.
Also a nice workaround would be to import the table as a matrix, perform your calculations and write the values back to the same or another table as per your choice. But this is only optional.
Hope this helps,
Cheers!

KSSV on 17 Jul 2018
data = [1 2 3 4 5
3 2 5 3 2
3 4 6 7 3
4 2 6 7 9.] ;
a = data(:,1) ; b = data(:,2) ; c = data(:,3) ; d = data(:,4) ; e = data(:,5) ;
T = table(a,b,c,d,e) ;
Tc = table2array(T) ;
[c,ia,ib] = unique(Tc(:,2)) ;
C = zeros(length(ia),5) ;
for i = 1:length(ia)
C(i,:) = mean(Tc(ib==i,:)) ;
end
Tnew = array2table(C, 'VariableNames',T.Properties.VariableNames)
##### 2 CommentsShowHide 1 older comment
Sarvesh Kumar on 18 Jul 2018
i was able to write a code using some of the lines from your script and a script from an earlier similar question i asked. i just want someone to verify if it is correct.
[uni,~,idx4] = unique(e3(:,2)); %second column to be unique
val1 = accumarray(idx4,e3(:,1),[], @max);
val2 = accumarray(idx4,e3(:,3),[], @max);
val3 = accumarray(idx4,e3(:,4),[], @max);
%use @ to change e.g. @mean @sum etc.
%i also wanted the max values of col1,3,4 only.
new1= [val1,uni,val2,val3];
%put all the max variables together in columns
%the remaining columns must follow the mean procedure and keep on adding to new1 columns so that i get nx62 array.
for i=5:62
val4= accumarray(idx4,e3(:,i),[], @mean);
new1=[new1,val4];
end

Peter Perkins on 3 Aug 2018
Use varfun on the table, with b as the grouping variable. It's one line.
>> t = table([1;3;3;4],[2;2;4;2],[3;5;6;6],[4;3;7;7],[5;2;3;9],[6;7;4;3],'VariableNames',{'a' 'b' 'c' 'd' 'e' 'bj'})
t =
4×6 table
a b c d e bj
_ _ _ _ _ __
1 2 3 4 5 6
3 2 5 3 2 7
3 4 6 7 3 4
4 2 6 7 9 3
>> tMean = varfun(@mean,t,'GroupingVariable','b')
tMean =
2×7 table
b GroupCount mean_a mean_c mean_d mean_e mean_bj
_ __________ ______ ______ ______ ______ _______
2 3 2.6667 4.6667 4.6667 5.3333 5.3333
4 1 3 6 7 3 4
I have no idea where the numbers you posted come from, certainly 4 is not the mean of [1 3 4].
There are other functions for this too: findgroups/splitapply, and in R2018a, groupsummary.