# how to identify unique columns, based on the first two rows, make an average of the uniques columns and remove the duplicate columns?

8 views (last 30 days)
ehsan on 11 Jun 2019
Answered: Peter Perkins on 12 Jun 2019
I have a matrix with the size of 572 x 790. The first two rows of the matrixes are Longitude and Latitude for some stations, and the other rows are the measurments values. Now I would like to calculate the mean value of the two or more columns that have similar Lon and Lat (the first to rows). Then delete those columns' values and insert the derived mean value instead of them.
10.05 9.75 9.85 9.85 9.75 9.55 9.85
47.15 47.15 47.25 47.25 47.25 47.34 47.25
0 0 0 0 0 0 2.56
1.215 5.599 0 0 0 0 0
0.547 0.553 2.058 2.058 1.422 3.063 2.664
10.94 8.98 8.295 8.295 7.058 4.011 9.24
0 0 0 0 0 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 2.5
0 0 0 0 0 0 0
4.80 2.15 1.02 1.029 0.164 0.656 0.86
0.97 7.93 1.57 1.57 8.31 3.53 1.22
1.64 0.36 0.30 0.30 0.21 0.10 0
... ... ... ... ... ... ...
For example in this case the columns number 3, 4, and 7 are unique and I need to make an average .
I am confisung a little bit with unique and accumarray.
I appreciate any help.

Star Strider on 11 Jun 2019
With ‘M’ as your matrix:
M = [10.0500 9.7500 9.8500 9.8500 9.7500 9.5500 9.8500
47.1500 47.1500 47.2500 47.2500 47.2500 47.3400 47.2500
0 0 0 0 0 0 2.5600
. . . ];
Use uniquetol (or unique), splitapply, cellfun and cell2mat to get:
[Uc,~,idx] = uniquetol(M([1 2],:)', 0.001, 'ByRows',1) % Choose A Function
[Uc,~,idx] = unique(M([1 2],:)', 'rows') % Choose A Function
colcolct = splitapply(@(x){x}, M', idx); % Collect Columns (By Row Of Transposed Matrix)
colmeans = cellfun(@(x)mean(x,1), colcolct, 'Uni',0) % Column Means (By Row Of Transposed Matrix)
Out = cell2mat(colmeans)' % Convert To Matrix & Transpose
producing:
Out =
9.5500 9.7500 9.7500 9.8500 10.0500
47.3400 47.1500 47.2500 47.2500 47.1500
0 0 0 0.8533 0
0 5.5990 0 0 1.2150
3.0630 0.5530 1.4220 2.2600 0.5470
4.0110 8.9800 7.0580 8.6100 10.9400
0 0 0 0.3333 0
0 0 0 0 0
0 0 0 0.8333 0
0 0 0 0 0
0.6560 2.1500 0.1640 0.9697 4.8000
3.5300 7.9300 8.3100 1.4533 0.9700
0.1000 0.3600 0.2100 0.2000 1.6400
I could not entice splitapply to both group the columns and calculate the means, explaining the cellfun call.

Show 3 older comments
ehsan on 11 Jun 2019
Dear Adam, Star and Guillaume,
Thanks for your prompt and precise reply and consideration. That works perfectly.
Star Strider on 11 Jun 2019
Our individual and collective pleasures!
Guillaume on 11 Jun 2019
As I've shown in my answer, you certainly don't need to go through cell arrays with splitapply. Using cell arrays is going to significantly slow down the code.

### More Answers (2)

Guillaume on 11 Jun 2019
• using unique and accumarray:
[latlong, ~, group] = unique(yourmatrix(:, [1, 2]), 'rows'); %optionally use the 'stable' option.
[destrow, destcol] = ndgrid(group, 3:size(yourmatrix, 2));
newmatrix = accumarray([destrow(:), destcol(:)], reshape(yourmatrix(:, 3:end), 1, []), [], @mean)
newmatrix(:, [1, 2]) = latlong
• using the newer findgroups and splitapply:
[group, lat, long] = findgroups(yourmatrix(:, 1), yourmatrix(:, 2));
newmatrix = splitapply(@(rows) mean(rows, 1), yourmatrix(:, 3:end), group);
newmatrix = [lat, long, newmatrix]
• But you might be better off converting your matrix into a table, particularly if columns 3:end represent different things. Using a table, what you want is very straightforward with either varfun or groupsummary:
latlongdata = array2table(yourmatrix, 'VariableNames', [{'latitude', 'longitude'}, compose('var%d', 1:size(yourmatrix, 2)-2)]); %obviously use better variable names than var1...varn for the remaining columns
newdata = varfun(@mean, latlongdata, 'GroupingVariables', {'latitude', 'longitude'})
%or
newdata = groupsummary(latlongdata, {'latitude', 'longitude'}, 'mean');
groupsummary lets you calculate several statistics at once (e.g. mean, std, max, min, sum) whereas varfun can only calculate one.

Peter Perkins on 12 Jun 2019
You may have a reason for wanting row-oriented data, but I'm gonna suggest that you transpose your data, create a table, and use a grouped varfun. It's a one-liner:
>> T = array2table(X); T.Properties.VariableNames(1:2) = ["Lat" "Lon"]
T =
7×13 table
Lat Lon X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
_____ _____ ____ _____ _____ _____ __ __ ___ ___ _____ ____ ____
10.05 47.15 0 1.215 0.547 10.94 0 0 0 0 4.8 0.97 1.64
9.75 47.15 0 5.599 0.553 8.98 0 0 0 0 2.15 7.93 0.36
9.85 47.25 0 0 2.058 8.295 0 0 0 0 1.02 1.57 0.3
9.85 47.25 0 0 2.058 8.295 0 0 0 0 1.029 1.57 0.3
9.75 47.25 0 0 1.422 7.058 0 0 0 0 0.164 8.31 0.21
9.55 47.34 0 0 3.063 4.011 0 0 0 0 0.656 3.53 0.1
9.85 47.25 2.56 0 2.664 9.24 1 0 2.5 0 0.86 1.22 0
>> T2 = varfun(@(x) repmat(mean(x),size(x)),T,'GroupingVariables',["Lat" "Lon"])
T2 =
7×14 table
Lat Lon GroupCount Fun_X3 Fun_X4 Fun_X5 Fun_X6 Fun_X7 Fun_X8 Fun_X9 Fun_X10 Fun_X11 Fun_X12 Fun_X13
_____ _____ __________ _______ ______ ______ ______ _______ ______ _______ _______ _______ _______ _______
9.55 47.34 1 0 0 3.063 4.011 0 0 0 0 0.656 3.53 0.1
9.75 47.15 1 0 5.599 0.553 8.98 0 0 0 0 2.15 7.93 0.36
9.75 47.25 1 0 0 1.422 7.058 0 0 0 0 0.164 8.31 0.21
9.85 47.25 3 0.85333 0 2.26 8.61 0.33333 0 0.83333 0 0.96967 1.4533 0.2
9.85 47.25 3 0.85333 0 2.26 8.61 0.33333 0 0.83333 0 0.96967 1.4533 0.2
9.85 47.25 3 0.85333 0 2.26 8.61 0.33333 0 0.83333 0 0.96967 1.4533 0.2
10.05 47.15 1 0 1.215 0.547 10.94 0 0 0 0 4.8 0.97 1.64
I think that does what you're looking for. It seems to match StarStrider's output.