How to count data based on the categories

Question

Moe el 22 de Oct. de 2015

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/250440-how-to-count-data-based-on-the-categories

Editada: arich82 el 27 de Oct. de 2015

There is a Master matrix as follow that includes a unique ID (first column), different ID (second column) and last three columns (C/D/E) are included a number with a different range. (e.g. Column #3 included 1:2; column #4 included 1:2; column #5 included 1:10). I want a new matrix (like matrix OT in the following) that according to the unique ID find in the first column, count column 3 to 5 based on the different category. (e.g. ID = 1, column #3, category #1 = counted as 5, category #2 = counted as 0).

Master = [1  100680  1  2  1
36731  1  1  9
36731  1  2  9
14275  1  1  1
14275  1  2  8
14275  1  1  7
117633  1  1  6
117633  1  2  6
68599  1  1  8
31678  1  1  1
31678  1  1  8
31678  1  1  2
31678  1  2  7
44106  1  2  10]

Example of output for first ID #1

OT = [1 5 0 2 3 2 0 0 0 0 0 0 1 1 0];

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Moe el 22 de Oct. de 2015

Editada: Moe el 22 de Oct. de 2015

I guess I couldn't explain well in my initial question. My meaning of category was the variation of numbers that are available in that column. For example, in column three, there are only 2 different numbers (1 or 2), that's why when it counted, it found 5 (type 1) and 0 (type 2).

Another example, column 5, there are 10 different numbers (1 or 2 or 3 or ... or 10), that's why when it counted, it found 2 (type 1), 1 (type 8), 2 (type 9) and rest 0.

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

arich82 el 22 de Oct. de 2015

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/250440-how-to-count-data-based-on-the-categories#answer_197012

Editada: arich82 el 22 de Oct. de 2015

Abrir en MATLAB Online

[Edit to include output]]

Without the toolboxes, I think you can do this with accumarray

data = [...
    1  100680  1  2  1; ...
    1  36731  1  1  9; ...
    1  36731  1  2  9; ...
    1  14275  1  1  1; ...
    1  14275  1  2  8; ...
    2  14275  2  1  7; ...
    2  117633  1  1  6; ...
    2  117633  1  2  6; ...
    2  68599  2  1  8; ...
    2  31678  1  1  1; ...
    2  31678  1  1  8; ...
    3  31678  2  1  2; ...
    3  31678  2  2  7; ...
    3  44106  2  2  10; ...
    ];
n = size(data, 1);
weights = ones(n, 1);
OT = unique(data(:, 1));
for k = 3:5
    OT = [OT, accumarray(data(:, [1, k]), weights)];
end

output:

OT =
       1     5     0     2     3     2     0     0     0     0     0     0     1     2     0
       2     4     2     5     1     1     0     0     0     0     2     1     2     0     0
       3     0     3     1     2     0     1     0     0     0     0     1     0     0     1

Note that I changed your data to more closely match your Excel data (Column 3 only had ones in your Master, but Column C had 1's and 2's; there needs to be at least one occurrence of the max index for this approach to match your desired result).

Also, Column 5 (E) clearly has two 9's for id == 1. Should OT(end - 1) == 2 instead of 1, or am I misinterpreting something?

3 comentarios
Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

Moe el 26 de Oct. de 2015

Abrir en MATLAB Online

Hi arich82

Can you please tell me why your code is not working for the following data:

data = [
84617  1  1  4  1  1  2  1
84617  1  2  4  1  1  2  1
96283  1  2  1  1  1  4  4
96283  1  2  1  1  1  4  4
96283  1  2  5  1  1  2  1
96283  1  1  5  1  1  2  1
117059  2  2  1  2  1  4  7
117059  2  1  1  2  1  4  4
];

It gives the following error:

Error using horzcat
CAT arguments dimensions are not
consistent.
Error in ModeChoice (line 20)
    OT = [OT, accumarray(data(:, [1, k]),
    weights)];

arich82 el 27 de Oct. de 2015

Editada: arich82 el 27 de Oct. de 2015

Abrir en MATLAB Online

In the above code, accumarray is using the id column as the row index; this was fine when the id started at 1, and used consecutive integers.

Now, OT is initialized to [300; 301], that is, the value 300 in row 1 and the vaule 301 in row 2, but accumarray is trying to put results in the row 300 and row 301.

The solution is to use the third output from unique:

n = size(data, 1);
weights = ones(n, 1);
[OT, ~, ind_id] = unique(data(:, 1));
for k = 3:5
    OT = [OT, accumarray([ind_id, data(:, k)], weights)];
end

or, preallocating (slighly improved syntax)

cols = 3:5; % columns of interest, i.e. 'C', 'D', & 'E'
ncols = numel(cols);  
field_widths = [1, max(data(:, cols))]; % prepend 1 for id
[u_id, ~, ind_id] = unique(data(:, 1));
OT = zeros(numel(u_id), sum(field_widths)); % preallocate
n = size(data, 1);
weights = ones(n, 1);
OT(:, 1) = u_id;
for k = 1:ncols
    i0 = sum(field_widths(1:k)); % end index of previous field
    inds =  i0 + [1:field_widths(k+1)];
    OT(:, inds) = accumarray([ind_id, data(:, cols(k))], weights);
end

output:

OT =
     300     6     0     2     4     2     0     0     2     2
     301     0     2     1     1     2     0     0     0     0

Iniciar sesión para comentar.

Answer 2

Image Analyst el 22 de Oct. de 2015

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/250440-how-to-count-data-based-on-the-categories#answer_197008

Abrir en MATLAB Online

If you have the Statistics and Machine Learning Toolbox, you can use grpstats():

Master = [...
100680  1  2  1
36731  1  1  9
36731  1  2  9
14275  1  1  1
14275  1  2  8
14275  1  1  7
117633  1  1  6
117633  1  2  6
68599  1  1  8
31678  1  1  1
31678  1  1  8
31678  1  1  2
31678  1  2  7
44106  1  2  10]
statsArray = grpstats(Master, Master(:,1), 'sum')

and in the command window you'll see

statsArray =
           5      202692           5           8          28
          12      381496           6           7          36
           9      107462           3           5          19

Columns 3-5 in statsArray are the sums in columns 3-5 broken down by category number in column 1 of Master.

2 comentarios
Mostrar NingunoOcultar Ninguno

Moe el 22 de Oct. de 2015

Thanks Image Analyst. Matrix format is correct but the answer is not. I'm looking for the count instead of sum. For example, for ID 2, how many is 1 and how many is 2. If you count you will find 5 number 1 and 1 number 2 while in your code sum of this is calculated.

Image Analyst el 23 de Oct. de 2015

Abrir en MATLAB Online

Sorry, I didn't understand your definition/distinction between count and sum. If you want count, it seems to get the number of unique numbers. So you can just do

statsArray = grpstats(Master, Master(:,1), @fun)

with "fun" being defined as:

function num = fun(array)
num= length(unique(array));

It seems a lot simpler than the answer you chose, but whatever... I know I modified my answer after you had already picked a solution. If you want a :one-liner" you can still use it.

Iniciar sesión para comentar.

Answer 3

Peter Perkins el 23 de Oct. de 2015

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/250440-how-to-count-data-based-on-the-categories#answer_197017

Abrir en MATLAB Online

Another possibility, using rowfun and a table. This code:

Master = ...
    [1  100680  1  2  1
36731  1  1  9
36731  1  2  9
14275  1  1  1
14275  1  2  8
14275  1  1  7
117633  1  1  6
117633  1  2  6
68599  1  1  8
31678  1  1  1
31678  1  1  8
31678  1  1  2
31678  1  2  7
44106  1  2  10];
M = array2table(Master,'VariableNames',{'A' 'B' 'C' 'D' 'E'});
outNames = {'C1' 'C2' 'D1' 'D2' 'E1' 'E2' 'E3' 'E4' 'E5' 'E6' 'E7' 'E8' 'E9' 'E10'};
counts = rowfun(@fun, M,'GroupingVariables','A','InputVariables',{'C' 'D' 'E'},'OutputVariableNames',outNames)

... with this function:

function [varargout] = fun(C,D,E)
counts = [histc(C',1:2) histc(D',1:2) histc(E',1:10)];
varargout = num2cell(counts);

... produces this output:

counts = 
    A    GroupCount    C1    C2    D1    D2    E1    E2    E3    E4    E5    E6    E7    E8    E9    E10
    _    __________    __    __    __    __    __    __    __    __    __    __    __    __    __    ___
    1    5             5     0     2     3     2     0     0     0     0     0     0     1     2     0  
    2    6             6     0     5     1     1     0     0     0     0     2     1     2     0     0  
    3    3             3     0     1     2     0     1     0     0     0     0     1     0     0     1

There are probably better ways to provide this result, for example with a table more like this:

ans = 
    A    GroupCount    Ccounts    Dcounts       Ecounts   
    _    __________    _______    _______    _____________
    1    5             5    0     2    3     [1x10 double]
    2    6             6    0     5    1     [1x10 double]
    3    3             3    0     1    2     [1x10 double]

but the former is more or less what you seem to be asking for.

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

How to count data based on the categories

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Respuesta aceptada

3 comentarios
Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

Más respuestas (2)

2 comentarios
Mostrar NingunoOcultar Ninguno

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Community Treasure Hunt

How to count data based on the categories

1 comentario Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Respuesta aceptada

3 comentarios Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

Más respuestas (2)

2 comentarios Mostrar NingunoOcultar Ninguno

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Community Treasure Hunt

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

3 comentarios
Mostrar 1 comentario más antiguoOcultar 1 comentario más antiguo

2 comentarios
Mostrar NingunoOcultar Ninguno

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos