Retime data aggregation for ID

1 visualización (últimos 30 días)
Gian23
Gian23 el 26 de Abr. de 2021
Respondida: Eric Sofen el 4 de Mayo de 2021
Hello everyone,
I'm going to calculate the monthly average of daily temperatures grouping by sensor using retime function. At the moment I'm trying to select each sensor with a loop and then apply the retime function, but I have to calculate six million rows so I would like to avoid a loop to speed up the calculation
I give an example:
Time = datetime({'18-feb-2021';'19-feb-2021';...
'01-mar-2021';'04-mar-2021';'18-feb-2021';...
'19-feb-2021';'01-mar-2021';'04-mar-2021'});
Temp = [56.82;62.72;64.52;63.81;63.45;59.7;60.27;61.32];
Sensor = [12;12;12;12;13;13;13;13];
TT = timetable(Time,Sensor,Temp);
Current code:
uni_sensor = unique(Sensor);
monthly_ds = timetable;
parfor kk = 1:length(uni_sensor)
index_retime = find(TT.Sensor == uni_sensor(kk));
sensor_retime = TT(index_retime,:);
monthly_data = retime(sensor_retime(:,2), 'monthly', 'mean');
data_sensor_retime = array2table(repmat(uni_sensor(kk), size(monthly_data, 1),1), 'VariableNames', "Sensor" );
monthly_sub_id = [monthly_data, data_sensor_retime];
monthly_ds = [monthly_ds; monthly_sub_id];
end
Desired output:
Time = datetime({'feb-2021';'feb-2021';...
'mar-2021';'mar-2021'});
Temp = [59.77;61.575;64.165;60.795];
Sensor = [12;12;13;13];
TT_out = timetable(Time,Sensor,Temp);
Thanks in advance,
Gianluca

Respuesta aceptada

Marco Riani
Marco Riani el 27 de Abr. de 2021
I think in this example it is unnecessary to use retime.
I would proceed as follows.
Time1=char(Time);
Time2=Time1(:,4:end);
TT = table(findgroups(string(Time2)),Sensor,Temp);
groupvars={'Sensor' 'Var1'};
datavars='Temp';
groupsummary(TT,groupvars,'mean',datavars)
Instead of using groupsummary it is possible to use grpstats. Please let us know which between groupsummary and grpstats is faster.
  1 comentario
Gian23
Gian23 el 27 de Abr. de 2021
Very interesting approach and usefull overall! I tried yor solution and it works very well.
On the evidence of one test, grpstats is faster than groupsummary.
Many thanks again!

Iniciar sesión para comentar.

Más respuestas (1)

Eric Sofen
Eric Sofen el 4 de Mayo de 2021
Another approach is to unstack the timetable based on the sensor ID, so you'd have a wide timetable with separate variables temp_12, temp_13, ..., then apply retime to that without a need for grouping. I don't know if it would be faster than Marco's findgroups approach (which is quite clever), and having the sensor IDs embedded in the table variable names may or may not be useful in the long run, but it's yet another way to tackle this problem.

Categorías

Más información sobre Data Type Identification en Help Center y File Exchange.

Etiquetas

Productos

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by