Treat and handle missing hourly data (with daily profile), that might have large gaps

Question

Anwaar Alghamdi el 24 de Nov. de 2022

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/1861273-treat-and-handle-missing-hourly-data-with-daily-profile-that-might-have-large-gaps

Editada: Anwaar Alghamdi el 24 de Nov. de 2022

I want to treat huge missy temperature data with many missing values (presented as 999.9).

If there is few missing data within the day, I would take average from data before and after. But if I have large missing clusters (almost full-day missing, or up to 100 values in a row), I would take average of 1PM temperature from yesterday and 1PM temperature from tomorrow to get 1PM value for today, and same goes for all hours.

Note: I don't wish to change valid assigned tempratures linked to hours (like what interp1 would do with values order).

What can I use to handle these data?

08/09/2016 	4:00:00	 26
08/09/2016 	5:00:00	 26
08/09/2016 	6:00:00	 25
08/09/2016 	6:00:00	 999.9
08/09/2016 	7:00:00	 24
08/09/2016 	8:00:00	 25
08/09/2016 	9:00:00	 24
08/09/2016 	9:00:00	 999.9
08/09/2016 	10:00:00 23

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Anwaar Alghamdi el 24 de Nov. de 2022

@Jiri Hajek

Also, if I do linear interpolation, the non-999 values will be missed up (at least their order). I don't want to touch the temperatures assigned for each hour. Only estimate the 999 values.

Jiri Hajek el 24 de Nov. de 2022

As for the cluster identification, I can give you some hints - will put them below into an answer. As for the handling of large missing clusters, I would leave themo out, i.e. constrain the scope.

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Jiri Hajek el 24 de Nov. de 2022

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/1861273-treat-and-handle-missing-hourly-data-with-daily-profile-that-might-have-large-gaps#answer_1110293

Abrir en MATLAB Online

To identify the clusters of outliers, one may use logical indexing and the time vector. This is just a skeletal draft of the algorithm, but you can get the idea.

timeColumn  % your datatime values
temperatureColumnRaw % your original temperatures
outlierPoints = temperatureColumnRaw > 900;
outlierTimes = timeColumn(outlierPoints);
timeDifsOfOutliers = diff(outlierTimes);
clusterStartsLogical = [1; timeDifsOfOutliers > mode(diff(timeColumn))];
clusterStartTimes = outlierTimes(clusterStartsLogical);
nClusters = length(clusterStart); 
if nClusters > 1
    clusterStartIndices = find(clusterStartsLogical);
    clusterEndPoints = [clusterStartIndices(2:end)-1;length(outlierTimes)];
    clusterEndTimes = outlierTimes(clusterEndPoints);
end
clusterDurations = clusterEndTimes-clusterStartTimes;
shortClusterIndices = clusterDurations > hours(3);   % you define, what is a short cluster

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Treat and handle missing hourly data (with daily profile), that might have large gaps

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Respuestas (1)

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Community Treasure Hunt

Treat and handle missing hourly data (with daily profile), that might have large gaps

5 comentarios Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Respuestas (1)

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Community Treasure Hunt

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos