Treat and handle missing hourly data (with daily profile), that might have large gaps

2 visualizaciones (últimos 30 días)
I want to treat huge missy temperature data with many missing values (presented as 999.9).
If there is few missing data within the day, I would take average from data before and after. But if I have large missing clusters (almost full-day missing, or up to 100 values in a row), I would take average of 1PM temperature from yesterday and 1PM temperature from tomorrow to get 1PM value for today, and same goes for all hours.
Note: I don't wish to change valid assigned tempratures linked to hours (like what interp1 would do with values order).
What can I use to handle these data?
08/09/2016 4:00:00 26
08/09/2016 5:00:00 26
08/09/2016 6:00:00 25
08/09/2016 6:00:00 999.9
08/09/2016 7:00:00 24
08/09/2016 8:00:00 25
08/09/2016 9:00:00 24
08/09/2016 9:00:00 999.9
08/09/2016 10:00:00 23
  5 comentarios
Anwaar Alghamdi
Anwaar Alghamdi el 24 de Nov. de 2022
Also, if I do linear interpolation, the non-999 values will be missed up (at least their order). I don't want to touch the temperatures assigned for each hour. Only estimate the 999 values.
Jiri Hajek
Jiri Hajek el 24 de Nov. de 2022
As for the cluster identification, I can give you some hints - will put them below into an answer. As for the handling of large missing clusters, I would leave themo out, i.e. constrain the scope.

Iniciar sesión para comentar.

Respuestas (1)

Jiri Hajek
Jiri Hajek el 24 de Nov. de 2022
To identify the clusters of outliers, one may use logical indexing and the time vector. This is just a skeletal draft of the algorithm, but you can get the idea.
timeColumn % your datatime values
temperatureColumnRaw % your original temperatures
outlierPoints = temperatureColumnRaw > 900;
outlierTimes = timeColumn(outlierPoints);
timeDifsOfOutliers = diff(outlierTimes);
clusterStartsLogical = [1; timeDifsOfOutliers > mode(diff(timeColumn))];
clusterStartTimes = outlierTimes(clusterStartsLogical);
nClusters = length(clusterStart);
if nClusters > 1
clusterStartIndices = find(clusterStartsLogical);
clusterEndPoints = [clusterStartIndices(2:end)-1;length(outlierTimes)];
clusterEndTimes = outlierTimes(clusterEndPoints);
end
clusterDurations = clusterEndTimes-clusterStartTimes;
shortClusterIndices = clusterDurations > hours(3); % you define, what is a short cluster

Categorías

Más información sobre Call Python from MATLAB en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by