if it is greater than 75 percent missing data
2 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Fabian Moreno
el 26 de Oct. de 2020
Respondida: Peter Perkins
el 19 de Nov. de 2020
Hello there
I have a matrix of (10200x7) from 01-01-2019 to 29-02-2020, which is iqual to the hourly temperature measured. The data is organizaed in the first 6 columns with date information (yy-mm-dd-hh-mm-ss) and then my temperature data (column 7). I already got the nanmean dayli values for a new matrix. But I would like to know how can I get a conditional, for example: if more than 75% of my data each 24 hours is missing, give me NaN for that day. If not, give me nanmean value.
I'm trying something like this, but I'm pretty sure I'm on the wrong way
%first I find the hourly values positions after 0 hs and before 23 hs
x = find(files_mat{1,1}(:,5)>= 0 {1,1}(:,5)<=23);
%%then, I'm trying to get the conditional to the tempearture column
if isnan (files_mat{1,1}(x,7))<= 0.75
M_24h
elseif isnan (files_mat{1,1}(x,7))>= 0.75
M_24h
Thank you
2 comentarios
Rik
el 26 de Oct. de 2020
I don't fully understand the shape of your data. But if you have an array, you can call isnan and get out a logical array. Using mean you can find the percentage of NaNs in each row, and using the comparator you will end up with a logical vector you can use to index into your output array and correct all non-NaNs.
I'm not posting this as an answer, as I can't really write a working example for you without understanding your data.
Respuesta aceptada
Peter Perkins
el 19 de Nov. de 2020
I think this would be much simpler if you created a timetable by combining the first 6 vars in the table. Actually, I have no idea if you even have a table, but you should. Use readtable to read in your data.
Given a timetable, this is a one-liner: use retime to aggregate to daily means, except use your own custom function that returns NaN if there's more than 18 missing hourly values. Something like this:
ttDaily = retime(ttHourly,'daily',myFun)
function y = myFun(x)
if sum(isnan(x)) > .75*length(x)
y = NaN;
else
y = mean(x,'omitnan');
end
0 comentarios
Más respuestas (1)
Adam Danz
el 26 de Oct. de 2020
Editada: Adam Danz
el 27 de Oct. de 2020
Here's an approach you can apply to your data,
% Create a table of year-month-day-hour-min-second-rain data with some of the hour data missing (NaN)
dateTimeVals = datetime(1999,1,1) + hours(1:5000)';
rain = rand(size(dateTimeVals));
T = array2table([datevec(dateTimeVals), rain], 'VariableNames',{'Y','M','D','H','Mn','S','rain'});
% Replace some Hour data with missing vals
rng('default')
T.H(randi(height(T),1,3000)) = NaN;
% Show top of table
head(T)
% Add column to identify valid hour-data
T.dateValid = ~isnan(T.H);
head(T)
% Compute the percentage of valid hour-data per day
% This is achieved by applying mean() to the binary
% 'dateValid' column for each day.
Tgs = groupsummary(T,{'Y','M','D'},'mean', 'dateValid');
head(Tgs)
% This shows that for the first day of the first month of 1999,
% 47.8% of hour-data are non-nan. Let's confirm that manually.
daySelection = T.Y==1999 & T.M==1 & T.D==1;
mean(T.dateValid(daySelection),'omitnan') * 100
% To identify which days have >75% hour-data,
Tgs(Tgs.mean_dateValid > 0.75, 1:3)
0 comentarios
Ver también
Categorías
Más información sobre Data Preprocessing en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!