Filtering max values within every 3 hours of data
Mostrar comentarios más antiguos
Hey everyone,
I attached the first 50 rows of a sample dataset. My time data is in the format of '2000-01-01 01:00' where the last 5 characters hold the hour (HH) and minute (MM) values respectively. I would like to have a code that goes through the time column, and assigns the maximum value of the second column between the 01,02,03 hours to the 03 hour timestep. I need to then have this done for the 04 - 06 time period and so forth until 22 - 00.
At the end the cell array would not need to keep data from other columns and should just be XXXX x 2 cell array and just reassigns the maximums to the 03,06,09 ...... 00 timesteps. There may also be missing hours, so the script would need to accommodate for that.
example:
'1984-01-01 00:00' 10.80 240 20.06 231 0.8
'1984-01-01 01:00' 11.83 230 20.06 6 0.6
'1984-01-01 02:00' 11.31 230 18.51 100 0.6
'1984-01-01 03:00' 10.80 230 18.51 212 0.7
'1984-01-01 04:00' 9.77 240 18.65 348 0.8
'1984-01-01 05:00' 9.25 240 16.97 406 0.8
'1984-01-01 06:00' 9.77 230 17.48 524 0.7
'1984-01-01 07:00' 9.77 230 16.97 612 0.7
would then become:
'1984-01-01 03:00' 11.83
'1984-01-01 06:00' 9.77
I am completely lost in writing this and any help would be appreciated!
1 comentario
Cedric
el 11 de Mayo de 2014
Well, not all answers below will work with missing data, but if you work for understanding them all (with their limitations), you'll learn a lot about data manipulation in MATLAB ;-)
Respuesta aceptada
Más respuestas (2)
dpb
el 11 de Mayo de 2014
An alternate approach...
>> dat=circshift(dat(:,1:2),-1); % place the first of groups of three on first row
>> ddat=cell2mat(dat(:,2)); % convert numeric to array for convenience(+)
>> [~,imx]=max(reshape(ddat(1:fix(length(dat)/3)*3),3,[])); % find max for each group
>> res=dat(imx+[0:3:(length(imx)-1)*3],1:2) % select those from full array
res =
'1984-01-01 01:00' [11.8300]
'1984-01-01 04:00' [ 9.7700]
>>
I didn't read Cedric's answer carefully enough to unequivocally answer your question re: missing values; I'd suggest trying it for a sample dataset and see how it works.
For correct operation of the above you will need to fill in missing data "holes" if they do exist similarly as to how Cedric built a date vector from your first to last to ensure the "divide by 3" works to build the right number of columns.
Andrei Bobrov
el 11 de Mayo de 2014
Editada: Andrei Bobrov
el 11 de Mayo de 2014
load upload.mat
a = datatoupload;
[~,~,~,hh] = datevec(a(:,1));
hh(hh==0) = 24;
h2 = (rem(hh-1,3)+1)==1;
h2(1) = 1;
ii = cumsum(h2);
[~,b] = unique(ii,'last');
out = [a(b,1) accumarray(ii,cat(1,a{:,2}),[],@(x){max(x)})];
add variant
t = datenum(1984,1,1,3*(0:ceil(diff(datenum(a([1,end],1)))*24/3))',0,0);
t0 = datenum(a(:,1));
t1=t+eps(1e6);
[~,ii] = histc(t0,t1);
out = [cellstr(datestr(t,'yyyy-mm-dd HH:MM')),...
accumarray(ii+1,cat(1,a{:,2}),[],@(x){max(x)})];
Categorías
Más información sobre Statistics and Machine Learning Toolbox en Centro de ayuda y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!