Generate hourly timeseries data from "START,END,VALUE" data

Question

Luke el 9 de Mayo de 2022

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/1714880-generate-hourly-timeseries-data-from-start-end-value-data

Comentada: dpb el 10 de Mayo de 2022

Respuesta aceptada: dpb

exemple_Data_points.xlsx

Hello everyone,

I got the attached data file of events with the following structure:

START_DATE (dd.mm.yyyy)
START_TIME (hh:mm:ss) [24h]
END_DATE (dd.mm.yyyy) can be > than START_DATE (event goes form one day to another)
END_TIME (hh:mm:ss) [24h]
VALUE for the event [< 0]

I assume that the Value is equal distributed during the time. I would like the transform the Data in hourly data and have one sum of value for every hour. Maybe this makes it a little bit more clear.

The duration of the events can be very short (not less than one minute) and long (more than one day overnight).

Has anyone here any Idea how to do that with matlab? I tried to generate a timeseries with every possible hour and then sum if the Data points fits in that timeslot but this doesn’t really work out. I’m really looking forward to your answers.

Best wishes,

Lukas

2 comentarios
Mostrar NingunoOcultar Ninguno

dpb el 9 de Mayo de 2022

Hmmm.....this is seemingly simple problem, but I don't see a ready-build solution in MATLAB that "just works" out of the box, at least not OTOMH.

It doesn't quite fit the timetable; it might be closer to the timeseries with events, but neither is set up to handle the input form very directly.

I've got an idea that could work -- it would entail building a timetable for each even, then synchronize() all of those to build the one composite timetable. That done, then retime would be able to do the dirty for the output summation hourly.

Let me play around here a little and see if this goes anywhere...

dpb el 9 de Mayo de 2022

Abrir en MATLAB Online

There is more time on an hourly basis that doesn't have an event than there is that does so your graph above is somewhat misleading -- the following is the beginning of the above idea for the first eight (8) records in the data file (I just used head(ttT) as a working dataset after creating a time table from each row by using the start/end times and duplicating the VALUE.

tT=readtable('exemple_Data_points.xlsx');               % read as a table
tmp=head(tT);                                           % just save a few for testing ideas
tmp.RT1=tmp.START_DATE+days(tmp.START_TIME);            % got to turn into full datetimes
tmp.RT2=tmp.END_DATE+days(tmp.END_TIME);
for i=1:height(tmp)                        % turn each record into a timetable of two times
  ttT{i}=table2timetable( ...
            table([tmp.RT1(i);tmp.RT2(i)],repmat(tmp.VALUE(i),2,1), ...
                      'VariableNames',{'Time','Value'}),'RowTimes','Time');
end
ttTT=ttT{1};                       % now start with the first to combine/synchronize
for i=2:numel(ttT)                 % add all the others at their times; catenates VALUE horizontally
  ttTT=synchronize(ttTT,ttT{i});
end
ttTT.Properties.VariableNames=compose('V%d',1:numel(ttTT)); % just some short variable names
stackedplot(ttTT,'Color','k','LineWidth',3)                 % show what the first eight look like

Which shows the sizable gaps between events...

So, that raises Q? of what to do in between when we retime (which does work, btw, to fill in between the start/stop times) -- the default filler is missing value or NaN which is what shows up as empty space on above plot; a zero would give a solid line at zero for each set.

Also, how much granularity is needed? The original Q? asked for hourly; that will truncate items of less length than that with the normal synchronize method I believe although the shortest in the above data is 2 hr

>> tmp.DURATION=tmp.RT2-tmp.RT1
tmp =
  8×8 table
    START_DATE     START_TIME     END_DATE      END_TIME    VALUE            RT1                     RT2             DURATION
    ___________    __________    ___________    ________    _____    ____________________    ____________________    ________
    02-Apr-2013      0.54167     02-Apr-2013    0.66667      800     02-Apr-2013 13:00:00    02-Apr-2013 16:00:00    03:00:00
    04-Apr-2013      0.10417     04-Apr-2013    0.33333      485     04-Apr-2013 02:30:00    04-Apr-2013 08:00:00    05:30:00
    04-Apr-2013      0.10417     04-Apr-2013    0.33333      495     04-Apr-2013 02:30:00    04-Apr-2013 08:00:00    05:30:00
    05-Apr-2013       0.5625     05-Apr-2013    0.80208      288     05-Apr-2013 13:30:00    05-Apr-2013 19:15:00    05:45:00
    06-Apr-2013     0.083333     06-Apr-2013    0.33333      475     06-Apr-2013 02:00:00    06-Apr-2013 08:00:00    06:00:00
    06-Apr-2013      0.11458     06-Apr-2013       0.25      325     06-Apr-2013 02:45:00    06-Apr-2013 06:00:00    03:15:00
    06-Apr-2013         0.25     06-Apr-2013    0.33333      100     06-Apr-2013 06:00:00    06-Apr-2013 08:00:00    02:00:00
    07-Apr-2013     0.072917     07-Apr-2013    0.20833       16     07-Apr-2013 01:45:00    07-Apr-2013 05:00:00    03:15:00
>> 

is the created initial table from which the timetables were created.

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

dpb el 9 de Mayo de 2022

1
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/1714880-generate-hourly-timeseries-data-from-start-end-value-data#answer_960665

Abrir en MATLAB Online

Well, in the end it turned out not so bad after all -- and, somewhat surprisingly, it didn't seen to be too bad on performance...I thought it might really bog down but didn't seem to...

Here's the script I developed; as always, "salt to suit!"

tT=readtable('exemple_Data_points.xlsx');               % read as a table
tT.RT1=tT.START_DATE+days(tT.START_TIME);            % got to turn into full datetimes
tT.RT2=tT.END_DATE+days(tT.END_TIME);
for i=1:height(tT)                        % turn each record into a timetable of two times
  ttT{i}=table2timetable( ...
            table([tT.RT1(i);tT.RT2(i)],repmat(tT.VALUE(i),2,1), ...
                      'VariableNames',{'Time','Value'}),'RowTimes','Time');
end
ttTT=ttT{1};                       % now start with the first to combine/synchronize
for i=2:numel(ttT)                 % add all the others at their times; catenates VALUE horizontally
  ttTT=synchronize(ttTT,ttT{i});
end
ttTT.Properties.VariableNames=compose('V%d',1:numel(ttT)); % just some short variable names
ttTT=retime(ttTT,'hourly',"previous",'EndValues',0);

I did the plot then with

stairs(ttTT.Time,ttTT.Value)
ylabel('Cumulative Value')
title('Hourly Cumulative Value Events')

I didn't do exhaustive testing, but the looking I did do all looked to be reasonable things happening...I did not do the search for the short time frames to see what may have happened/been in the data overall, so that's something can look for/into.

This as you can see with the retime call above, did the fill with zeros in order to do the sums; otherwise NaN reigns supreme. nansum woud end up doing the same, however, if want to keep the missing values.

2 comentarios
Mostrar NingunoOcultar Ninguno

Luke el 10 de Mayo de 2022

Well well, thank you very much for your effort! This really is what I was looking for, and I’m really sorry that my graph wasn’t really helpful. The code will take a while for more than 54k events, but I’m sure this will work out. Thank you!

Do you have any recommendations for performance of the code? Do you think it’s a good Idea to separate the Data in, let’s say 54 Datasets so the matrix doesn’t get to big?

I will let you know if my analysis turns out something interesting. Just for your information: the data comes actually from here Netztransparenz > EnWG > Redispatch and describes the pattern of redispatch pattern in the German high voltage grid.

Best wishes!

dpb el 10 de Mayo de 2022

I thought it might be something like that...my consulting career was supporting the US electric utilities, starting out as nuclear engineer for one of the reactor vendors...

As far as performance of this way to go at, I don't really have any further ideas other than except that it is possible to inspect individual event data by the use of the first timetable (ttT) cell array which might be of some interest in certain cases, one could avoid the memory overhead of keeping those and wrap their creation into the call to synchronize instead of having two separate loops and keeping the duplicate data.

As coded, it will end up with a column in the final timetable for every event so indeed, 50K events will become quite large and possibly end up with memory limitations as is. I didn't look into it, but with some logic one could merge the non-overlapping events into a single column after each synchronization step and remove that last column to reduce the growth somewhat. Not sure what percentage that might be overall, I didn't delve into the patterns all that much although as the first few showed, those initial 8-10 events didn't overlap at all so that would be a sizable reduction on the small sample so may be worth looking into.

Alternatively, if you were to retime at each iteration, then you could build the overall summation as you went and keep only it. That would be most efficient memory-wise, but adds the computational overhead so may not be a wise choice. Perhaps one could synchronize some number of events and then retime every 100 or 500 or so...

Good luck! -- it will be interesting to see what you get in the end, keep us posted...

Iniciar sesión para comentar.

Generate hourly timeseries data from "START,END,VALUE" data

2 comentarios
Mostrar NingunoOcultar Ninguno

Respuesta aceptada

2 comentarios
Mostrar NingunoOcultar Ninguno

Más respuestas (0)

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

Generate hourly timeseries data from "START,END,VALUE" data

2 comentarios Mostrar NingunoOcultar Ninguno

Respuesta aceptada

2 comentarios Mostrar NingunoOcultar Ninguno

Más respuestas (0)

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

2 comentarios
Mostrar NingunoOcultar Ninguno

2 comentarios
Mostrar NingunoOcultar Ninguno