Fill a timetable outside the gaps

14 views (last 30 days)
Consider the following timetable, which is just a toy to explain the problem
dates = [datetime(2020,1,1) datetime(2020,1,1) datetime(2020,1,2) datetime(2020,1,2) datetime(2020,1,3) datetime(2020,1,3) datetime(2020,1,4)];
t = timetable(dates',["a" "c" "a" "b" "a" "a" "c"]',[2 1 3 2 1 2 1]')
Time Var1 Var2
___________ ____ ____
01-Jan-2020 "a" 2
01-Jan-2020 "c" 1
02-Jan-2020 "a" 3
02-Jan-2020 "b" 2
03-Jan-2020 "a" 1
03-Jan-2020 "a" 2
04-Jan-2020 "c" 1
For each date and for each unique string in Var1 I would like to obtain the sum of the values in Var2. That is:
(1) if for a specific date a string does not appear, then 0 must be returned
(2) if for a specific date a string appears only once, then the corresponding value in Var2 must be returned
(3) if for a specific date a string appears multiple times, then the sum of the corresponding values in Var2 must be returned
The function retime satisfy (2) and (3) completely, but (1) only partially in the sense that it returns 0 only if the string appears in a previous date AND in a following date, that is retime only fills the gaps.
In the following the row with 04-Jan-2020 and 0 is missing
Time Var2
___________ ____
01-Jan-2020 2
02-Jan-2020 3
03-Jan-2020 3
In the following the rows with dates 1, 3, 4 january (and 0 in Var2) are missing
Time Var2
___________ ____
02-Jan-2020 2
The following is correct
Time Var2
___________ ____
01-Jan-2020 1
02-Jan-2020 0
03-Jan-2020 0
04-Jan-2020 1
Is there an easy way to resolve the problem and without using for loops? The data I'm working with has thousands of rows and tens of columns, I tried with loops but it's time consuming.
In the toy example an easy workaround is to manually add the missing strings at the start and at the end of the timetable, in such a way we can extend the gaps to cover all the datetimes and then retime can fill all the missing values.
However, this is easy only for timetables with a simple structure, moreover it might have some unwanted side effects
t = [timetable(datetime(2020,1,1),"b",0) ; t ; timetable([datetime(2020,1,4);datetime(2020,1,4)],["a";"b"],[0;0])]
Time Var1 Var2
___________ ____ ____
01-Jan-2020 "b" 0
01-Jan-2020 "a" 2
01-Jan-2020 "c" 1
02-Jan-2020 "a" 3
02-Jan-2020 "b" 2
03-Jan-2020 "a" 1
03-Jan-2020 "a" 2
04-Jan-2020 "c" 1
04-Jan-2020 "a" 0
04-Jan-2020 "b" 0

Accepted Answer

Siddharth Bhutiya
Siddharth Bhutiya on 19 May 2021
You could use groupsummary to do this.
In your case your grouping variables would be Time and Var1 and the aggregation method you want to use would be sum. Since you want all permutations of the grouping variables to show up in your output, you can specify the "IncludeEmptyGroups" as true and that should give you the desired output.
ans =
12×4 table
Time Var1 GroupCount sum_Var2
___________ ____ __________ ________
01-Jan-2020 "a" 1 2
01-Jan-2020 "b" 0 0
01-Jan-2020 "c" 1 1
02-Jan-2020 "a" 1 3
02-Jan-2020 "b" 1 2
02-Jan-2020 "c" 0 0
03-Jan-2020 "a" 2 3
03-Jan-2020 "b" 0 0
03-Jan-2020 "c" 0 0
04-Jan-2020 "a" 0 0
04-Jan-2020 "b" 0 0
04-Jan-2020 "c" 1 1
giannit on 22 May 2021
Wow perfect! I hoped for a one line command but I was lost in the documentations, thank you very much for the big help!
To convert the table to a timetable, is it fine to do this?
T = timetable(datetime(2020,1,[1 1 1 3 3 4])',["a" "c" "b" "a" "a" "c"]',[2 1 2 1 2 1]');
t = groupsummary(T,["Time","Var1"],["day","none"],"sum","Var2","IncludeEmptyGroups",true);
t.day_Time = datetime(string(t.day_Time));
tt = table2timetable(t)
thanks again!

Sign in to comment.

More Answers (2)

dpb on 14 May 2021
Edited: dpb on 14 May 2021
Illustrate the concept --
tt.Var1=categorical(tt.Var1); % convert Var1 to categorical as rightfully is
dRef=datetime(tt.Time(1):tt.Time(end)).'; % get the full date vector
vRef=unique(tt.Var1); % and the unique values of Var1
% build a reference timetable of all times/characteristics
ttRef=timetable(reshape(repmat(dRef.',size(vRef,1),1),[],1), repmat(vRef,size(dRef,1),1));
ttRef.Var2=zeros(height(ttRef),1); % add Var2 column of zeros
[~,ib]=ismember(tt(:,{'Var1'}),ttRef(:,{'Var1'})); % locate the ones that are present in reference
ttRef.Var2(ib)=tt.Var2; % and replace with start
The above for the example leads to a fully-augmented timetable of:
ttRef =
12×2 timetable
Time Var1 Var2
___________ ____ ____
01-Jan-2020 a 2
01-Jan-2020 b 0
01-Jan-2020 c 1
02-Jan-2020 a 3
02-Jan-2020 b 2
02-Jan-2020 c 0
03-Jan-2020 a 2
03-Jan-2020 b 0
03-Jan-2020 c 0
04-Jan-2020 a 0
04-Jan-2020 b 0
04-Jan-2020 c 1
for which the previous rowfun solution will work
  1 Comment
dpb on 15 May 2021
NB: The above construction left out the insertion of the duplicated initial rows...remember to reinsert them as well.

Sign in to comment.

dpb on 14 May 2021
K>> rowfun(@sum,t,"InputVariables",'Var2','GroupingVariables',{'Time','Var1'})
ans =
6×3 timetable
Time Var1 GroupCount Var3
___________ ____ __________ ____
01-Jan-2020 "a" 1.00 2.00
01-Jan-2020 "c" 1.00 1.00
02-Jan-2020 "a" 1.00 3.00
02-Jan-2020 "b" 1.00 2.00
03-Jan-2020 "a" 2.00 3.00
04-Jan-2020 "c" 1.00 1.00
I'd just fill in the missing dates using ismember or setdiff to create the vector of missing dates as compared to a full vector from the first to last date in the timetable. That's still a set of vector operations on the resulting table.
Is there to be a "0" entry for each value of Var1, too, I suppose?
I didn't try to write it, but seems you could use retime and custom function to fill the missing entries in the original timetable with zeros for each missing date/category; perhaps in conjunction with rowfun to using the grouping variable to cover the categories.
All of these, of course, have the looping construct in them, just at a lower level...


Find more on Dates and Time in Help Center and File Exchange




Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by