Computing average absolute time from date strings without years
Mostrar comentarios más antiguos
Hi there, I have a very large table with a column of (12 hr) times saved as strings in a cell array, and a column of folder names also saved as strings in a cell array. So for example, say my table was called 'data', then if I type:
K>> data.folder
ans =
1150×1 cell array
{'160115/f1/f2/f3'}
I can access a string directly by typing:
K>> data.folder{1}
ans =
'160115/f1/f2/f3'
The first part of the folder string has the date in YYMMDD format. Another column in the table has the time in a string (12 hour clock):
K>> data.end_time{1}
ans =
'11:24:17 AM'
The actual table is enormous. I have some index for each row which fits into some supercategory (eg taken on some date), and I want to average the absolute time for each row in that category. How can I do this?
I have a brute force solution for the first part which works but I don't like how it's a loop. I would prefer to do it in a more compact way. Anyways, here it is:
exact_times = datetime(ramsey_data.end_time);
date = ramsey_data.folder;
for i=1:numel(exact_times) ymd = date{i};
ymd = ymd(1:6);
year = ['20' ymd(1:2)];
month = ymd(3:4);
day = ymd(5:6);
exact_times(i).Year = str2num(year);
exact_times(i).Month = str2num(month);
exact_times(i).Day = str2num(day);
end
I still need to find a way to average the resulting values in exact_times properly. I want to select only some of the rows in the table to average. If I could use accumarray I would use something like: average_time = accumarray(subs, exact_times, [], @mean); but I can't use that on datetime variables
4 comentarios
dpb
el 21 de Mzo. de 2018
"I want to average the absolute time for each row in that category."
I'm not sure I understand what this means...can you show an example of some inputs and the desired output for the same input?
Also, it would help immensely if you would attach a (small but) representative part of the dataset that folks can play with.
It strikes me that probably the thing would be to convert the time to durations, but need more problem definition to be certain. Alternatively, parse the date from field 1 and combine with time from field 2 to get the absolute date/time for each entry.
Rearranging the input to not have combined data in the fields could make the job much simpler if you have any control at all on how this is created.
dpb
el 21 de Mzo. de 2018
What's wrong with the posted solution? Just turn whatever is the selection criterion into a logical addressing vector. ismember or whatever works on datetime.
Respuesta aceptada
Más respuestas (2)
t=readtable('example.csv'); % read the data
t.folder=char(t.folder); % to char array
t.date=datetime(t.folder(:,1:6),'inputformat','yyMMdd'); % convert conglomeration to time
[~,~,~,h,m,s]=datevec(t.end_time); % get wanted h,m,s
[Y,M,D]=datevec(t.end_time); % and corresponding date
dt=datetime(Y,M,D,h,m,s) % build the composite date/time
dt =
10×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:36
15-Jan-2016 11:25:36
>> mean(dt)
ans =
datetime
15-Jan-2016 11:24:32
>>
Could also parse the string data first and read the time as string and piece them together as text input to datetime. Will leave as "exercise for the student" :)
ADDENDUM
Per sidebar comments using your list of G:
>> splitapply(@mean,dt,G)
ans =
4×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:09
>>
ADDENDUM 2
If the issue you're having is the nested cell array; there's another path besides that above avoiding the intermediary of the table --
>> c=textscan(fid,'%s%s','delimiter',',','headerlines',1);
>> times=char(c{1}); folder=char(c{2});
>> dt=datetime([folder(:,1:6) blanks(10).' times],'inputformat','yyMMdd hh:mm:ss a')
dt =
10×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:36
15-Jan-2016 11:25:36
>>
splitapply with the grouping vector as above from here.
Unfortunately, Matlab doesn't have the syntax to be able to use sub-indexing of doubly-nested cellstr arrays without the intermediary variable to dereference them to singly-nested as above; there's probably the place where you're having addressing issues. (Of course, that I inadvertently deleted the conversion step in initial posting didn't help...sorry about that :P)
You don't show how you're getting the data into the app; as above textscan has the penchant to put everything into that two-level structure that is convenient for variables returned from it being succinct but not necessarily so much when comes to using the results.
6 comentarios
Use grouping variables and splitapply...I thought the problem was the combination of year with time.
You didn't explain how to get which given rows you want to average; the sample vector above seems totally random given the list of times. If that vector is generated by some other logic, then just use it...see amplified Answer on that assumption.
supernoob
el 21 de Mzo. de 2018
So, does not the amplified Answer then solve all the issues? BTW, you seem to misinterpret what t is; it's the table object created from the .csv file; folder is the auto-generated field name based on the file header content...
>> type example.csv % the file content used...
end_time,folder
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:25:36 AM,160115/baseline1/block1/ramsey1
11:25:36 AM,160115/baseline1/block1/ramsey1
>>
>> t=readtable('example.csv') % read as table...
t =
end_time folder
___________ _________________________________
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:25:36 AM '160115/baseline1/block1/ramsey1'
11:25:36 AM '160115/baseline1/block1/ramsey1'
>>
Everything in the Answer proceeds directly from there; no magic needed; any release from R2013b on is adequate.
supernoob
el 22 de Mzo. de 2018
Oh, I see when I pasted from workspace and cleaned up the superfluous outputs I inadvertently deleted one line..converting the cell array to char()...
t.folder=char(t.folder);
before the
t.date=...
step....
supernoob
el 21 de Mzo. de 2018
1 comentario
What happened to the time strings; this only looks at the dates?
Going back to your original where you show you have
K>> data.folder
ans =
1150×1 cell array
{'160115/f1/f2/f3'}
K>> data.end_time{1}
ans =
'11:24:17 AM'
then
folder=char(data.folder{:});
times=char(data.end_time{:});
would get you to the point at which I started after reading the csv file into the table I named t excepting w/ standalone variables as in the latter example instead of inside the table.
Categorías
Más información sobre Calendar en Centro de ayuda y File Exchange.
Productos
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!



