A very fast way to sort datetime (in "ascend" mode)?

5 visualizaciones (últimos 30 días)
Sim
Sim el 18 de Nov. de 2022
Comentada: Sim el 30 de Nov. de 2022
A very fast way to sort datetime (in "ascend" mode)?
Here an example:
a = datetime([
'2022-10-27 00:22:50.000'
'2022-10-27 05:29:45.000'
'2022-10-27 05:32:19.400'
'2022-10-27 05:36:44.100'
'2022-10-27 05:39:26.600'
'2022-10-27 05:43:18.200'
'2022-10-27 05:49:17.400'
'2022-10-27 05:55:27.300'
'2022-10-27 05:58:07.000'
'2022-10-27 06:17:13.800'
'2022-10-27 06:41:28.700'
'2022-10-27 07:03:06.000'
'2022-10-27 07:09:49.800'
'2022-10-27 07:17:39.700'
'2022-10-27 07:35:09.000'
'2022-10-27 07:42:33.600'
'2022-10-27 07:46:50.500'
'2022-10-27 08:07:02.700'
'2022-10-27 08:26:29.600'
'2022-10-27 08:45:03.500'
'2022-10-27 08:48:09.700'
'2022-10-27 08:53:57.000'
'2022-10-27 08:59:43.400'
'2022-10-27 09:13:24.100'
'2022-10-27 09:19:25.000'
'2022-10-27 09:26:35.000'
'2022-10-27 09:29:54.000'
'2022-10-27 09:46:45.700'
'2022-10-27 10:10:59.900'
'2022-10-27 10:29:04.600'
'2022-10-27 10:31:50.100'
'2022-10-27 10:37:45.300'
'2022-10-27 10:42:35.800'
'2022-10-27 10:58:42.300'
'2022-10-27 11:03:48.900'
'2022-10-27 11:10:44.700'
'2022-10-27 11:13:11.100'
'2022-10-27 11:31:25.100'
'2022-10-27 11:56:01.200'
'2022-10-27 12:19:25.300'
'2022-10-27 12:25:12.500'
'2022-10-27 12:30:16.900'
'2022-10-27 12:49:19.100'
'2022-10-27 12:55:42.700'
'2022-10-27 12:58:18.100'
'2022-10-27 13:16:06.300'
'2022-10-27 14:00:44.700'
'2022-10-27 14:04:14.200'
'2022-10-27 14:11:54.600'
'2022-10-27 14:17:10.000'
'2022-10-27 14:42:03.300'
'2022-10-27 14:45:22.100'
'2022-10-27 15:02:07.900'
'2022-10-27 15:25:39.600'
'2022-10-27 15:44:58.300'
'2022-10-27 15:48:35.800'
'2022-10-27 15:54:23.100'
'2022-10-27 16:00:17.300'
'2022-10-27 16:19:23.700'
'2022-10-27 16:27:30.800'
'2022-10-27 16:31:30.600'
'2022-10-27 16:52:09.700'
'2022-10-27 17:16:08.800'
'2022-10-27 18:01:39.400'
'2022-10-27 18:08:24.800'
'2022-10-27 18:17:44.500'
'2022-10-27 18:25:02.500'
'2022-10-27 18:27:45.700'
'2022-10-27 18:48:32.000'
'2022-10-27 19:02:01.700'
'2022-10-27 19:24:08.300'
'2022-10-27 19:30:01.400'
'2022-10-27 19:43:02.200'
'2022-10-27 19:48:50.300'
'2022-10-27 19:55:41.900'
'2022-10-27 19:58:23.700'
'2022-10-27 20:17:15.800'
'2022-10-27 20:28:47.600'
'2022-10-27 20:49:36.900'
'2022-10-27 20:53:02.700'
'2022-10-27 21:07:12.100'
'2022-10-27 21:09:47.800'
'2022-10-27 21:50:42.300'
'2022-10-27 22:07:34.100'
'2022-10-27 22:09:18.800'])
a = 85×1 datetime array
27-Oct-2022 00:22:50 27-Oct-2022 05:29:45 27-Oct-2022 05:32:19 27-Oct-2022 05:36:44 27-Oct-2022 05:39:26 27-Oct-2022 05:43:18 27-Oct-2022 05:49:17 27-Oct-2022 05:55:27 27-Oct-2022 05:58:07 27-Oct-2022 06:17:13 27-Oct-2022 06:41:28 27-Oct-2022 07:03:06 27-Oct-2022 07:09:49 27-Oct-2022 07:17:39 27-Oct-2022 07:35:09 27-Oct-2022 07:42:33 27-Oct-2022 07:46:50 27-Oct-2022 08:07:02 27-Oct-2022 08:26:29 27-Oct-2022 08:45:03 27-Oct-2022 08:48:09 27-Oct-2022 08:53:57 27-Oct-2022 08:59:43 27-Oct-2022 09:13:24 27-Oct-2022 09:19:25 27-Oct-2022 09:26:35 27-Oct-2022 09:29:54 27-Oct-2022 09:46:45 27-Oct-2022 10:10:59 27-Oct-2022 10:29:04
% Any way faster than this one ?
tic
[~, idx2] = sort( datenum(a), 1, 'ascend');
toc
Elapsed time is 0.008225 seconds.

Respuesta aceptada

Stephen23
Stephen23 el 18 de Nov. de 2022
"Any way faster than this one"
Interestingly, sorting seems to be slightly faster with the superfluous call to DATENUM:
a = datetime(['2022-10-27 00:22:50.000';'2022-10-27 05:29:45.000';'2022-10-27 05:32:19.400';'2022-10-27 05:36:44.100';'2022-10-27 05:39:26.600';'2022-10-27 05:43:18.200';'2022-10-27 05:49:17.400';'2022-10-27 05:55:27.300';'2022-10-27 05:58:07.000';'2022-10-27 06:17:13.800';'2022-10-27 06:41:28.700';'2022-10-27 07:03:06.000';'2022-10-27 07:09:49.800';'2022-10-27 07:17:39.700';'2022-10-27 07:35:09.000';'2022-10-27 07:42:33.600';'2022-10-27 07:46:50.500';'2022-10-27 08:07:02.700';'2022-10-27 08:26:29.600';'2022-10-27 08:45:03.500';'2022-10-27 08:48:09.700';'2022-10-27 08:53:57.000';'2022-10-27 08:59:43.400';'2022-10-27 09:13:24.100';'2022-10-27 09:19:25.000';'2022-10-27 09:26:35.000';'2022-10-27 09:29:54.000';'2022-10-27 09:46:45.700';'2022-10-27 10:10:59.900';'2022-10-27 10:29:04.600';'2022-10-27 10:31:50.100';'2022-10-27 10:37:45.300';'2022-10-27 10:42:35.800';'2022-10-27 10:58:42.300';'2022-10-27 11:03:48.900';'2022-10-27 11:10:44.700';'2022-10-27 11:13:11.100';'2022-10-27 11:31:25.100';'2022-10-27 11:56:01.200';'2022-10-27 12:19:25.300';'2022-10-27 12:25:12.500';'2022-10-27 12:30:16.900';'2022-10-27 12:49:19.100';'2022-10-27 12:55:42.700';'2022-10-27 12:58:18.100';'2022-10-27 13:16:06.300';'2022-10-27 14:00:44.700';'2022-10-27 14:04:14.200';'2022-10-27 14:11:54.600';'2022-10-27 14:17:10.000';'2022-10-27 14:42:03.300';'2022-10-27 14:45:22.100';'2022-10-27 15:02:07.900';'2022-10-27 15:25:39.600';'2022-10-27 15:44:58.300';'2022-10-27 15:48:35.800';'2022-10-27 15:54:23.100';'2022-10-27 16:00:17.300';'2022-10-27 16:19:23.700';'2022-10-27 16:27:30.800';'2022-10-27 16:31:30.600';'2022-10-27 16:52:09.700';'2022-10-27 17:16:08.800';'2022-10-27 18:01:39.400';'2022-10-27 18:08:24.800';'2022-10-27 18:17:44.500';'2022-10-27 18:25:02.500';'2022-10-27 18:27:45.700';'2022-10-27 18:48:32.000';'2022-10-27 19:02:01.700';'2022-10-27 19:24:08.300';'2022-10-27 19:30:01.400';'2022-10-27 19:43:02.200';'2022-10-27 19:48:50.300';'2022-10-27 19:55:41.900';'2022-10-27 19:58:23.700';'2022-10-27 20:17:15.800';'2022-10-27 20:28:47.600';'2022-10-27 20:49:36.900';'2022-10-27 20:53:02.700';'2022-10-27 21:07:12.100';'2022-10-27 21:09:47.800';'2022-10-27 21:50:42.300';'2022-10-27 22:07:34.100';'2022-10-27 22:09:18.800'])
a = 85×1 datetime array
27-Oct-2022 00:22:50 27-Oct-2022 05:29:45 27-Oct-2022 05:32:19 27-Oct-2022 05:36:44 27-Oct-2022 05:39:26 27-Oct-2022 05:43:18 27-Oct-2022 05:49:17 27-Oct-2022 05:55:27 27-Oct-2022 05:58:07 27-Oct-2022 06:17:13 27-Oct-2022 06:41:28 27-Oct-2022 07:03:06 27-Oct-2022 07:09:49 27-Oct-2022 07:17:39 27-Oct-2022 07:35:09 27-Oct-2022 07:42:33 27-Oct-2022 07:46:50 27-Oct-2022 08:07:02 27-Oct-2022 08:26:29 27-Oct-2022 08:45:03 27-Oct-2022 08:48:09 27-Oct-2022 08:53:57 27-Oct-2022 08:59:43 27-Oct-2022 09:13:24 27-Oct-2022 09:19:25 27-Oct-2022 09:26:35 27-Oct-2022 09:29:54 27-Oct-2022 09:46:45 27-Oct-2022 10:10:59 27-Oct-2022 10:29:04
timeit(@()sort(a))
ans = 1.3426e-05
timeit(@()sort(datenum(a)))
ans = 4.8560e-06
Lets try some other numeric formats, which might be more accurate over the dates in question:
timeit(@()sort(juliandate(a)))
ans = 6.9260e-06
timeit(@()sort(exceltime(a)))
ans = 4.4260e-06
timeit(@()sort(posixtime(a)))
ans = 3.2285e-06
  3 comentarios
Bruno Luong
Bruno Luong el 18 de Nov. de 2022
Editada: Bruno Luong el 18 de Nov. de 2022
idx2 should be the same, but I think timeit() results without second output could not be representative as the algorithm might skip some branching.
The relative result also might change with the size of your data.
Sim
Sim el 18 de Nov. de 2022
Editada: Sim el 18 de Nov. de 2022
Thanks a lot @Stephen23 and @Bruno Luong !!!
Just for curiosity, I did a test with 1000 iterations on my machine. Except an initial spike in the calculation time (I do not know why), it looks like that
sort(posixtime(a))
Unrecognized function or variable 'a'.
is the fastest solution to sort datetime arrays.
Please see here below the test:
for i = 1 : 1000
a = datetime([
'2022-10-27 00:22:50.000'
'2022-10-27 05:29:45.000'
'2022-10-27 05:32:19.400'
'2022-10-27 05:36:44.100'
'2022-10-27 05:39:26.600'
'2022-10-27 05:43:18.200'
'2022-10-27 05:49:17.400'
'2022-10-27 05:55:27.300'
'2022-10-27 05:58:07.000'
'2022-10-27 06:17:13.800'
'2022-10-27 06:41:28.700'
'2022-10-27 07:03:06.000'
'2022-10-27 07:09:49.800'
'2022-10-27 07:17:39.700'
'2022-10-27 07:35:09.000'
'2022-10-27 07:42:33.600'
'2022-10-27 07:46:50.500'
'2022-10-27 08:07:02.700'
'2022-10-27 08:26:29.600'
'2022-10-27 08:45:03.500'
'2022-10-27 08:48:09.700'
'2022-10-27 08:53:57.000'
'2022-10-27 08:59:43.400'
'2022-10-27 09:13:24.100'
'2022-10-27 09:19:25.000'
'2022-10-27 09:26:35.000'
'2022-10-27 09:29:54.000'
'2022-10-27 09:46:45.700'
'2022-10-27 10:10:59.900'
'2022-10-27 10:29:04.600'
'2022-10-27 10:31:50.100'
'2022-10-27 10:37:45.300'
'2022-10-27 10:42:35.800'
'2022-10-27 10:58:42.300'
'2022-10-27 11:03:48.900'
'2022-10-27 11:10:44.700'
'2022-10-27 11:13:11.100'
'2022-10-27 11:31:25.100'
'2022-10-27 11:56:01.200'
'2022-10-27 12:19:25.300'
'2022-10-27 12:25:12.500'
'2022-10-27 12:30:16.900'
'2022-10-27 12:49:19.100'
'2022-10-27 12:55:42.700'
'2022-10-27 12:58:18.100'
'2022-10-27 13:16:06.300'
'2022-10-27 14:00:44.700'
'2022-10-27 14:04:14.200'
'2022-10-27 14:11:54.600'
'2022-10-27 14:17:10.000'
'2022-10-27 14:42:03.300'
'2022-10-27 14:45:22.100'
'2022-10-27 15:02:07.900'
'2022-10-27 15:25:39.600'
'2022-10-27 15:44:58.300'
'2022-10-27 15:48:35.800'
'2022-10-27 15:54:23.100'
'2022-10-27 16:00:17.300'
'2022-10-27 16:19:23.700'
'2022-10-27 16:27:30.800'
'2022-10-27 16:31:30.600'
'2022-10-27 16:52:09.700'
'2022-10-27 17:16:08.800'
'2022-10-27 18:01:39.400'
'2022-10-27 18:08:24.800'
'2022-10-27 18:17:44.500'
'2022-10-27 18:25:02.500'
'2022-10-27 18:27:45.700'
'2022-10-27 18:48:32.000'
'2022-10-27 19:02:01.700'
'2022-10-27 19:24:08.300'
'2022-10-27 19:30:01.400'
'2022-10-27 19:43:02.200'
'2022-10-27 19:48:50.300'
'2022-10-27 19:55:41.900'
'2022-10-27 19:58:23.700'
'2022-10-27 20:17:15.800'
'2022-10-27 20:28:47.600'
'2022-10-27 20:49:36.900'
'2022-10-27 20:53:02.700'
'2022-10-27 21:07:12.100'
'2022-10-27 21:09:47.800'
'2022-10-27 21:50:42.300'
'2022-10-27 22:07:34.100'
'2022-10-27 22:09:18.800']);
t0 = tic;
[~, idx2] = sort( a, 1, 'ascend');
t0_(i) = toc(t0);
t1 = tic;
[~, idx2] = sort( juliandate(a), 1, 'ascend');
t1_(i) = toc(t1);
t2 = tic;
[~, idx2] = sort( datenum(a), 1, 'ascend');
t2_(i) = toc(t2);
t3 = tic;
[~, idx2] = sort( exceltime(a), 1, 'ascend');
t3_(i) = toc(t3);
t4 = tic;
[~, idx2] = sort( posixtime(a), 1, 'ascend');
t4_(i) = toc(t4);
end
hold on
plot(t0_,'linewidth',2,'DisplayName','simple')
plot(t1_,'linewidth',2,'DisplayName','juliandate')
plot(t2_,'linewidth',2,'DisplayName','datenum')
plot(t3_,'linewidth',2,'DisplayName','exceltime')
plot(t4_,'linewidth',2,'DisplayName','posixtime')
hold off
legend
and a zoom of the same figure:

Iniciar sesión para comentar.

Más respuestas (2)

Peter Perkins
Peter Perkins el 19 de Nov. de 2022
It seems like the obvious response to
Any way faster than this one ?
[~, idx2] = sort(datenum(a), 1, 'ascend');
is to not pass in parameters that are the defaults. That cuts the time by not quite a factor of two for me. But maybe in your real code you need to pass in non-defaults.
Three things:
1) On my machine, the difference between these two small sorts
[~, idx2] = sort(a, 1, 'ascend');
[~, idx2] = sort(datenum(a), 1, 'ascend');
is literally a few microseconds. The usual response is, "Sure, but I'm doing this in a loop, so a few microseconds adds up". It would help to hear the details of that context.
2) Sure, for small sorts like your example there is a difference. But by the time you have 10000 elements
a = datetime(2022,1,1,0,0,randi(10000,10000,1));
the difference between the two is gone. And the total time is less than .5ms.
3) ALL of those conversions through away precision. In your example, you have resolution of at most .1s, so precision is probably not an issue. But in general, converting datetime to those numeric formats is A BAD IDEA.
  2 comentarios
Bruno Luong
Bruno Luong el 22 de Nov. de 2022
"But in general, converting datetime to those numeric formats is A BAD IDEA."
That's true.
For decades a resolution of 53-bit datenum is the only option, whereas in C 128-bit resolution is used for time enumeration on other languages.
Sim
Sim el 30 de Nov. de 2022
Thank you very much @Peter Perkins and @Bruno Luong for your interesting comments!!
(and very sorry for this long delay in replying..)

Iniciar sesión para comentar.


Bruno Luong
Bruno Luong el 18 de Nov. de 2022
Editada: Bruno Luong el 18 de Nov. de 2022
The first peak is due to JIT that is not kicks in.
Here is the result on my PC (code attached)
I have to chase what create the spikes every 33 iterations.
  1 comentario
Sim
Sim el 18 de Nov. de 2022
oh, cool result, it is way much smoother than mine! Many thanks for the code as well! Very interesting :-) (I have used a MacBook Pro with 2.6 GHz 6-Core Intel Core i7 and 16 GB 2400 MHz DDR4)

Iniciar sesión para comentar.

Categorías

Más información sobre Dates and Time en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by