
Why is str2num not recommended when it is faster in certain circumstances?

Darcy Cordell on 4 Jul 2024 at 22:21
Edited: Stephen23 on 5 Jul 2024 at 9:13
I have a cell array of millions of strings representing dates with the format "yyyyMMddHHmmss"
I need to convert these to datetimes. After many attempts of various kinds I think I have found an optimal solution.
However, my solution requires that I use str2num instead of str2double, which gives a nearly 100x increase in speed. This is despite the fact that MATLAB recommends using str2double for "faster performance" and specifically discourages str2num.
In particular, str2double cannot convert a char array and results in Inf, while str2num converts the char array without issue.
Below is an example script.
(P.S. not directly related to this question, but if there is a faster way to convert cell arrays of strings to datetimes, let me know!)
%Make example input data
t1 = datetime(2000,1,1,0,0,0);
t2 = datetime("now");
t = string(datestr(t1:days(1):t2,'yyyymmddHHMMss'));
t_cell = cellstr(t); %<--- This is the "cell array" example data
%Option 1: datetime cell array of strings
% ---> Very slow (~0.75 s)
tic
tcheck1 = datetime(t_cell,'InputFormat','yyyyMMddHHmmss');
toc
%Option 2: str2double
% ---> DOES NOT WORK (results in Inf)
tic
da = char(t_cell);
year1 = str2double(da(:,1:4));
month1 = str2double(da(:,5:6));
day1 = str2double(da(:,7:8));
hour1 = str2double(da(:,9:10));
min1 = str2double(da(:,11:12));
sec1 = str2double(da(:,13:14));
tcheck2 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
%Option 3: str2double with extra conversion from char to string
% ---> About twice as fast as Option #1 (~0.4 s)
tic
da = char(t_cell);
year1 = str2double(string(da(:,1:4)));
month1 = str2double(string(da(:,5:6)));
day1 = str2double(string(da(:,7:8)));
hour1 = str2double(string(da(:,9:10)));
min1 = str2double(string(da(:,11:12)));
sec1 = str2double(string(da(:,13:14)));
tcheck3 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
%Option 4: str2num
% ---> About 100 times faster than Option #1 and #3 (~0.005 s)
tic
da = char(t_cell);
year1 = str2num(da(:,1:4));
month1 = str2num(da(:,5:6));
day1 = str2num(da(:,7:8));
hour1 = str2num(da(:,9:10));
min1 = str2num(da(:,11:12));
sec1 = str2num(da(:,13:14));
tcheck4 = datetime(year1,month1,day1,hour1,min1,sec1);
toc

Accepted Answer

Stephen23 on 5 Jul 2024 at 4:18
Edited: Stephen23 on 5 Jul 2024 at 9:13
"Why is str2num not recommended when it is faster in certain circumstances?"
Because it relies on EVAL. Code called by EVAL is not accelerated by the JIT engine, which is why you get that warning. The slow-down is particularly noticeable in code that repeats a lot (e.g. loops) and may be less noticeable in code that does not. The reliance on EVAL also makes the code liable to unexpected behavior when the input data contains valid commands or function calls (the STR2NUM documentation explicitly warns about this), and makes the code less versatile because EVAL is not supported by e.g. parallel loops or the code compiler. For these reasons experienced users often prefer to avoid STR2NUM.
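A minimal sketch of the "valid commands/function calls" hazard described above. The input text 'pi' is my own illustrative choice, not something from the question's data:

```matlab
% STR2NUM evaluates its input as MATLAB code, so text that happens to
% name a function or constant is executed instead of being rejected.
a = str2num('pi');      % the text is evaluated, so a is the constant pi

% STR2DOUBLE only parses numeric text, so the same input is rejected.
b = str2double('pi');   % not numeric text, so b is NaN
```

With clean, digits-only input such as the date strings in the question this difference never shows up, which is why the risk is easy to overlook.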
"In particular, str2double cannot convert a char array and results in Inf..."
STR2DOUBLE is not documented to work on character matrices. Its documentation states that the input "str can be a character vector, a cell array of character vectors, or a string array."
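As a small illustration of that point, the rows of a character matrix can be turned into one of the documented input types before calling STR2DOUBLE. The two-row matrix below is made-up example data:

```matlab
% A 2-by-4 character matrix, like the da(:,1:4) slices in the question.
da = ['2000'; '2024'];

% Convert each row into a documented input type first.
y_cell = str2double(cellstr(da));   % cell array of character vectors
y_str  = str2double(string(da));    % string array, one element per row
```

Both calls return one number per row. The string route is what Option 3 in the question already does.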
If speed is a high priority for you, it might help to leverage low-level conversion functions:
%Make example input data
t1 = datetime(2000,1,1,0,0,0);
t2 = datetime("now");
%This is the "cell array" example data
t_cell = cellstr(string(datestr(t1:days(1):t2,'yyyymmddHHMMss')));
%Option 1: datetime cell array of strings
% ---> Very slow (~0.75 s)
tic
tcheck1 = datetime(t_cell,'InputFormat','yyyyMMddHHmmss');
toc
Elapsed time is 0.546119 seconds.
%Option 2: str2double
% N/A
%Option 3: str2double with extra conversion from char to string
% ---> About twice as fast as Option #1 (~0.4 s)
tic
da = char(t_cell);
year1 = str2double(string(da(:,1:4)));
month1 = str2double(string(da(:,5:6)));
day1 = str2double(string(da(:,7:8)));
hour1 = str2double(string(da(:,9:10)));
min1 = str2double(string(da(:,11:12)));
sec1 = str2double(string(da(:,13:14)));
tcheck3 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
Elapsed time is 0.243270 seconds.
%Option 4: str2num
% ---> About 100 times faster than Option #1 and #3 (~0.005 s)
tic
da = char(t_cell);
year1 = str2num(da(:,1:4));
month1 = str2num(da(:,5:6));
day1 = str2num(da(:,7:8));
hour1 = str2num(da(:,9:10));
min1 = str2num(da(:,11:12));
sec1 = str2num(da(:,13:14));
tcheck4 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
Elapsed time is 0.043460 seconds.
% Option 5: SSCANF
tic
da = char(t_cell);
M = sscanf(da.','%4d%2d%2d%2d%2d%2d',[6,Inf]).';
tcheck5 = datetime(M);
toc
Elapsed time is 0.013550 seconds.
% Option 6: DOUBLE
tic
da = char(t_cell);
year1 = double(string(da(:,1:4)));
month1 = double(string(da(:,5:6)));
day1 = double(string(da(:,7:8)));
hour1 = double(string(da(:,9:10)));
min1 = double(string(da(:,11:12)));
sec1 = double(string(da(:,13:14)));
tcheck6 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
Elapsed time is 0.013220 seconds.
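To make the SSCANF approach from Option 5 easier to follow, here is the same call traced on a tiny input. The two timestamps are made-up example values:

```matlab
% Two example timestamps in the yyyyMMddHHmmss layout.
t_cell = {'20000101000000'; '20240704221530'};
da = char(t_cell);                  % 2-by-14 character matrix, one timestamp per row

% SSCANF reads a character matrix in column-major order, so transposing
% first places each timestamp in its own column and the characters are
% consumed one timestamp after another. The format splits every
% timestamp into six integer fields, and [6,Inf] collects them as columns.
M = sscanf(da.', '%4d%2d%2d%2d%2d%2d', [6, Inf]).';

% After the final transpose each row of M is
% [year month day hour minute second], which DATETIME accepts directly.
t = datetime(M);
```

Here M has one row per timestamp and six columns, so no per-field slicing of the character matrix is needed.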
Mileage will vary depending on your installed MATLAB version and hardware. Chasing down every last millisecond is not always productive if the code will then run on other machines or releases.
I have not tried it, but you might find this useful:

More Answers (1)

Umar on 4 Jul 2024 at 23:48
Hi Darcy,
That is a very good catch. You asked: why is str2num not recommended when it is faster in certain circumstances?
My view is that while str2num may offer speed advantages in specific scenarios, its drawbacks in terms of error handling, ambiguity, flexibility, readability, and future compatibility outweigh the performance gains.
You also asked, not directly related, whether there is a faster way to convert cell arrays of strings to datetimes.
One efficient approach is to use vectorized operations and the built-in functions provided by MATLAB. Preallocating memory for the datetime array before conversion can improve performance. This can be achieved by initializing an empty datetime array with the desired size before populating it with converted values. For even faster conversion of large datasets, consider using MATLAB's Parallel Computing Toolbox. By parallelizing the conversion, you can distribute the workload across multiple cores or workers, which can reduce processing time.
Hope this answers your question.
