# Datenum - different date formats

53 views (last 30 days)
Ms. Mat on 23 Dec 2012
Background
I have an excel sheet with different date formats (in excel some are in General format, some in date format totally random)
example: 10/05/2005 , 10/5/2005, 10/05/05, 05/12/2005 and so on.
I read the XL file into MATLAB as follows:
[dta_num , dta_txt , dta_raw] = xlsread(...)
I need to work with the dates in the excel sheet subsequently.
Actual Problem
I tried out the following commands
• datenum('10/5/2005')
• datenum('10/05/2005')
• datenum('10/05/05')
• datenum(dta_txt(2,2))
All of them return the same answer. (732590) But I realized that 732590 is read as 05-Oct-2005 when I ran
datestr(732590)
However the dates in my data sheet needs to be read as dd/mm i.e 10-May-2005.
If I choose to specify the format,
1. * datenum('10/5/2005','dd/mm/yyyy') ans = 732442
2. * datenum('10/05/2005','dd/mm/yyyy') ans = 732442
3. * datenum('10/05/05','dd/mm/yyyy') ans = 1957
Originally I thought I will convert the dates using datenum and the after my manipulations using functions such as month(datenum(dta_txt(2,2),'dd/mm/yyyy')) I will use datestr and I will have all the dates in the same format. But now I am back to square 1 since datestr(1957) gives me a 2 digit year.
Also I want to make sure that date formats are not mixed up i.e dd/mm/yyyy and mm/dd/yyyy are mixed up. For this I plan to go over the date and see if the previous date belongs to the same month or one month earlier and no month is greater than 12 assuming a dd/mm format. If it is, then it is to be decoded as mm/dd/yyyy and appropriately changed to dd/mm/yyyy.
Is the only solution is to go over the length of the dta_txt contents using cellfun('length',dta_txt(:,2)) and change it to 4 year format (I guess there are very few in 2 year digit YY format)
To summarize my requirements are;
1. date format which takes less resource and easy to manipulate as I might want to extract and make comparisons of the the month and year later. 2. Have a consistent date format preferably dd/mm/yyyy.
##### 2 CommentsShowHide 1 older comment
Walter Roberson on 23 Dec 2012
I closed the other (earlier) post as there had been less commentary on it.

Laura Proctor on 23 Dec 2012
Edited: Laura Proctor on 23 Dec 2012
Would something like this help?
% Create a cell array containing dates in various formats
y = { '10/5/05' ; '18/2/2004' ; '3/3/2003' }
% Split up the elements in the cell array
[~,~,~,~,~,~,ss] = regexp(y,'/')
% Determine which dates have a two element year
ivals = cellfun(@(x) numel(x{3})==2,ss)
% Preallocate a vector to contain the dates
dates = zeros(size(y))
% Convert those dates with a four element year using datenum
dates(~ivals) = datenum(y(~ivals),'dd/mm/yyyy')
% Convert the dates with a two element year
dates(ivals) = datenum(y(ivals),'dd/mm/yy')
You could also use datevec to pull out the elements separately. Here I have assumed that any two element years that occur are later than 2000. If you also have dates that occur in the 1900s, then a little more logic is required.
% Create three vectors: year, month, and day
[year,month,day,~,~,~] = datevec(y,'dd/mm/yyyy')
% Logical array containing those year values that are less than 2000
lt2000 = year < 2000
% Add 2000 to any year that are less than 2000
year(lt2000) = 2000+year(lt2000)
Laura Proctor on 23 Dec 2012
You're right, proecsm - it does work if you just have one date. However, if you have a cell array of dates, it will not return the same results.
Walter posted a great solution in the duplicate post:
dates = datenum(y,'dd/mm/yyyy',2000)

per isakson on 23 Dec 2012
Edited: per isakson on 23 Dec 2012
There is no way for me (in Sweden) to know how to interpret
'01/02/03'
or
'01/02/2003'
IMO: when in doubt avoid to use the default values. It is better to specify the format an extra time than not to do it when needed.
IMO: the most precious resource is the time of the programmer.
In this case a vectorized code is faster than a loop. Experiment:
>> mysteries_dates_to_sdn
Elapsed time is 1.540982 seconds.
Elapsed time is 0.056104 seconds.
>> mysteries_dates_to_sdn
Elapsed time is 1.508512 seconds.
Elapsed time is 0.055982 seconds.
>>
where
%%mysteries_dates_to_sdn
N = 1e3;
castr = {
'10/05/2005'
'10/5/2005'
'10/05/05'
'05/12/2005'
};
castr = repmat( castr, [N,1] );
tic
sdn_loop = mysteries_dates_to_sdn_loop( castr );
toc
tic
sdn_vec = mysteries_dates_to_sdn_vectorized( castr );
toc
assert( all( sdn_vec == sdn_loop ) ...
, 'mysteries_dates_to_sdn:failure' ...
, 'Failure: sdn_loop differs from sdn_vec' )
and
function sdn = mysteries_dates_to_sdn_loop( castr )
sdn = nan( size( castr ) );
for ii = 1 : numel( castr )
str = castr{ii};
len = length( str );
if len >= 9
sdn(ii) = datenum( str, 'dd/mm/yyyy' );
else
sdn(ii) = datenum( str, 'dd/mm/yy' );
end
end
end
and
function sdn = mysteries_dates_to_sdn_vectorized( castr )
sdn = nan( size( castr ) );
len = cellfun( @length, castr );
islong = ( len >= 9 );
isshort = not( islong );
sdn(islong) = datenum( castr(islong) , 'dd/mm/yyyy' );
sdn(isshort) = datenum( castr(isshort), 'dd/mm/yy' );
end
.
BTW: Why not use datevec rather than datenum to avoid calling the function, month
.
More:
This function (proposed above) is approx. 7% faster than mysteries_dates_to_sdn_vectorized
function sdn = mysteries_dates_to_sdn_pivot( castr )
sdn = datenum( castr, 'dd/mm/yyyy', 2000 );
end