Find the filename with the biggest number?

Jacqueline on 24 Jul 2013
Hi, so I have a folder with a bunch of files that come in each day. They look like this...
20130721_SPLBRENT3_140554.mat
20130721_SPLBRENT3_160554.mat
20130721_SPLBRENT3_180554.mat
20130722_SPLBRENT3_075651.mat
20130722_SPLBRENT3_095651.mat
20130723_SPLBRENT3_075949.mat
20130723_SPLBRENT3_102025.mat
So, for example, 20130722_SPLBRENT3_095651.mat is from 7/22/2013 and the data in the file was gathered at 9:56am. I am trying to write code that finds the latest data (20130723_SPLBRENT3_102025.mat), NOT the last file uploaded (because all the files are uploaded at once, and one from the 21st may come in before one from the 23rd). How do I search for the file with the latest date and time in the file name?
  3 comments
Jacqueline on 24 Jul 2013
The latest date and time overall, out of all the files.
Jan on 24 Jul 2013
+1: This is a nice example to demonstrate different techniques to improve code. Sorry, Jacqueline, I know that this was not your intention. But at least a fast, faster and fastest solution is still a solution :-)


Answers (3)

Jan on 24 Jul 2013
Edited: Jan on 24 Jul 2013
Some simplifications to Azzi's code:
s = {'20130721_SPLBRENT3_140554.mat'; ...
'20130721_SPLBRENT3_160554.mat'; ...
'20130721_SPLBRENT3_180554.mat'; ...
'20130722_SPLBRENT3_075651.mat'; ...
'20130722_SPLBRENT3_095651.mat'; ...
'20130723_SPLBRENT3_075949.mat'; ...
'20130723_SPLBRENT3_102025.mat'};
% Split each name at '_' or '.'; REGEXP accepts the cell array directly
a = regexp(s, '_|\.', 'split');
b = cat(1, a{:});   % N-by-4 cell: date, label, time, extension
% Parse date and time together so each value is a valid serial date number
date = datenum(strcat(b(:,1), b(:,3)), 'yyyymmddHHMMSS');
[max_date,idx] = max(date);
latest_file = s{idx};
When a function such as REGEXP or DATENUM operates on cell arrays directly, wrapping it in CELLFUN is much slower, especially in combination with anonymous functions. When s contains 10,000 distinct strings, omitting CELLFUN reduces the runtime from 6.7 seconds to 0.16 seconds (R2009a/64/Win7). In addition, the leaner code is less prone to typos and easier to understand and debug.
Of course, the runtime most likely does not matter here, because the number of files is probably small. But it can be useful for other problems where equivalent solutions are applied.
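A rough way to check the CELLFUN overhead on your own machine is sketched below. The file names are synthetic, the count n = 2000 is an arbitrary choice, and the combined format 'yyyymmddHHMMSS' is my own variation; the timings will differ from the figures quoted above.

```matlab
% Build n synthetic names in the same pattern (made-up data, not real files).
n = 2000;
t = 735000 + (1:n)' * 0.37;                 % arbitrary serial date numbers
d = cellstr(datestr(t, 'yyyymmdd'));
h = cellstr(datestr(t, 'HHMMSS'));
s = strcat(d, '_SPLBRENT3_', h, '.mat');    % n-by-1 cell array of names

% Variant 1: CELLFUN with anonymous functions, one call per name.
tic;
a1 = cellfun(@(x) regexp(x, '_|\.', 'split'), s, 'UniformOutput', false);
d1 = cellfun(@(x) datenum([x{1} x{3}], 'yyyymmddHHMMSS'), a1);
tCellfun = toc;

% Variant 2: pass the whole cell array to REGEXP and DATENUM at once.
tic;
a2 = regexp(s, '_|\.', 'split');
b2 = cat(1, a2{:});
d2 = datenum(strcat(b2(:,1), b2(:,3)), 'yyyymmddHHMMSS');
tDirect = toc;

fprintf('cellfun: %.3f s, direct: %.3f s\n', tCellfun, tDirect);
```

Both variants should produce the same date numbers; only the elapsed time differs.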
By the way, the following reduces the runtime by a further 50%:
c = CStrCatStr(b(:, 1), 'T', b(:, 3));
date = DateStr2Num(c, 30);
See FEX: CStrCatStr and FEX: DateStr2Num. But be aware that downloading and compiling them would take much more time than you could ever save on such small problems. It can be useful, though, when working with millions of files or with 1000 files in real time.
And a last thought about efficient programs: I've shown different methods to perform the same operations faster. But exploiting the fact that the chronological order equals the alphabetical order is again 4 times faster than the C-Mex monsters. Recognizing such useful patterns in the data is usually much more important than multiple cores, Gigas (Hz or Bytes) or sophisticated vectorization. So the person who decided to use these nice names had already solved the problem most efficiently.
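As a small sanity check of that last point, the sketch below shuffles the sample names from the question and confirms that a plain alphabetical sort picks the same file as the date parsing does. The shuffle order and the variable names are my own illustration.

```matlab
s = {'20130721_SPLBRENT3_140554.mat'; ...
     '20130721_SPLBRENT3_160554.mat'; ...
     '20130721_SPLBRENT3_180554.mat'; ...
     '20130722_SPLBRENT3_075651.mat'; ...
     '20130722_SPLBRENT3_095651.mat'; ...
     '20130723_SPLBRENT3_075949.mat'; ...
     '20130723_SPLBRENT3_102025.mat'};
shuffled = s([4 7 1 6 2 5 3]);      % fixed, arbitrary upload order

% Approach 1: alphabetical order, last element is the latest.
sorted = sort(shuffled);
latest_by_name = sorted{end};

% Approach 2: parse date and time and take the maximum.
a = regexp(shuffled, '_|\.', 'split');
b = cat(1, a{:});
[~, idx] = max(datenum(strcat(b(:,1), b(:,3)), 'yyyymmddHHMMSS'));
latest_by_date = shuffled{idx};

disp(isequal(latest_by_name, latest_by_date));
```

This only holds because the date and time fields are fixed-width, zero-padded, and ordered from year down to second.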
  3 comments
Cedric on 25 Jul 2013
Edited: Cedric on 25 Jul 2013
It splits file names using either '_' or '.' as a separator:
>> s = regexp('20130722_SPLBRENT3_075651.mat', '_|\.', 'split')
s =
'20130722' 'SPLBRENT3' '075651' 'mat'
The pipe | means "or", and the . has to be escaped with a backslash because it has a special meaning in regular expressions ('\.' matches a literal dot, whereas '.' is a wildcard for any character).
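To see why the escape matters, here is a minimal sketch comparing the escaped and unescaped patterns on one name. The named-token pattern at the end is an alternative I am adding for illustration; it is not part of the answers in this thread.

```matlab
name = '20130722_SPLBRENT3_075651.mat';

% Escaped dot: splits only at '_' and at the literal '.'.
parts_escaped = regexp(name, '_|\.', 'split');

% Unescaped dot: '.' matches any character, so every character acts
% as a separator and only empty strings remain.
parts_unescaped = regexp(name, '_|.', 'split');
disp(all(cellfun(@isempty, parts_unescaped)));

% Alternative: pull out the two fields directly with named tokens.
tok = regexp(name, '^(?<day>\d{8})_.*_(?<time>\d{6})\.mat$', 'names');
fprintf('%s %s\n', tok.day, tok.time);
```

The named-token form also rejects names that do not follow the expected pattern, since the match then comes back empty.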
Jacqueline on 25 Jul 2013
Oh okay. Thank you very much!



Azzi Abdelmalek on 24 Jul 2013
Edited: Azzi Abdelmalek on 24 Jul 2013
s={'20130721_SPLBRENT3_140554.mat'
'20130721_SPLBRENT3_160554.mat'
'20130721_SPLBRENT3_180554.mat'
'20130722_SPLBRENT3_075651.mat'
'20130722_SPLBRENT3_095651.mat'
'20130723_SPLBRENT3_075949.mat'
'20130723_SPLBRENT3_102025.mat'}
a=cellfun(@(x) regexp(x,'_|\.','split'),s,'un',0)
date=cell2mat(cellfun(@(x) datenum([x{1} ' ' x{3}],'yyyymmdd HHMMSS'),a,'un',0))
[max_date,idx]=max(date)
latest_file=s{idx} % The latest file
latest_date=datestr(max_date,'dd-mm-yyyy HH:MM:SS')
  1 comment
Jan on 24 Jul 2013
Edited: Jan on 24 Jul 2013
See my 2nd answer.



Jan on 24 Jul 2013
Congratulations! If the format of the names is "20130723_SPLBRENT3_102025", the alphabetical order equals the temporal order. Then this is sufficient:
list = dir(fullfile(FolderName, '*.mat'));
name = {list.name};
sorted = sort(name);
latest = sorted{end};
In all cases I have seen so far, the output of dir is already sorted alphabetically. But as long as this is not documented, I'd rely on explicit sorting.
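If the folder may also contain .mat files that do not follow the naming scheme, or may be empty, a slightly more defensive variant could look like the sketch below. The scratch folder, the stray file notes.mat, and the filter pattern are assumptions of mine, used only to make the example self-contained.

```matlab
% Create a scratch folder with a few empty files (illustration only).
FolderName = tempname;
mkdir(FolderName);
names = {'20130723_SPLBRENT3_075949.mat', '20130721_SPLBRENT3_140554.mat', ...
         '20130723_SPLBRENT3_102025.mat', 'notes.mat'};
for k = 1:numel(names)
    fid = fopen(fullfile(FolderName, names{k}), 'w');
    fclose(fid);
end

list = dir(fullfile(FolderName, '*.mat'));
name = {list.name};

% Keep only names of the form yyyymmdd_<label>_HHMMSS.mat.
keep = ~cellfun('isempty', regexp(name, '^\d{8}_.*_\d{6}\.mat$', 'once'));
name = name(keep);

if isempty(name)
    latest = '';                 % nothing matched
else
    sorted = sort(name);         % explicit sort, not relying on dir order
    latest = sorted{end};
end
disp(latest);

rmdir(FolderName, 's');          % remove the scratch folder again
```

Without the filter, notes.mat would sort after all the dated names and be picked by mistake. Note also that the sort stays chronological only while the middle label is the same for all files of a given day.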
