Extracting numbers from mixed string
I have filenames saved as strings such as '2001_06m'. Sometimes the files are inconsistently named as '2001_6m' (missing the zero before the 6) or '2001_06' (missing the m at the end). What code would I use to extract the non-zero integers after underscore in all cases (i.e. output = 6)?
And separately, what code would I use to extract the numbers before the underscore (usually they are 4 digits long, but sometimes 3 digits, i.e. '001' instead of '2001')?
Accepted Answer
Jan on 7 Oct 2018
Edited: Jan on 8 Oct 2018
s = '2001_06m';
d = sscanf(s, '%d_%d')
d =
        2001
           6
Easier and faster than regexp.
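As a quick check against the naming variants from the question (plus a 3-digit prefix), here is a minimal sketch. The variable names and the loop are only for illustration:

```matlab
% Illustrative filenames covering the variants mentioned in the question
names = {'2001_06m', '2001_6m', '2001_06', '001_06m'};
for k = 1:numel(names)
    d = sscanf(names{k}, '%d_%d');   % d(1): number before '_', d(2): number after
    fprintf('%s -> %d and %d\n', names{k}, d(1), d(2));
end
```

%d reads decimal integers, so leading zeros are dropped ('06' gives 6, '001' gives 1), and parsing simply stops at a trailing 'm'.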
[EDITED] If the input is a cell string:
C = {'2001_06m', '002_77q'};
S = sprintf('%s ', C{:});
S(S < '0' | S > '9') = ' ';  % Replace every non-digit character with a space
Num = sscanf(S, '%d %d ', [2, Inf]);
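Each column of Num then holds the two numbers of one filename. A short sketch of how the result could be unpacked, assuming every name contains exactly two runs of digits (the names before and after are mine, not part of the original code):

```matlab
C = {'2001_06m', '002_77q'};
S = sprintf('%s ', C{:});             % join all names, separated by spaces
S(S < '0' | S > '9') = ' ';           % keep digits only, blank out the rest
Num = sscanf(S, '%d %d ', [2, Inf]);  % 2-by-N matrix, one column per name
before = Num(1, :);                   % numbers before the underscore
after  = Num(2, :);                   % numbers after the underscore
```

If a name contained a third run of digits, the columns would shift, so this relies on the naming scheme staying as described in the question.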
2 Comments
Guillaume on 8 Oct 2018
To be honest, none of the solutions are vectorised. Vectorising strsplit wouldn't be easy either. It wouldn't be too hard to vectorise the regexp solution, but sscanf is certainly more elegant.
More Answers (3)
Guillaume on 7 Oct 2018
Edited: Guillaume on 8 Oct 2018
A possible regexp version would be:
str2double(regexp(filename, '(\d+)_(\d+)', 'tokens', 'once'))
Edit: following the discussion under Jan's answer, here is a vectorised version for when filenames is a cell array of char arrays or a string array:
tokens = regexp(filenames, '(\d+)_(\d+)', 'tokens', 'once');
str2double(vertcat(tokens{:}))
Note that the vertcat call will fail if a filename does not match the pattern.
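One way to guard against that, as a rough sketch: only concatenate the names that matched and leave NaN for the rest. The example filenames and the variables matched and nums are made up for illustration:

```matlab
filenames = {'2001_06m', 'notes.txt', '001_6'};   % second name does not match
tokens = regexp(filenames, '(\d+)_(\d+)', 'tokens', 'once');
matched = ~cellfun(@isempty, tokens);             % true where the pattern was found
nums = NaN(numel(filenames), 2);                  % one row per filename
nums(matched, :) = str2double(vertcat(tokens{matched}));
```

Rows of nums that stay NaN mark the filenames that need a closer look.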
Stephen23 on 8 Oct 2018
Edited: Stephen23 on 8 Oct 2018
Fully vectorized, one line, and more efficient than regexp and/or str2double:
>> C = {'2001_06m','2001_7m','2001_08'};
>> sscanf(sprintf('%sm',C{:}),'%*d_%d%*[m]') % second number
ans =
   6
   7
   8
>> sscanf(sprintf('%sm',C{:}),'%d_%*d%*[m]') % first number
ans =
   2001
   2001
   2001
>> sscanf(sprintf('%sm',C{:}),'%d_%d%*[m]') % both numbers
ans =
   2001
      6
   2001
      7
   2001
      8
"...usually they are 4 digits long, but sometimes 3 digits, i.e. '001' instead of '2001'"
My answer works regardless of the number of digits in the numbers.
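If both numbers are wanted side by side rather than interleaved in one column, sscanf's optional size argument can do the reshaping. A small sketch along those lines (the variable names M, first and second are just for illustration):

```matlab
C = {'2001_06m','2001_7m','2001_08'};
% The size argument [2, Inf] fills a 2-row matrix column by column;
% transposing then gives one row per filename.
M = sscanf(sprintf('%sm',C{:}), '%d_%d%*[m]', [2, Inf]).';
first  = M(:,1);   % numbers before the underscore
second = M(:,2);   % numbers after the underscore
```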
dpb on 7 Oct 2018
v = str2double(strip(strsplit(s,'_'),'m'));      % chuckles...
In reality, writing a regexp expression is the more general solution, but I have to spend too much time figuring out the syntax and am too impatient... :)