Read file with non-uniform lines?

Question

Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.

% Coordinates
%   Code    ID      X         Y
    C       101     0.001     0.001
    C       102     1.002     0.002
    C       103     1.003     1.003
    C       104     0.004     1.004
% Distances
%   Code    ID      From      To      Dist
    D       201     101       103     1.417
    D       202     102       104     1.414

If the first character is C, use...

A = textscan(fid,'%c %d %f %f')

If the first character is D, use...

A = textscan(fid,'%c %d %d %d %f')

After, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic to reading the file? Thank you.

5 comments

Walter Roberson on 26 Oct 2020

'^\s*C.*$', 'dotexceptnewline', 'lineanchors'

or

'(?<=(^|\n))\s*C[^\n]*'

with no additional options needed

bene1 on 26 Oct 2020

Great, thanks again. Now have...

C =
  4×1 cell array
    {'    C       101     0.001     0.001←'}
    {'    C       102     1.002     0.002←'}
    {'    C       103     1.003     1.003←'}
    {'    C       104     0.004     1.004←'}

With C as a 4x1, I believe my next step is to extract out the columns. My first thought was

A = textscan(C,'%c %d %f %f')

but I see I can't do that. Looking into cell2struct?
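One possible way around this, if the lines have already been extracted: textscan accepts a character vector, so the cell array can be joined back into one block of text first. The sketch below assumes the sample data from the question held in a variable (txt, Clines, and A are illustrative names), and uses %s rather than %c for the code column so that leading blanks are skipped.

```matlab
% Sample text standing in for the file contents (copied from the question).
txt = strjoin({ ...
    '% Coordinates', ...
    '%   Code    ID      X         Y', ...
    '    C       101     0.001     0.001', ...
    '    C       102     1.002     0.002', ...
    '    C       103     1.003     1.003', ...
    '    C       104     0.004     1.004', ...
    '% Distances', ...
    '%   Code    ID      From      To      Dist', ...
    '    D       201     101       103     1.417', ...
    '    D       202     102       104     1.414'}, newline);

% Extract the C lines with the first pattern suggested above.
Clines = regexp(txt, '^\s*C.*$', 'match', 'dotexceptnewline', 'lineanchors');

% strtrim removes leading blanks and any trailing carriage return.
Clines = strtrim(Clines);

% Join the lines into one character vector and scan it in a single call.
A = textscan(strjoin(Clines(:).', newline), '%s %f %f %f');

% A{1} holds the codes; the numeric columns follow.
c.id = A{2};
c.x  = A{3};
c.y  = A{4};
```

With this layout each field of c is a column vector with one entry per C line.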

Answer 1

Walter Roberson on 26 Oct 2020

Named tokens, I said. Do not extract the lines ahead of time.

FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each of which are character vectors.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, Dist, each of which are character vectors.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});

The amount of processing work is pretty minimal. Nearly all of the effort goes into figuring out the proper regexp patterns to use (which can be pretty tricky when there are variant lines).
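Since getting the patterns right is the hard part, a quick sanity check can help. The sketch below runs the two patterns above on the sample data from the question and counts the data lines that neither pattern matched. The dataLines pattern and the variable names Cpat, Dpat, and nUnmatched are assumptions added for illustration, not part of the approach above.

```matlab
% Sample text standing in for fileread(YourFileName).
FileText = strjoin({ ...
    '% Coordinates', ...
    '%   Code    ID      X         Y', ...
    '    C       101     0.001     0.001', ...
    '    C       102     1.002     0.002', ...
    '    C       103     1.003     1.003', ...
    '    C       104     0.004     1.004', ...
    '% Distances', ...
    '%   Code    ID      From      To      Dist', ...
    '    D       201     101       103     1.417', ...
    '    D       202     102       104     1.414'}, newline);

Cpat = '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)';
Dpat = '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)';

Ctokens = regexp(FileText, Cpat, 'names', 'lineanchors');
Dtokens = regexp(FileText, Dpat, 'names', 'lineanchors');

% Lines whose first non-blank character is not '%' are treated as data lines.
dataLines = regexp(FileText, '^[ \t]*[^% \t\r\n][^\n]*$', 'match', 'lineanchors');

% A nonzero count means some data line was not picked up by either pattern.
nUnmatched = numel(dataLines) - numel(Ctokens) - numel(Dtokens);

% \S+ accepts any non-blank text, so check that the conversion produced numbers.
Dist = str2double({Dtokens.Dist});
nBad = nnz(isnan(Dist));
```

If nUnmatched or nBad is nonzero, the file probably contains a line variant that the patterns do not yet cover.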

1 comment

bene1 on 27 Oct 2020

Cool, thank you kindly!

Answer 2

per isakson on 26 Oct 2020

>> S = cssm( 'd:\m\cssm\cssm.txt' )
S = 
  1×2 struct array with fields:
    header
    colhead
    Code
    data
>> S(1)
ans = 
  struct with fields:
     header: "Coordinates"
    colhead: ["Code"    "ID"    "X"    "Y"]
       Code: [4×1 string]
       data: [4×3 double]
>> S(2)
ans = 
  struct with fields:
     header: "Distances"
    colhead: ["Code"    "ID"    "From"    "To"    "Dist"]
       Code: [2×1 string]
       data: [2×4 double]

where

function    sas = cssm( ffs )
    
    chr = fileread( ffs );
    str = string( chr );
    str = replace( str, char([13,10]), newline );   % get rid of the carriage return
   
    % split the string into blocks. Use the block header as delimiter. 
    [blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n'  ...      
                        , 'DelimiterType','RegularExpression' );
                    
    blk(1) = [];  % remove empty block before the first delimiter                    
    
    len = numel( del );
    sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
    
    for jj = 1 : len    % loop over all blocks
        
        sas(jj).header = regexp( del(jj), '\w+', 'match','once' );  % match the name
        
        cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
        tmp = strsplit( string(cac{1}) );       % split the row into column headers
        tmp(1) = [];                            % remove the comment character, "%"
        sas(jj).colhead = tmp;
        
        cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
                    ,   'Headerlines',1, 'CollectOutput',true );
        sas(jj).Code = string(cac{1});
        sas(jj).data = cac{2};
    end
end

and where cssm.txt contains the data given in your question.
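To get from this layout to the named fields asked for in the question (c.id, c.x, c.y), one option is to map each column header to a field. The sketch below is only an illustration: blk is a hand-built stand-in for S(1) with the values shown above, and the lower-case field names are an assumption based on the question.

```matlab
% Stand-in for S(1) as displayed above (values taken from the question's data).
blk.header  = "Coordinates";
blk.colhead = ["Code" "ID" "X" "Y"];
blk.Code    = ["C"; "C"; "C"; "C"];
blk.data    = [101 0.001 0.001
               102 1.002 0.002
               103 1.003 1.003
               104 0.004 1.004];

% Column k of data corresponds to colhead(k+1), because Code is stored separately.
names = lower(blk.colhead(2:end));

c = struct();
for k = 1:numel(names)
    c.(char(names(k))) = blk.data(:, k);   % creates c.id, c.x, c.y
end
```

The same loop applied to S(2) would give d.id, d.from, d.to, and d.dist.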

1 comment

bene1 on 27 Oct 2020

Thank you for the idea. :-)
