How to read a comma delimited .dat file containing mixed formats

Question

Camille Couzi el 2 de Oct. de 2018

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/421718-how-to-read-a-comma-delimited-dat-file-containing-mixed-formats

Comentada: dpb el 3 de Oct. de 2018

Hi, I have been struggling with this for hours: I have a .dat file containing comma separated values. this is how the first lines look like:

Cell 1,13,3-22-2018 05:30,-0.02496409
Cell 1,14,3-22-2018 05:35,-0.02496409
Cell 1,15,3-22-2018 05:40,-0.02496409
Cell 1,16,3-22-2018 05:45,-0.02496409
Cell 1,17,3-22-2018 05:50,-0.02496409

My file has more than 1,100,000 lines so I know I have to use textscan. I have tried textread,dlmread,importdata,readtable... textscan is my best bet, but I can't figure out the format.

I would like to end up with

CellNb = vector containing all my cell numbers (in this example it'll be 1 1 1 1 1)
EnsNb = vector containing all my ensemble numbers (in this example it'll be 13 14 15 16 17)
Date = vector containing my dates + times (datenum format)
Val = containing the values on the last column.

So far I am trying:

filename='SimplifiedFile_commas.dat';
fid = fopen(filename);
C=textscan(fid,'%s%d%s%f64\n','delimiter',',');
fclose(fid);
celldisp(C)
which clearly doesn't work because i get:
C{1}{1} =
ÿþC e l l   1 
C{2} =
       []
C{3} =
       {}
C{4} =
       []

Could anyone give me a hint on the format I have to use? Thanks!!!

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

dpb el 2 de Oct. de 2018

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/421718-how-to-read-a-comma-delimited-dat-file-containing-mixed-formats#answer_339313

Editada: dpb el 3 de Oct. de 2018

Just read the file as it's given then parse the data columns...

t=readtable('couzi.csv','ReadVariableNames',0,'Delimiter',',');  % read base file
t.Var1=str2num(char(strrep(t.Var1,'Cell ','')));                 % parse cell number
t.Var3=datetime(t.Var3,'Format','M-dd-yyyy HH:mm');              % convert to datetime
t.Properties.VariableNames={'Cell','Ens','Date','Val'}           % set desired variable names
t =
5×4 table
  Cell    Ens         Date             Val   
  ____    ___    _______________    _________
  1       13     3-22-2018 05:30    -0.024964
  1       14     3-22-2018 05:35    -0.024964
  1       15     3-22-2018 05:40    -0.024964
  1       16     3-22-2018 05:45    -0.024964
  1       17     3-22-2018 05:50    -0.024964
>>

ADDENDUM

As pedagogical note; a little more obtuse but perhaps "more MATLABy" solution to the first column could be

t.Var1=cellfun(@(s) sscanf(s,'Cell%f'),t.Var1);

to read and convert the actual cell content rather than strip/replace. Not sure if there would be any noticeable performance difference for large files or not.

Which leads to the alternate original solution to parse the file directly-- textscan can be used but readtable is more flexible in many ways altho for a very large file the resulting table may be somewhat slow in performance.

fmt='Cell%f%f%q%f';
c=textscan(fid,fmt,'delimiter',',','collectoutput',0);

will return

c =
1×4 cell array
  {5×1 double}    {5×1 double}    {5×1 cell}    {5×1 double}

where the time string is a cell array to convert to datetime.

Unfortunately, the '%{}D' time format scanning routine is broken for embedded space in the string between the date and time fields as is so common; the obvious

>> textscan(c{3},'%{M-d-yyyy HH:mm}D')
Error using textscan
Unable to read the DATETIME data with the format 'M-d-yyyy HH:mm'. If the data is not a time, use %q to get
text data.

fails miserably. One can workaround it with

>> t=textscan(strrep(c{3}(1),' ',''),'%{M-d-yyyyHH:mm}D')
t =
1×1 cell array
  {[3-22-201805:30]}

and end up with an ugly, illegible output format; this, too, is fixable:

>> t.Format='default'
t = 
datetime
 22-Mar-2018 05:30:00

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Walter Roberson el 3 de Oct. de 2018

See https://www.mathworks.com/matlabcentral/answers/421358-retrieve-data-containing-date-and-time-from-text-file#comment_615974 for more on when %D accepts spaces.

dpb el 3 de Oct. de 2018

AH! I had not thought of there being the interaction with 'whitespace', Walter.

The textscan doc says when sees %{}D format the field is scanned as '%q' but of course the given string is malformed for embedded blanks by not being surrounded by ".

I suppose it would be too much to ask for but I think (and believe most users would inherently expect that) textscan/%D should be smart enough to read the provided format string as given and parse the corresponding field as told, only reporting error or failing when the string doesn't actually match the data or the format string itself is malformed, embedded delimiter or no.

But, thanks for pointing out the workaround; I'll try to recall it next time it comes up... :)

PS. I don't see the ambiguity you point out in the last example there; if 'whitespace','' is given in conjunction with any format string conversion specifier, textscan does not add the superfluous blank it does with other delimiters. That example seems clear to me it should work unless I'm missing something.

Iniciar sesión para comentar.

How to read a comma delimited .dat file containing mixed formats

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Community Treasure Hunt

How to read a comma delimited .dat file containing mixed formats

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

4 comentarios Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos