Split cvs on commas but prevent doing so for a string with a comma in it

Question

Tycho Maas on 13 Dec 2020

https://es.mathworks.com/matlabcentral/answers/692640-split-cvs-on-commas-but-prevent-doing-so-for-a-string-with-a-comma-in-it

Commented: Cris LaPierre on 14 Dec 2020

My Excel csv file looks like this:

Data,test,04-12-2020 13:11,0,"8,2",1,2,3

Currently I use the following code to separate the columns:

[~,~,dataCGM] = xlsread('file.csv');
outCGM = regexp(dataCGM, ',', 'split');
outCGM = outCGM(2:end-1); 

This splits the columns on commas, but it also splits the quoted string "8,2", which is not what I want. Does anyone know how to prevent this and keep the value as a string in a single column?


Answer 1

Cris LaPierre on 13 Dec 2020

https://es.mathworks.com/matlabcentral/answers/692640-split-cvs-on-commas-but-prevent-doing-so-for-a-string-with-a-comma-in-it#answer_574180

Perhaps one of the options given here is helpful.

Cris LaPierre on 13 Dec 2020

File_MATLAB_test.csv

There is something wrong with your csv file format. I've recreated it.

I see you told Walter you can't predefine anything about data types. That's a shame, as that is a strength of MATLAB. With that in mind, here's a minimalist approach that recognizes every field (every comma) without making any assumptions about the data it contains.

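% Detect import settings from the file itself
opts = detectImportOptions('File_MATLAB_test.csv');
% Keep trailing delimiters and treat consecutive delimiters as separate (empty) fields
opts.TrailingDelimitersRule = 'keep';
opts.ConsecutiveDelimitersRule = 'split';
% Do not treat the first row as variable names
data = readtable('File_MATLAB_test.csv',opts,"ReadVariableNames",false)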
data = 2x19 table
      Var1        Var2              Var3            Var4     Var5      ExtraVar1     ExtraVar2     ExtraVar3     ExtraVar4     ExtraVar5     ExtraVar6     ExtraVar7     ExtraVar8     ExtraVar9     ExtraVar10    ExtraVar11    ExtraVar12    ExtraVar13    ExtraVar14
    ________    ________    ____________________    ____    _______    __________    __________    __________    __________    __________    __________    __________    __________    __________    __________    __________    __________    __________    __________

    {'Data'}    {'test'}    {'04-12-2020 12:24'}     0      {'8,2'}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}
    {'Data'}    {'test'}    {'04-12-2020 12:41'}     0      {'5,9'}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}

Cris LaPierre on 14 Dec 2020

Edited: Cris LaPierre on 14 Dec 2020

Test_file.csv

You probably won't believe how much trouble a little quote can cause. I don't love it, but this works.

Maybe someone like @Stephen Cobeldick, who is a regexp ninja, can improve on this.

fid = fopen("Test_file.csv");
% capture variable names
str=fgetl(fid);
varnames = textscan(str,'%s','Delimiter',',');
% capture the remaining file contents (assumed to be uniform)
raw = textscan(fid,'%q','Delimiter',',');
fclose(fid);
% Split the raw data by delimiter, keeping quoted text together
ss = @(C) strsplit(C,'(?!\<"[^"]*),(?![^"]*"\>)','CollapseDelimiters',0,'DelimiterType','RegularExpression');
M=cellfun(ss,raw{1},'UniformOutput',false);
% convert to a table
T = cell2table(M);
% Make each column its own variable. Name columns using variable names from file.
T=splitvars(T,1,"NewVariableNames",varnames{1})
T = 2x19 table
           Apparaat            Serienummer    Tijdstempel apparaat    Gegevenstype    Historische glucose mmol/l    Scan Glucose mmol/l    Niet-numerieke snelwerkende insuline    Snelwerkende insuline (eenheden)    Niet-numeriek voedsel    Koolhydraten (gram)    Koolhydraten (porties)    Niet-numerieke langwerkende insuline    Langwerkende insuline (eenheden)     Notities     Strip Glucose mmol/l    Keton mmol/l    Maaltijdinsuline (eenheden)    Correctie insuline (eenheden)    Wijzigen insuline gebruiker (eenheden)
    _______________________    ___________    ____________________    ____________    __________________________    ___________________    ____________________________________    ________________________________    _____________________    ___________________    ______________________    ____________________________________    ________________________________    __________    ____________________    ____________    ___________________________    _____________________________    ______________________________________

    {'FreeStyle LibreLink'}     {'data'}      {'04-12-2020 12:24'}       {'0'}                {'"5,3"'}                 {0×0 char}                      {0×0 char}                            {0×0 char}                    {0×0 char}              {0×0 char}               {0×0 char}                       {0×0 char}                            {0×0 char}               {0×0 char}         {0×0 char}          {0×0 char}             {0×0 char}                      {0×0 char}                            {0×0 char}              
    {'FreeStyle LibreLink'}     {'data'}      {'04-12-2020 12:41'}       {'0'}                {'"4,9"'}                 {0×0 char}                      {0×0 char}                            {0×0 char}                    {0×0 char}              {0×0 char}               {0×0 char}                       {0×0 char}                            {0×0 char}               {0×0 char}         {0×0 char}          {0×0 char}             {0×0 char}                      {0×0 char}                            {0×0 char}              

Stephen23 on 14 Dec 2020

Edited: Stephen23 on 14 Dec 2020

"Maybe someone like @Stephen Cobeldick, who is a regexp ninja, can improve on this."

Thank you for the unique commendation.

Although it is probably not the fastest approach, I would try importing the entire file as one string, apply some string manipulation to it to remove the line-end quotation marks (e.g. REGEXPREP), and then write a new file which can then be directly imported using READTABLE. That has the benefit of importing all the different data classes correctly without much overhead and all of the standard READTABLE options.

It is not trivial because of course valid quotes around a string should not be removed.
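
A minimal sketch of the rewrite-then-import idea, under an assumption that is not confirmed in the thread: each affected line is wrapped in one extra pair of double quotes, and any quotes inside it are doubled. The sample data and temporary file names are hypothetical. A line that consists of a single legitimately quoted field would also match the pattern, which is exactly the ambiguity mentioned above.

```matlab
% Build a small sample file with the ASSUMED layout (hypothetical data).
inFile  = [tempname '.csv'];
outFile = [tempname '.csv'];
fid = fopen(inFile, 'w');
fprintf(fid, '%s\n', 'Device,Serial,Timestamp,Type,Glucose');
fprintf(fid, '%s\n', '"FreeStyle LibreLink,data,04-12-2020 12:24,0,""5,3"""');
fprintf(fid, '%s\n', '"FreeStyle LibreLink,data,04-12-2020 12:41,0,""4,9"""');
fclose(fid);

% Read the whole file as one char vector and split it into lines.
txt   = fileread(inFile);
lines = regexp(txt, '\r?\n', 'split');
lines = lines(~cellfun(@isempty, lines));      % drop empty lines

% A line is treated as "wrapped" if it starts and ends with a quote and
% everything in between is either a non-quote character or a doubled quote.
hits    = regexp(lines, '^"(?:[^"]|"")*"$', 'match', 'once');
wrapped = ~cellfun(@isempty, hits);

% Strip the outer quotes and un-double the inner ones on wrapped lines only.
unwrap = @(s) strrep(s(2:end-1), '""', '"');
lines(wrapped) = cellfun(unwrap, lines(wrapped), 'UniformOutput', false);

% Write the cleaned text to a new file and import it with READTABLE.
fid = fopen(outFile, 'w');
fprintf(fid, '%s\n', lines{:});
fclose(fid);
T = readtable(outFile, 'Delimiter', ',');

delete(inFile);
delete(outFile);
```

After cleaning, quoted fields such as "5,3" are ordinary CSV quoted fields, so the usual READTABLE options apply to the new file.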

This issue pops up enough to indicate that it would be nice for it to be handled natively:

https://www.mathworks.com/matlabcentral/answers/398852-csvread-has-problems-reading-all-columns

https://www.mathworks.com/matlabcentral/answers/510186-read-csv-with-row-in-quotes-string-and-numbers

Perhaps it would be a useful addition for READTABLE et al to include an option named e.g. LINEQUOTE which can be set to the required character (by default empty).

Cris LaPierre on 14 Dec 2020

I can only make it work for what I see.

You can look into what settings are available from detectImportOptions. I suspect NumHeaderLines is what you are looking for.
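
A minimal sketch of passing NumHeaderLines to detectImportOptions, assuming the file has one preamble line before the header row. The sample file and its contents are hypothetical, and the right number of lines to skip depends on the actual file.

```matlab
% Hypothetical sample file: one preamble line, a header row, then data.
f = [tempname '.csv'];
fid = fopen(f, 'w');
fprintf(fid, '%s\n', 'Exported by a hypothetical tool');
fprintf(fid, '%s\n', 'Device,Type,Glucose');
fprintf(fid, '%s\n', 'LibreLink,0,"5,3"');
fprintf(fid, '%s\n', 'LibreLink,0,"4,9"');
fclose(fid);

% Skip the first line when detecting the file layout.
opts = detectImportOptions(f, 'NumHeaderLines', 1, 'Delimiter', ',');
disp(opts)                 % inspect the detected variable names and types
T = readtable(f, opts);

delete(f);
```

Displaying opts before importing is a quick way to see which settings were detected and which ones need to be overridden.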

Answer 2

Walter Roberson on 13 Dec 2020

https://es.mathworks.com/matlabcentral/answers/692640-split-cvs-on-commas-but-prevent-doing-so-for-a-string-with-a-comma-in-it#answer_574235


Use readtable() with a format such as:

'%s,%s,%{dd-MM-uuuu HH:mm}D,%f,%q,%f,%f,%f'
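
A minimal sketch of this suggestion applied to the sample line from the question. It only applies when the column layout is known in advance. As assumptions of this sketch, the delimiter is passed separately instead of being embedded in the format, and "8,2" is treated as a decimal-comma number; the temporary file is hypothetical.

```matlab
% Write the sample line from the question to a temporary file.
f = [tempname '.csv'];
fid = fopen(f, 'w');
fprintf(fid, '%s\n', 'Data,test,04-12-2020 13:11,0,"8,2",1,2,3');
fclose(fid);

% %q reads a double-quoted field as one piece of text, so "8,2" is not split.
fmt = '%s%s%{dd-MM-uuuu HH:mm}D%f%q%f%f%f';
T = readtable(f, 'Format', fmt, 'Delimiter', ',', 'ReadVariableNames', false);

% Optional: convert the decimal-comma text to a number.
glucose = str2double(strrep(T.Var5, ',', '.'));

delete(f);
```

The quoted value stays in a single column (Var5) as text, and the conversion step is only needed if a numeric value is wanted.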


Tycho Maas on 13 Dec 2020

Thanks but the code needs to work on itself without predefining what will be in which column.

Image Analyst on 13 Dec 2020

That makes no sense. A program will not "work on itself". You need to tell your code HOW to process the file. It won't magically figure it out. Attach your csv file if you need more help.
