You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
Split cvs on commas but prevent doing so for a string with a comma in it
21 views (last 30 days)
Show older comments
My Excel csv file looks like this:
Data,test,04-12-2020 13:11,0,"8,2",1,2,3
Currently I use the following code to seperate the columns:
[~,~,dataCGM] = xlsread('file.csv');
outCGM = regexp(dataCGM, ',', 'split');
outCGM = outCGM(2:end-1);
This does split the columns on commas but also does so for the string "8,2" which is not what I want. Does anyone know how to prevent this issue and keep the value as a string in a single column?
Answers (2)
Cris LaPierre
on 13 Dec 2020
20 Comments
Tycho Maas
on 13 Dec 2020
Edited: Tycho Maas
on 13 Dec 2020
This does not work, since I have a csv file and no text files.
Tycho Maas
on 13 Dec 2020
I already use this method in my code as displayed above. It generates a cell with multiple rows and only a single column, each consisting of something like this:
Data,test,04-12-2020 13:11,0,"8,2",1,2,3
And I would like the data to be stored in an array with each of those elements in seperate columns. So it does not work.
Cris LaPierre
on 13 Dec 2020
It worked on the sample you have provided. I ony need xlsread, not the rest of your code.
[~,~,dataCGM] = xlsread("file.csv")

What is the desired end result?
Perhaps share your file so we can see what you see? Use the paperclip icon to attach it to your post.
Tycho Maas
on 13 Dec 2020
The output as you show is indeed my desired output, however for my data this does (attached) not work.
Cris LaPierre
on 13 Dec 2020
There is something wrong with your csv file format. I've recreated it.
I see you told Walter you can't predefine anything about data types. That's a shame, as that's is a strength of MATLAB. With that in mind, here's a minimalist approach that recognizes every field (every comma) w/o making any assumptions about the data it contains.
opts = detectImportOptions('File_MATLAB_test.csv');
opts.TrailingDelimitersRule = 'keep';
opts.ConsecutiveDelimitersRule = 'split';
data = readtable('File_MATLAB_test.csv',opts,"ReadVariableNames",false)
data = 2x19 table
Var1 Var2 Var3 Var4 Var5 ExtraVar1 ExtraVar2 ExtraVar3 ExtraVar4 ExtraVar5 ExtraVar6 ExtraVar7 ExtraVar8 ExtraVar9 ExtraVar10 ExtraVar11 ExtraVar12 ExtraVar13 ExtraVar14
________ ________ ____________________ ____ _______ __________ __________ __________ __________ __________ __________ __________ __________ __________ __________ __________ __________ __________ __________
{'Data'} {'test'} {'04-12-2020 12:24'} 0 {'8,2'} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char}
{'Data'} {'test'} {'04-12-2020 12:41'} 0 {'5,9'} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char}
Tycho Maas
on 13 Dec 2020
This gives the following error:
Unrecognized property 'TrailingDelimitersRule' for class 'matlab.io.text.DelimitedTextImportOptions'.
Any idea why? Maybe it's due to you using MATLAB 2020B while I use 2018A?
Cris LaPierre
on 13 Dec 2020
Yes, you can see in the top right corner of my post that the code was run in R2020b. There is no TrailingDelmitersRule in detectImportOptions in R2018a.
Tycho Maas
on 13 Dec 2020
I've tried it in the R2020b and it almost works. It works correctly for the first row (in hcich is said what belongs in which column). However for the next rows it does not work and everything is stored in the first column.
Cris LaPierre
on 13 Dec 2020
Somehow the file you are using is different from what you've posted. Can you share your actual file?
Tycho Maas
on 13 Dec 2020
I can't share the exact data since it contains personal info, however the file looks like this.
Cris LaPierre
on 13 Dec 2020
Are you sure? Each line is encased in quotes. That is probably where we are different.
Your file:
"FreeStyle LibreLink,data,04-12-2020 12:24,0,""5,3"",,,,,,,,,,,,,,"
"FreeStyle LibreLink,data,04-12-2020 12:41,0,""4,9"",,,,,,,,,,,,,,"
What I would expect:
Data,test,04-12-2020 12:24,0,"8,2",,,,,,,,,,,,,,
Data,test,04-12-2020 12:41,0,"5,9",,,,,,,,,,,,,,
If the raw file actually has the quotes, that will change things. Open it in a text editor to check, not Excel.
Tycho Maas
on 13 Dec 2020
Indeed, it looks like this in the text editor:
Apparaat,Serienummer,Tijdstempel apparaat,Gegevenstype,Historische glucose mmol/l,Scan Glucose mmol/l,Niet-numerieke snelwerkende insuline,Snelwerkende insuline (eenheden),Niet-numeriek voedsel,Koolhydraten (gram),Koolhydraten (porties),Niet-numerieke langwerkende insuline,Langwerkende insuline (eenheden),Notities,Strip Glucose mmol/l,Keton mmol/l,Maaltijdinsuline (eenheden),Correctie insuline (eenheden),Wijzigen insuline gebruiker (eenheden)
"FreeStyle LibreLink,data,04-12-2020 12:24,0,""4,9"",,,,,,,,,,,,,,"
"FreeStyle LibreLink,data,04-12-2020 12:41,0,""4,9"",,,,,,,,,,,,,,"
Cris LaPierre
on 14 Dec 2020
Edited: Cris LaPierre
on 14 Dec 2020
You probably won't believe how much trouble a little quote can cause. I don't love it, but this works.
Maybe someone like @Stephen Cobeldick, who is a regexp ninja, can improve on this.
fid = fopen("Test_file.csv");
% capture variable names
str=fgetl(fid);
varnames = textscan(str,'%s','Delimiter',',');
% capture the remaining file contents (assumed to be uniform)
raw = textscan(fid,'%q','Delimiter',',');
fclose(fid);
% Split the raw data by delimiter, keeping quoted text together
ss = @(C) strsplit(C,'(?!\<"[^"]*),(?![^"]*"\>)','CollapseDelimiters',0,'DelimiterType','RegularExpression');
M=cellfun(ss,raw{1},'UniformOutput',false);
% convert to a table
T = cell2table(M);
% Make each column its own variable. Name columns using variable names from file.
T=splitvars(T,1,"NewVariableNames",varnames{1})
T = 2x19 table
Apparaat Serienummer Tijdstempel apparaat Gegevenstype Historische glucose mmol/l Scan Glucose mmol/l Niet-numerieke snelwerkende insuline Snelwerkende insuline (eenheden) Niet-numeriek voedsel Koolhydraten (gram) Koolhydraten (porties) Niet-numerieke langwerkende insuline Langwerkende insuline (eenheden) Notities Strip Glucose mmol/l Keton mmol/l Maaltijdinsuline (eenheden) Correctie insuline (eenheden) Wijzigen insuline gebruiker (eenheden)
_______________________ ___________ ____________________ ____________ __________________________ ___________________ ____________________________________ ________________________________ _____________________ ___________________ ______________________ ____________________________________ ________________________________ __________ ____________________ ____________ ___________________________ _____________________________ ______________________________________
{'FreeStyle LibreLink'} {'data'} {'04-12-2020 12:24'} {'0'} {'"5,3"'} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char}
{'FreeStyle LibreLink'} {'data'} {'04-12-2020 12:41'} {'0'} {'"4,9"'} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char} {0×0 char}
Tycho Maas
on 14 Dec 2020
Thank you very much!
Some small issue is still left though. My variable names are not in the first row but in the second (the first row contains non-important info). Now the code gives an error due to this I think (and so it does not recognize the second row as variable names). Do you have any idea how to fix this?
The error I get:
Error using tabular/addvars (line 143)
Specify one new variable name for each variable being added to the table.
Error in tabular/splitvars (line 161)
b = addvars(b,newvars{:},'Before',newInds(ii),'NewVariableNames',newvarnames);
Error in Untitled4 (line 14)
T = splitvars(T,1,"NewVariableNames",varnames{1})
Stephen23
on 14 Dec 2020
Edited: Stephen23
on 14 Dec 2020
"Maybe someone like @Stephen Cobeldick, who is a regexp ninja, can improve on this."
Thank you for the unique commendation.
Although it is probably not the fastest approach, I would try importing the entire file as one string, apply some string manipulation to it to remove the line-end quotation marks (e.g. REGEXPREP), and then write a new file which can then be directly imported using READTABLE. That has the benefit of importing all the different data classes correctly without much overhead and all of the standard READTABLE options.
It is not trivial because of course valid quotes around a string should not be removed.
This issue pops up enough to indicate that it would be nice for it to be handled natively:
Perhaps it would be a useful addition for READTABLE et al to include an option named e.g. LINEQUOTE which can be set to the required character (by default empty).
Cris LaPierre
on 14 Dec 2020
I can only make it work for what I see.
You can look into what settings are available from detectImportOptions. I suspect the NumHeaderLines is what you are looking for.
Walter Roberson
on 13 Dec 2020
readtable() with a format that is
'%s,%s,%{dd-MM-uuuu HH:mm}D,%f,%q,%f,%f,%f'
2 Comments
Tycho Maas
on 13 Dec 2020
Thanks but the code needs to work on itself without predefining what will be in which column.
Image Analyst
on 13 Dec 2020
That makes no sense. A program will not "work on itself". You need to tell your code HOW to process the file. It won't magically figure it out. Attach your csv file if you need more help.
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Seleccione un país/idioma
Seleccione un país/idioma para obtener contenido traducido, si está disponible, y ver eventos y ofertas de productos y servicios locales. Según su ubicación geográfica, recomendamos que seleccione: .
También puede seleccionar uno de estos países/idiomas:
Cómo obtener el mejor rendimiento
Seleccione China (en idioma chino o inglés) para obtener el mejor rendimiento. Los sitios web de otros países no están optimizados para ser accedidos desde su ubicación geográfica.
América
- América Latina (Español)
- Canada (English)
- United States (English)
Europa
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
Asia-Pacífico
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)