readtable cannot handle double quotation marks very well

Kouichi C. Nakamura on 6 Jan 2021
Edited: Kouichi C. Nakamura on 7 Jan 2021
I have CSV files saved with LibreOffice in which text fields are enclosed in double quotation marks (Format quoted field as text).
When I tried to read one such CSV file, which contains two data rows, with readtable,
T0 = readtable('file1.csv',...
'Encoding','UTF-8','delimiter',',','ReadVariableNames',true);
readtable failed to read the first row.
I then used the following commands, which read both rows:
opts1 = delimitedTextImportOptions('Encoding','UTF-8','Delimiter',',','DataLines',[2 Inf],'VariableNamesLine',1);
T1 = readtable('file1.csv',opts1);
However, the content of the table was not what I wanted:
ans = 2×1 cell
'"optotagging"'
'"behaviour"'
The double quotation marks remained in some columns.
The 'QuoteRule','remove' option of setvaropts appeared to be promising, but I could not get it to work.
setvaropts(opts1,'QuoteRule','remove')
How do I nicely remove double quotation marks in CSVs?
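One thing that may be worth checking in the setvaropts attempt above: import options objects are value objects, so setvaropts returns the updated options rather than modifying opts1 in place. The sketch below assumes a hypothetical two-column layout ('NumVariables',2 is not from the original post) and only shows the assignment pattern; it is not a confirmed fix for the file in question.

```matlab
% Hypothetical options object; 'NumVariables',2 is an assumption for illustration.
opts1 = delimitedTextImportOptions('NumVariables',2,'Delimiter',',', ...
    'DataLines',[2 Inf],'VariableNamesLine',1);

% Assign the result back to opts1. Calling setvaropts without capturing
% its output leaves opts1 unchanged.
opts1 = setvaropts(opts1,'QuoteRule','remove');

% Inspect the rule now stored for the first variable.
v = getvaropts(opts1,1);
disp(v.QuoteRule)
```

The updated opts1 would then be passed to readtable as before.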

Answers (1)

Kouichi C. Nakamura on 6 Jan 2021
Edited: Kouichi C. Nakamura on 7 Jan 2021
I asked MathWorks about this, and their answer was helpful:
% This will almost work for this case, but it detects the first line as a
% "meta-data" line because it is all string/blank
opts = detectImportOptions('file1.csv','NumHeaderLines',0,'Delimiter',',');
% This works around that issue
opts.DataLines = [2 Inf];
T2 = readtable('file1.csv',opts);
With this code, both rows are read and the double quotation marks are removed cleanly.
According to MathWorks:
> The solution shared, is very specific to your workflow and is an undocumented method which might change without notice.
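To try the approach above without the original file, here is a minimal sketch that writes a small quoted CSV to a temporary file and reads it back. The file contents, the column names, and the explicit VariableNamesLine setting are assumptions made for illustration; they are not part of the MathWorks reply.

```matlab
% Write a hypothetical sample file: one header line and two data rows,
% with every field enclosed in double quotation marks.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'"name","task"\n"a1","optotagging"\n"a2","behaviour"\n');
fclose(fid);

% Detect import options, then state explicitly where names and data start.
opts = detectImportOptions(fname,'NumHeaderLines',0,'Delimiter',',');
opts.VariableNamesLine = 1;   % assumption: names are on line 1
opts.DataLines = [2 Inf];     % data from line 2 to the end of the file
T = readtable(fname,opts);

% Check whether any quotation marks survived in the second column.
hasQuotes = any(contains(T{:,2},'"'));
disp(T)
delete(fname);
```

If hasQuotes turns out to be true for a real file, comparing its raw bytes against this sample (for example, spaces after the delimiters) may help narrow down the difference.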
