How do I determine the number of data in a 'messy' CSV and then import it?

Question

Daniel Bridges el 22 de En. de 2018

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/378264-how-do-i-determine-the-number-of-data-in-a-messy-csv-and-then-import-it

Editada: Daniel Bridges el 23 de En. de 2018

I have a CSV file that looks like the following:

"","JOHN DOE","p0123TP01, Approved"
"","Rectum (4)(Volume: 57.77)","CTV (21)(Volume: 32.39)"
"0.0","100.0","100.0"
"0.1","78.01999","100.0"
"0.2","76.2245","100.0"
"0.3","75.21035","100.0"
[skipping to the end of the file]
"58.1","","1.88277"
"58.2","","0.21338"
"58.3","","0.0"

but the number of data sets can vary, so I must write a flexible script to accomodate however many data rows and columns exist:

"","DOE, JOHN","revised blurring for Case 16, ANON, original; calculating correlation between blurred dose and toxicity","","","","","","","","","","","","","",""
"","patient (1)(Volume: 18578.06)","Prostate (2)(Volume: 19.80)","Proximal-SV (3)(Volume: 6.40)","Rectum (4)(Volume: 70.44)","Bladder (5)(Volume: 143.70)","Intestine (6)(Volume: 8.40)","CutLine (7)(Volume: 415.77)","Air (8)(Volume: 5.06)","InsideRectum (9)(Volume: 36.17)","PTV1 (10)(Volume: 76.77)","PTVoriginal (11)(Volume: 80.89)","PTV2 (12)(Volume: 56.52)","Guide1 (13)(Volume: 81.79)","Guide2 (14)(Volume: 68.50)","guide (15)(Volume: 66.91)","CTV-operator (16)(Volume: 26.15)"
"0.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.1","12.88921","100.0","100.0","69.70608","73.36377","2.54878","82.87909","16.40558","73.91824","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.2","12.06455","100.0","100.0","67.50839","69.03589","0.88936","79.92729","16.40558","71.92567","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.3","11.56745","100.0","100.0","65.82671","67.72518","0.32793","77.7594","16.40558","70.44129","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.4","11.09599","100.0","100.0","64.40629","66.56731","0.15624","75.78776","16.40558","69.10236","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.5","10.1524","100.0","100.0","63.25727","65.87648","0.07211","73.65644","16.40558","68.12462","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.6","9.48733","100.0","100.0","61.64318","64.92351","0.0103","67.57963","16.40558","65.96868","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.7","9.1257","100.0","100.0","60.35885","62.43443","0.0","59.25954","16.40558","65.12147","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
[... skipping to end of file ...]
"58.1","0.00498","4.57181","0.23601","","","","","","","1.20431","1.143","1.63577","1.13037","1.34968","1.38174","3.51704"
"58.2","4.9E-4","0.46111","0.0","","","","","","","0.1189","0.11285","0.1615","0.1116","0.13326","0.13642","0.34903"
"58.3","0.0","0.0","","","","","","","","0.0","0.0","0.0","0.0","0.0","0.0","0.0"

What textscan formatSpec do I need to store this data? The following code results in a 1x1 cell array with element 0x2 empty double:

function [StudyIDcol,DistributionType,OrganSpecification,Doses,Volumes,DoseUnit,VolumeUnit] = CreateTableVariables(folderpath,type,StudyID)
% First we read the file. Code courtesy of Walter Roberson of the MathWorks community
   fid = fopen([folderpath,type,'/',StudyID,'.csv'],'rt'); % rt = 'read in text mode'
   if fid == -1 % added for debugging at the suggestion of Jan Hansen of the MathWorks community
      error('Author:Function:OpenFile', 'Cannot open file');
   end
   num = 0;
 %  headers = cell % unfinished
   while true %endless loop requiring 'break' command to exit!
     H1 = fgetl(fid) ;
     if feof(fid); break; end 
     H2 = fgetl(fid) ;
     if feof(fid); break; end 
     datacell = textscan(fid, '%f%f', 'Delimiter', ',', 'CollectOutput', true) ; % originally '%f%f'
     if isempty(datacell) || isempty(datacell{1}); break; end 
   %  if any(isnan(datacell{1}(end,:))); datacell{1}(end,:) = []; end % commented out to verify the above command quits the loop
     num = num + 1;
     headers(num,:) = {H1, H2} ;
     data(num) = datacell; % We store it to the data cell array only after checking that it contains data.
     %fgetl(fid);  %the empty line between organs
   end
   fclose(fid);

I am reading the textscan formatSpec documentation and more trying to solve this problem. Please help me understand how to use textscan.

I am now trying to write a script to flexibly import such files; I must learn how to scan it to count the number of rows and columns for the data beginning at "0.0","100.0","100.0" (not necessarily this number each time) ...

%function data = ReadMIMDVH(filepath)
filepath = '/home/sony/Documents/research/data/MIM Cumulative DVH/planned/1.csv';
% We write a script specifically to import MIM data.
% First we scan the file to see how many rows of data it has.
% We construct the format spec based on this number of rows and columns.
formatSpec = ['%*s %*s %*s %*s %q %q',repmat('%f',[rows,columns])];
% We read the file using textscan.
fileID = fopen(filepath);
DVH = textscan(fileID,formatSpec,'Delimiter',{',' '/n'});
%end

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Guillaume el 22 de En. de 2018

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/378264-how-do-i-determine-the-number-of-data-in-a-messy-csv-and-then-import-it#answer_301154

Abrir en MATLAB Online

I'd use readtable which should be able to work out the formatspec on its own. So just:

t = readtable(filepath); %all done file is read and parsed.

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Guillaume el 23 de En. de 2018

My next task is to learn how to parse the Volume variable names to strip the underscore and all characters after it.

A regexprep should do that very easily. Explain exactly what you want.

Daniel Bridges el 23 de En. de 2018

Editada: Daniel Bridges el 23 de En. de 2018

Abrir en MATLAB Online

Please see the question, " How do I parse and erase from a string while importing CSV with tableread?" The goal is to most efficiently import the volume structure names, e.g.

"Rectum (4)(Volume: 70.44)","Bladder (5)(Volume: 143.70)"

keeping only the names 'Rectum', 'Bladder', etc. I was able to find one method, but it took four lines of code, and I suspect there is a way to do it with one. If you would post a better method, I would unaccept my answer and accept yours.

Iniciar sesión para comentar.

Answer 2

Walter Roberson el 22 de En. de 2018

1
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/378264-how-do-i-determine-the-number-of-data-in-a-messy-csv-and-then-import-it#answer_301161

Abrir en MATLAB Online

opts = detectImportOptions(filepath);
nvar = length(opts.VariableNames);
opts.VariableNames = sprintfc('Var%d', 1:nvar);
test = readtable(filepath, opts);

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

How do I determine the number of data in a 'messy' CSV and then import it?

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Más respuestas (1)

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Community Treasure Hunt

How do I determine the number of data in a 'messy' CSV and then import it?

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

5 comentarios Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Más respuestas (1)

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos