count number of rows in csv outside of matlab

Question

Alexandra McClernon Ownbey on 26 Feb 2021

Direct link to this question:

https://es.mathworks.com/matlabcentral/answers/756529-count-number-of-rows-in-csv-outside-of-matlab

Answered: Walter Roberson on 26 Feb 2021

I have 10000+ csv files I would like to import into matlab. I only need the data from the first and last rows for inlet and exit conditions. Each csv file has a different number of data points, so I do not know the length of the file imported a priori. I am trying to automate the import process. I can automate importing all the data or specific lines, but I do not know how to import the last row. The only way I can think of is to determine the number of rows in the file without importing the data (importing all the data takes a few hours) and import that row specifically. Does anyone know how I can do this? I have tried messing with textscan, but I have not had any luck.

5 comments (the 3 oldest are not shown)

Walter Roberson on 26 Feb 2021

Is there an upper limit on the number of characters per line in your csv files? For example are the lines more than 1 kilobyte each?

Alexandra McClernon Ownbey on 26 Feb 2021

I am using Windows. I am not sure what the per-line limit is. There are 8 columns in all files. The first row holds the titles; the remaining rows are numbers with more than 8 digits in each cell.


Answer 1

Jeremy Hughes on 26 Feb 2021

Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/756529-count-number-of-rows-in-csv-outside-of-matlab#answer_634339

It turns out this is not as easy a question as you might think, especially if your CSV contains double-quoted data and that data contains a newline character.

e.g. "This data\nhas a new line","but this doesn't"
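To make the problem concrete, here is a minimal sketch using the example line above as a hypothetical one-record file. It assumes quotes inside fields are escaped by doubling them (`""`), so a newline counts as a record separator only when an even number of `"` characters precedes it. The variable names are just for illustration.

```matlab
% Hypothetical one-record CSV text: the first field contains a newline.
txt = sprintf('"This data\nhas a new line","but this doesn''t"\n');

% Physical lines: every newline character counts.
physicalLines = nnz(txt == char(10));

% Records: a newline only ends a record when we are outside double quotes,
% i.e. when an even number of quote characters has been seen so far.
inQuotes = mod(cumsum(double(txt == '"')), 2) == 1;
numRecords = nnz(txt == char(10) & ~inQuotes);

fprintf('%d physical lines, %d record(s)\n', physicalLines, numRecords);
```

A plain newline count reports two lines here even though the text holds a single CSV record.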

The number of lines might not really be that important, depending on what you're trying to do. But I don't know what that is.

If you're trying to avoid having all the data in memory at one time, I suggest reading up on tabularTextDatastore, as that helps automate working with large sets of data. There's a rich set of features you can use with datastores to make working with larger datasets easier: tall, transform, combine. None of these need to know the size of each table in advance. But again, without knowing what you plan to do with those files, it's hard to say whether you can use it.
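As a rough sketch of the datastore route, the code below builds two small demo files (hypothetical names and data), then reads them one file at a time and keeps only the first and last data rows. It assumes every column is numeric. Note that each file is still parsed in full, so this helps with memory and bookkeeping rather than with speed.

```matlab
% Create a throwaway folder with two small CSV files (hypothetical data).
demoDir = tempname;
mkdir(demoDir);
for k = 1:2
    fid = fopen(fullfile(demoDir, sprintf('run%d.csv', k)), 'w');
    fprintf(fid, 'a,b,c\n');
    fprintf(fid, '%d,%d,%d\n', (1:3*(k+1)) + 10*k);  % k+1 data rows
    fclose(fid);
end

ds = tabularTextDatastore(demoDir, 'FileExtensions', '.csv');
ds.ReadSize = 'file';          % each read() returns one whole file as a table

firstRows = {};
lastRows = {};
while hasdata(ds)
    T = read(ds);
    firstRows{end+1} = T{1,:};     % first data row as a numeric row vector
    lastRows{end+1}  = T{end,:};   % last data row
end

rmdir(demoDir, 's');           % clean up the demo folder
```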

---- but to answer the original question ----

If you don't have any double-quoted data, it gets a lot easier to count the lines of a CSV file. This code will scan the lines without importing any of the data. (It reads the file internally, but doesn't generate any output; that's what the %*... formats are about.)

fid = fopen(filename);
numLines = 0;
while ~feof(fid)
    [~,c]=textscan(fid,'%*[^\r\n]%*[\r\n]',1,'Delimiter','','Whitespace','','EndOfLine','');
    if c > 0 % if c==0, then there wasn't a line there. this may happen at the end of the file.
        numLines = numLines + 1;
    end
end
fclose(fid);

If you want the data for the lines, this newer function should help:

https://www.mathworks.com/help/matlab/ref/readlines.html
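For illustration, a minimal sketch of readlines (available from R2020b) on a small demo file with made-up data. It assumes no quoted fields and purely numeric rows. It still reads the whole file as text, but it skips the numeric conversion for every row except the two you keep.

```matlab
% Build a small demo file (hypothetical data) so the sketch runs on its own.
demoFile = [tempname '.csv'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'a,b,c\n1,2,3\n4,5,6\n7,8,9\n');
fclose(fid);

% One string per line; skip empty lines such as a trailing blank at EOF.
lines = readlines(demoFile, "EmptyLineRule", "skip");
numLines = numel(lines);                            % header plus data rows

% Convert only the two rows of interest to numbers.
firstRow = str2double(split(lines(2), ",")).';      % first data row
lastRow  = str2double(split(lines(end), ",")).';    % last data row

delete(demoFile);
```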

---- if you have double-quoted strings with new lines (or don't know if you do) ----

If you want to count the actual records, without counting line breaks inside double-quoted fields, then you really need to parse each CSV line and find the double-quoted fields. At that point you might as well be importing the data. But if you really just want to count the parsed lines, you can do something with import options.

This code will import only the first variable in the table as a string, but to do that, it will still parse the file and consider the quoted data that appears later in the line. It will be slower than the method above, but robust if you have quoted data that contains newlines. It will be faster than bringing in the whole table.

opts = delimitedTextImportOptions('Delimiter',',','ExtraColumnsRule','ignore','VariableTypes',"string");
T = readtable(filename,opts);
numLines = height(T);
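Once numLines is known, one way to pull in only the last row is to let textscan skip everything before it with 'HeaderLines'. This is a sketch on a three-column demo file with made-up data, and it assumes the file has no trailing blank lines and no quoted newlines. For the 8-column files in the question the format would need eight %f fields. The skipped lines are still scanned, but they are not converted or stored.

```matlab
% Build a small demo file (hypothetical data) so the sketch runs on its own.
demoFile = [tempname '.csv'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'a,b,c\n1,2,3\n4,5,6\n7,8,9\n');
fclose(fid);

numLines = 4;   % assumed here; in practice use the count obtained earlier

fid = fopen(demoFile, 'r');
% Skip the first numLines-1 lines, then apply the format exactly once.
C = textscan(fid, '%f%f%f', 1, 'Delimiter', ',', 'HeaderLines', numLines - 1);
fclose(fid);
lastRow = [C{:}];   % numeric row vector for the last line

delete(demoFile);
```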


Answer 2

KSSV on 26 Feb 2021

Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/756529-count-number-of-rows-in-csv-outside-of-matlab#answer_633579

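csvFiles = dir('*.csv');
N = length(csvFiles);
f = cell(N,1);   % first data row of each file
l = cell(N,1);   % last data row of each file
for i = 1:N
    % Row offset 1 skips the title row. Note that this still reads the whole file.
    data = csvread(fullfile(csvFiles(i).folder, csvFiles(i).name), 1, 0);
    f{i} = data(1,:);
    l{i} = data(end,:);
end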

1 comment

Alexandra McClernon Ownbey on 26 Feb 2021

Edited: Alexandra McClernon Ownbey on 26 Feb 2021

I am trying to read in the data without reading in the entire table. I already have a script that can read in all the csv files in separate folders. csvread takes too long. I know I can import the data into cells or a 3D matrix and then select the points I want, but this is a very roundabout way of doing it.

Answer 3

Walter Roberson on 26 Feb 2021

Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/756529-count-number-of-rows-in-csv-outside-of-matlab#answer_634364

csvdir = 'appropriate_directory_name';  %use '.' for current directory
csvFiles = dir(fullfile(csvdir, '*.csv'));
filenames = fullfile({csvFiles.folder}, {csvFiles.name});
N = length(csvFiles);
f = cell(N,1);   % first data row
l = cell(N,1);   % last data row
for K = 1:N
    thisfile = filenames{K};
    [fid, msg] = fopen(thisfile, 'r');
    if fid < 0
        fprintf('failed to open file "%s" because "%s", ignoring it\n', thisfile, msg);
        continue
    end
    fgetl(fid);   %skip header
    f{K} = cell2mat(textscan(fgetl(fid), '', 'Delimiter', ','));  %first data line
    %data is 8 columns. We can be sure that columns are < 25 characters each
    if fseek(fid, -256, 'eof') ~= 0   %move to near end of file (negative offset = before EOF)
        frewind(fid);   %seek failed, e.g. file shorter than 256 bytes: scan from the start
    end
    fgetl(fid);   %we positioned to middle of line (or at the header), discard to end of line
    %look for the last non-empty line
    old_line = '';
    while ~feof(fid)
        new_line = fgetl(fid);
        if ~ischar(new_line); break; end  %EOF
        if ~isempty(strtrim(new_line))
            old_line = new_line;
        end
    end
    fclose(fid);
    l{K} = cell2mat(textscan(old_line, '', 'Delimiter', ','));  %last data line
end

What this code is doing is opening each file, skipping a header line, reading the next line and converting it to numeric. Then it seeks to before the end of file and reads lines, discarding empty lines, including empty lines that occur at end of file, keeping the last non-empty line it finds, and converting the last non-empty line to numeric.

The code seeks to 256 characters before the end of file, skipping the rest of the file -- literally not reading it as much as is possible with the operating system. Why 256? Because it is a "nice round number" to computer scientists ;-) If the data was output as double precision, then it could take as many as 25 characters per entry such as '-6.32359246225409463e+110' plus the comma delimiter, maybe a space as well, so possibly 27*8+2 characters = 218 characters for the line. Using 256 gives a bit of slack in case we miscounted or there is something odd in the file.
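The arithmetic behind the 256 can be checked in a few lines. This is only a worked version of the estimate above; the variable names are made up for illustration, and the allowance of two extra characters per field (comma plus a possible space) and two per line follows the reasoning in the text.

```matlab
% Widest plausible double-precision field, taken from the text above.
widestField = '-6.32359246225409463e+110';
fieldChars  = numel(widestField);            % characters in one field

nCols       = 8;                             % columns per row in the question
perField    = fieldChars + 2;                % plus comma and a possible space
maxLineLen  = perField * nCols + 2;          % plus line-ending characters

seekBack    = 256;                           % distance used with fseek(..., 'eof')
slack       = seekBack - maxLineLen;         % spare characters beyond the worst case

fprintf('field = %d, line <= %d, slack = %d\n', fieldChars, maxLineLen, slack);
```

If the files ever gain more columns, recomputing maxLineLen and picking the next larger power of two for the seek distance keeps the same margin of safety.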
