How to solve the problem of a smaller number of records being read from datastore?

Question

Camilius on 14 Dec 2015

Direct link to this question

https://es.mathworks.com/matlabcentral/answers/260298-how-to-solve-problem-with-smaller-number-of-records-read-from-datastore

Commented: Wesley on 18 Jun 2025

I am using datastore to read data from a CSV file that has over 7 million records. When I set the ReadSize property to 500000 or 1000000, a single read returns only about 100000 records. I see this in both MATLAB R2014b and R2015a. What could be causing it?

Answer 1

Omar Qallaa on 9 Feb 2017

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/260298-how-to-solve-problem-with-smaller-number-of-records-read-from-datastore#answer_254077

Edited: Omar Qallaa on 9 Feb 2017

I have a very similar problem: I'm reading a very large single CSV file that has ~100M records of just one feature. Setting "ReadSize" does not guarantee that "read" returns a constant number of records. I solved this as follows:

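% ds is an existing datastore; requiredSamples is the number of rows wanted per chunk
ds.ReadSize = requiredSamples;
while hasdata(ds)
  data = read(ds);
  if (size(data,1) < requiredSamples) && hasdata(ds)
      % Change read size to number of missing samples
      ds.ReadSize = requiredSamples - size(data,1);
      tmp = read(ds);
      data = vertcat(data,tmp);
      % Set read size back to requiredSamples
      ds.ReadSize = requiredSamples;
  end
  % Process data here, before the next iteration overwrites it
end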

thomassimm on 4 Apr 2019

Ok,

The 100M confuses me because the most data I can get in one read is ~30k lines, presumably due to the 32 MB limit mentioned below. So if requiredSamples > 30000*2, a single extra read is not enough and this won't work for me. My chunks are 80k+ lines.
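When one extra read is not enough to fill a chunk, the top-up step can be repeated until the chunk is full or the datastore runs out. The following is only a sketch of that idea, not something taken from the answer above: the file name, the 25-row table, and the value of requiredSamples are made up so that the example is self-contained, and on a file this small every read will probably return the full ReadSize anyway.

```matlab
% Build a small throwaway CSV so the sketch is self-contained (hypothetical data).
csvFile = fullfile(tempdir, 'readsize_demo.csv');
writetable(table((1:25)', 'VariableNames', {'x'}), csvFile);

requiredSamples = 10;              % rows wanted per chunk (hypothetical value)
ds = datastore(csvFile);
ds.ReadSize = requiredSamples;

chunkSizes = [];
while hasdata(ds)
    data = read(ds);
    % read may come back short more than once, so loop instead of a single top-up
    while size(data,1) < requiredSamples && hasdata(ds)
        ds.ReadSize = requiredSamples - size(data,1);   % rows still missing
        data = vertcat(data, read(ds));
    end
    ds.ReadSize = requiredSamples;     % restore the read size for the next chunk
    chunkSizes(end+1) = size(data,1);  %#ok<AGROW> record the size of each chunk
    % ... process data here ...
end
delete(csvFile);
```

Only the last chunk can be shorter than requiredSamples, because the inner loop stops early only when the datastore has no more data.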

Wesley on 18 Jun 2025

Thanks so much, Omar! This seems to be working for me pretty well! I am running a large batch process and the reads started to reduce in size. I am hoping this will hold. If not, I may need to clear multiple variables from the workspace as a backup.

Answer 2

Aaditya Kalsi on 15 Dec 2015

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/260298-how-to-solve-problem-with-smaller-number-of-records-read-from-datastore#answer_203272

I believe that datastore has an upper limit on the amount of data it reads from a file at once, and that you are running up against it. Also, the 'ReadSize' property is an upper limit on the number of rows, so technically this could be expected behaviour.

Is there a reason you require the exact number of rows? Does your algorithm depend on this?
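If the algorithm does not actually need a fixed number of rows per read, it can often be written so that the chunk size does not matter. As an illustration only, here is a sketch that computes a column mean by accumulating a running sum and row count; the file name, the column name x, and the ReadSize value are hypothetical.

```matlab
% Build a small throwaway CSV so the sketch is self-contained (hypothetical data).
csvFile = fullfile(tempdir, 'chunk_mean_demo.csv');
writetable(table((1:100)', 'VariableNames', {'x'}), csvFile);

ds = datastore(csvFile);
ds.ReadSize = 30;          % upper bound only; individual reads may be shorter

total = 0;                 % running sum of column x
count = 0;                 % running number of rows seen
while hasdata(ds)
    t = read(ds);
    total = total + sum(t.x);
    count = count + height(t);
end
meanX = total / count;     % the same result however the rows were split into chunks
delete(csvFile);
```

The result depends only on the accumulated totals, so it is unaffected when a read returns fewer rows than ReadSize.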

Aaditya Kalsi on 17 Dec 2015

Edited: Aaditya Kalsi on 17 Dec 2015

I believe that is expected. My suspicion is that either you are reading multiple files and each file has only up to about 1000 records, or you have a large file where each record is very large.

I quote the documentation for the ReadSize property below:

ReadSize — Amount of data to read
20000 (default) | positive scalar | 'file'
Amount of data to read in a call to the read function, specified as a positive scalar or the string, 'file'.
If ReadSize is a positive integer, then each call to read reads up to the specified number of rows from the datastore.
If ReadSize is 'file', then each call to read reads all of the data in one file.
When you change ReadSize from a numeric scalar to 'file' or vice versa, MATLAB resets the datastore to the state where no data has been read from it.

Camilius on 21 Dec 2015

I recently examined the source code of datastore and found a hardcoded limit on the size of a chunk of data. The limit is 32 MB. I am reading a single file that has about 7 million records, and each record has 43 features. I think the limit of about 100000 records per read is caused by this 32 MB limit.
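A rough sanity check of that explanation, using only the numbers mentioned in this thread. It assumes the 32 MB limit means 32 * 2^20 bytes and that records are roughly equal in length; both are assumptions, not something confirmed by the documentation.

```matlab
chunkBytes  = 32 * 2^20;     % assumed chunk limit in bytes (32 MB)
rowsPerRead = 100000;        % approximate rows returned per read, as observed
numFeatures = 43;            % features per record in the CSV file

bytesPerRow   = chunkBytes / rowsPerRead;    % about 335.5 bytes of text per record
bytesPerField = bytesPerRow / numFeatures;   % about 7.8 bytes per feature, delimiter included

fprintf('%.1f bytes per record, %.1f bytes per feature\n', bytesPerRow, bytesPerField);
```

Around 8 characters per numeric field is plausible for CSV text, so the observed row count is at least consistent with a byte-based chunk limit.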
