Datastore readsize - unexpected behavior

Anders on 23 Jun 2023
Commented: Rik on 23 Jun 2023
I would expect the code below to read 40k lines from my datastore on each pass, but for reasons unknown to me the number of lines varies between passes.
ds = tabularTextDatastore(filename,'ReadSize',40000);
c = 0;
while hasdata(ds)
    c = c + 1;
    TT = read(ds);
    T = height(TT);
    if c==1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " +t_total +" ticks.")
end
This produces the output:
Done with 40000 ticks.
Done with 45096 ticks.
Done with 85096 ticks.
Done with 90190 ticks.
Done with 130190 ticks.
I would expect the increment to be 40k each time. The data is timestamped, and judging by the timestamps, the data in the CSV file "filename" does not seem to be corrupt in any way. That is, there are no missing timestamps when reading the data. Is there anything I can do to get 40k lines on each pass (except the last pass, of course)?
  3 comments
Anders on 23 Jun 2023
Sorry, I should have been more careful with the code example. Fixed that now. The actual data I'm using is proprietary, so I'm not allowed to share it. Would an example file with the same structure be helpful?
Rik on 23 Jun 2023
Anything that reproduces this problem is fine. You care about the actual data, we don't. For this problem, the only thing that matters is that the data produces the same results.
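One way to put together such a stand-in file is sketched below. It writes synthetic data to a temporary CSV and records how many rows each call to read returns. The column names (Timestamp, Price), the row count, and the variable names demoFile and rowsPerRead are all assumptions made for illustration, and there is no guarantee that this particular file shows the same varying row counts as the original data.

```matlab
% Hypothetical stand-in data: column names and sizes are assumptions,
% not the structure of the proprietary file.
nRows = 100000;
Timestamp = (0:nRows-1)';                 % numeric timestamps, one per row
Price = 100 + cumsum(randn(nRows, 1));    % synthetic random-walk values
demoFile = [tempname '.csv'];             % temporary file name
writetable(table(Timestamp, Price), demoFile);

% Record the number of rows returned by each call to read
ds = tabularTextDatastore(demoFile, 'ReadSize', 40000);
rowsPerRead = [];
while hasdata(ds)
    rowsPerRead(end+1) = height(read(ds));
end
disp(rowsPerRead)

delete(demoFile);                         % clean up the temporary file
```

If the row counts printed here are all 40000 apart from the last one, the file is probably too small or too regular to trigger the behavior, and nRows or the number of columns may need to be increased.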

Answers (1)

Sanskar on 23 Jun 2023
Hi Anders!
As I understand your question, you want to read 40k lines from your datastore on each pass, but after the first iteration of the loop you get a varying number of lines.
The 'ReadSize' property that you are using tells read to return at most the number of rows given as the argument.
The 'hasdata' function only reports whether data remains; it does not guarantee that exactly 'ReadSize' rows will be returned.
Instead of 'hasdata', you can use 'isDone()' to check whether all the data has been read from the datastore.
The modified code follows:
ds = tabularTextDatastore(filename, 'ReadSize', 40000);
c = 0;
while ~isDone(ds) % Use isDone instead of hasdata
    c = c + 1;
    data = read(ds); % Read the next block (up to 40,000 lines)
    T = height(data); % Number of rows in this block
    if c == 1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " + t_total + " ticks.")
end
The following is the link to the documentation for isDone():
  1 comment
Anders on 23 Jun 2023
Edited: Anders on 23 Jun 2023
Hi Sanskar,
I get the error Unrecognized function or variable 'isDone'. Is isDone part of some toolbox? When I type which isDone, I get a 'not found' message.
If I understand the documentation correctly, isDone is used for System objects and cannot be used with datastores.
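Since 'ReadSize' is only an upper bound on the rows returned by each read, one possible workaround is to buffer the rows and hand them on in fixed-size blocks. The following is a minimal sketch under stated assumptions, not a confirmed fix for the original data: the names buffer, blockSize, blockHeights, and demoFile are hypothetical, and a small generated file with a block size of 10 stands in for the real file and the 40,000-row block so that the example runs on its own.

```matlab
% Small hypothetical demo file so the sketch is self-contained
demoFile = [tempname '.csv'];
writetable(table((1:25)', rand(25, 1), ...
    'VariableNames', {'Timestamp', 'Price'}), demoFile);

blockSize = 10;                  % rows wanted per pass (40000 in the question)
ds = tabularTextDatastore(demoFile, 'ReadSize', blockSize);

buffer = table();                % rows read but not yet processed
blockHeights = [];               % size of each processed block
t_total = 0;
while hasdata(ds)
    buffer = [buffer; read(ds)];             % read returns at most ReadSize rows
    while height(buffer) >= blockSize
        TT = buffer(1:blockSize, :);         % exactly blockSize rows
        buffer(1:blockSize, :) = [];         % drop them from the buffer
        blockHeights(end+1) = height(TT);
        t_total = t_total + height(TT);
        disp("Done with " + t_total + " ticks.")
    end
end
if height(buffer) > 0            % final partial block, if any
    TT = buffer;
    blockHeights(end+1) = height(TT);
    t_total = t_total + height(TT);
    disp("Done with " + t_total + " ticks.")
end

delete(demoFile);                % clean up the temporary file
```

To use this with the real data, replace demoFile with filename, set blockSize to 40000, and put the per-block processing where TT is assigned. The buffer can briefly hold up to nearly twice blockSize rows, which is the cost of getting equal-sized blocks.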

Release

R2022b
