Standard deviation takes forever

gujax on 12 Sep 2023
Commented: dpb on 13 Sep 2023
I have a double-precision numeric 3D matrix M (converted by fread from uint8) of size 30000 x 500 x 500. I would like to get the standard deviation along dimension 2.
tic, std(M,0,2); toc
has taken more than 12 hours and is still running, whereas mean(M,2) took only 80 seconds.
Or, in a bit more detail:
std(M(:,:,1),0,2) takes 0.3 seconds and std(M(:,:,1:100),0,2) takes 34 seconds, but std(M(:,:,1:500),0,2) says out of memory.
Similarly, mean(M(:,:,1),2) takes 0.1 seconds, but mean(M(:,:,1:500),2) does not work and gives me an 'out of memory' message, even though mean(M,2) takes about 80 seconds. This is all very confusing! Thanks
7 Comments
dpb on 12 Sep 2023
Your original posting says "I have a double precision numeric 3D matrix M of size 30000 x 500 x 500..."
That's what I calculated above: at 8 bytes/double it takes up about 60 GB of storage.
I don't follow what "an accumulation of (500 x 100 x 5) files each 31 KB in size" means.
Think you're going to have to show us specifically what your array is and how it was constructed.
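The storage estimate above can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are made up for illustration, and the size is the one stated in the question.

```matlab
% Back-of-the-envelope memory estimate for a 30000 x 500 x 500 array.
sz = [30000 500 500];
numElements = prod(sz);          % total number of elements
bytesDouble = numElements * 8;   % double: 8 bytes per element
bytesUint8  = numElements * 1;   % uint8: 1 byte per element

fprintf('double: %.1f GB (%.1f GiB)\n', bytesDouble/1e9, bytesDouble/2^30);
fprintf('uint8 : %.1f GB (%.1f GiB)\n', bytesUint8/1e9,  bytesUint8/2^30);
```

The same data held as uint8 needs one eighth of the memory of the double version, which is why the class of M matters so much here.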
gujax on 12 Sep 2023
Edited: gujax on 13 Sep 2023
Ah got it!
I append a 31 KB chunk of streaming time-series data 100 x 500 x 500 times into one file instead of writing 5 million separate files.
So that's about 8 GB of data.
But when I read it, I didn't realize that fread converts it to double by default.
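For reference, the default conversion can be avoided through the precision argument of fread. The sketch below uses a small temporary file as a hypothetical stand-in for the real data file; the file name and sizes are assumptions for illustration only.

```matlab
% Write a small uint8 test file (stand-in for the real data file).
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(0:249), 'uint8');
fclose(fid);

% 'uint8' reads uint8 values from the file but returns them as double.
fid = fopen(fname, 'r');
A = fread(fid, [10 25], 'uint8');
fclose(fid);

% '*uint8' is shorthand for 'uint8=>uint8' and keeps the data as uint8.
fid = fopen(fname, 'r');
B = fread(fid, [10 25], '*uint8');
fclose(fid);

disp(class(A));
disp(class(B));
delete(fname);
```

Note that the size argument of fread accepts at most two dimensions, so a 3D array would have to be read as a vector or matrix and then passed through reshape.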


Accepted Answer

gujax on 13 Sep 2023
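Calculating the standard deviation takes more memory than calculating the mean, presumably because std has to form the deviations from the mean, which can need temporary arrays as large as the input. When std is run on a large double-precision data set and memory is limited, it will likely slow the computer down badly (for example through disk swapping). That may not be true when evaluating the mean.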
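One way to keep memory use bounded, which is not something proposed in this thread but follows from the timings in the question, is to keep M as uint8 and convert one page at a time. Since std along dimension 2 is computed independently for every page along dimension 3, the result is the same. A minimal sketch, using a small random array as a stand-in for the real data:

```matlab
% Small stand-in for the real 30000 x 500 x 500 uint8 array.
M = randi([0 255], [300 50 20], 'uint8');

[nRows, ~, nPages] = size(M);
S = zeros(nRows, 1, nPages);    % same shape as std(double(M), 0, 2)

for k = 1:nPages
    % Convert only one page to double, so temporaries stay page-sized.
    S(:,:,k) = std(double(M(:,:,k)), 0, 2);
end
```

At the 0.3 seconds per page reported in the question, 500 pages would take on the order of a few minutes, assuming the uint8 array itself fits in memory.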

More Answers (1)

Steven Lord on 12 Sep 2023
Can you confirm you're using the std function included in MATLAB? What does this command show?
which -all std
/MATLAB/toolbox/matlab/datafun/std.m
/MATLAB/toolbox/matlab/datatypes/tabular/@tabular/std.m % tabular method
/MATLAB/toolbox/matlab/datatypes/datetime/@datetime/std.m % datetime method
/MATLAB/toolbox/matlab/datatypes/duration/@duration/std.m % duration method
/MATLAB/toolbox/matlab/timeseries/@timeseries/std.m % timeseries method
/MATLAB/toolbox/matlab/bigdata/@tall/std.m % tall method
/MATLAB/toolbox/parallel/parallel/@distributed/std.m % distributed method
9 Comments
gujax on 13 Sep 2023
Edited: gujax on 13 Sep 2023
I think I will mark this issue as resolved, i.e., calculating the standard deviation takes more memory than calculating the mean. When std is run on a large double-precision data set and memory is limited, it will likely slow the computer down. That may not be true when evaluating the mean.
dpb on 13 Sep 2023
The issue you're having must be disk swapping owing to limited real memory... I'm still not positive about just how big your array is. How about
whos M
to tell us precisely what you're processing, and
memory
for the available memory your machine has?
It depends on how TMW builds the executable and what processor instructions they assume; unfortunately, it's likely they code to a "lowest common denominator" of what is out there, because they know that not all customers are going to have the latest CPU technology, with enhanced vector-processing instructions that make use of the built-in vector pipeline in current processors.
I've never tried it myself, but if you have a high-memory graphics card, you could possibly try the GPU stuff...
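The information that whos reports can also be captured programmatically, which makes it easy to spot an unexpected class such as double. A small sketch with a made-up stand-in array (the memory command is left out here because it is only available on Windows):

```matlab
% Small hypothetical stand-in; the real M is 30000 x 500 x 500.
M = zeros(30, 5, 5);

info = whos('M');   % struct with fields such as name, size, bytes, class
fprintf('%s: size %s, class %s, %d bytes\n', ...
    info.name, mat2str(info.size), info.class, info.bytes);
```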

Release

R2022b
