Speed up computation of a huge matrix?

Hi,
I need to compute the mean of a huge matrix (20000*20000*time) at several time points. Is there any way to speed up this computation?
Cheers

Answers (4)

Jan
Jan on 14 Oct 2011

2 votes

Yes, or no. It depends.
An array of size [20000 x 20000 x time] occupies 3.2 GB when time is 1 and the type is DOUBLE. If time is larger, more memory is needed, and it must be available in contiguous free blocks.
Calculating the MEAN over the 1st dimension is much faster than over the 2nd, because the memory is processed in contiguous blocks.
Using INT16 values would reduce the memory footprint, but then the values must be integers.
Without a more detailed description, a more specific answer is impossible.
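A minimal sketch of the two points above (contiguous-dimension access and the INT16 saving); sizes are reduced here so the example runs on an ordinary machine, and the timings are only illustrative:

```matlab
% MATLAB stores arrays column-major, so reducing along dimension 1
% walks contiguous memory, while dimension 2 is strided.
A = rand(2000, 2000, 3);       % stand-in for the 20000x20000xT array

tic; m1 = mean(A, 1); toc      % contiguous access: fast
tic; m2 = mean(A, 2); toc      % strided access: slower

% If the values are integers, INT16 needs a quarter of the memory of
% DOUBLE (2 bytes per element instead of 8).
B = int16(round(1000 * rand(2000, 2000)));
mB = mean(B(:));               % mean() returns a double even for integer input
```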
Edric Ellis
Edric Ellis on 14 Oct 2011

2 votes

As Jan points out, this is very large data. One option is to use distributed arrays with Parallel Computing Toolbox (and MATLAB Distributed Computing Server to spread the computation over the memory of multiple machines). But it may be better yet not to have all of your data in memory at one time.
If you have a capable GPU, you can use gpuArrays. On my NVIDIA C1060, I can calculate the mean of 20000x20000 doubles in 0.04 seconds vs. 0.2 seconds on my CPU (R2011b). A more recent Tesla card such as the C20xx family would probably out-perform this.
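A sketch of the CPU-vs-GPU comparison described above (Parallel Computing Toolbox and a supported NVIDIA GPU assumed; the matrix size needs roughly 3.2 GB of GPU memory, and timings will vary by hardware):

```matlab
A = rand(20000, 20000);            % ~3.2 GB of doubles on the host

tic; mCpu = mean(A(:)); tCpu = toc;

G = gpuArray(A);                   % transfer to the GPU (transfer not timed below)
tic; mGpu = gather(mean(G(:))); tGpu = toc;   % gather() forces the GPU to finish

fprintf('CPU: %.3f s, GPU: %.3f s\n', tCpu, tGpu);
```

Note that `gather` copies the scalar result back to the host and synchronizes, so the GPU timing includes the actual computation, not just kernel launch.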

5 comments

Jan
Jan on 14 Oct 2011
0.04 seconds including or excluding the time to transfer the 3.2 GB to the graphics board?
Edric Ellis
Edric Ellis on 17 Oct 2011
Yes, that's right, I wasn't timing the host-GPU transfer.
Jan
Jan on 17 Oct 2011
And how fast or slow is it including the transfer? I'm just curious, because I do not have a powerful graphics card yet.
Edric Ellis
Edric Ellis on 17 Oct 2011
Building a 3.2GB gpuArray from a CPU array takes about 2.0 seconds on my system. This is why it's important to keep data on the GPU as long as possible, and preferably build it there (there are zeros/ones etc. functions which can build a 3.2GB gpuArray in 0.025 seconds; rand on the GPU is somewhat slower, but still quicker than building on the host and transferring).
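The contrast described above can be sketched as follows (using the R2011-era `parallel.gpu.GPUArray` static constructors; timings are illustrative and hardware-dependent):

```matlab
% Build on the host, then copy 3.2 GB over the PCIe bus: slow.
tic; G1 = gpuArray(rand(20000)); toc

% Allocate directly on the device: no transfer at all.
tic; G2 = parallel.gpu.GPUArray.zeros(20000); toc

% Generate random numbers on the device: slower than zeros,
% but still much quicker than building on the host and transferring.
tic; G3 = parallel.gpu.GPUArray.rand(20000); toc
```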
Jan
Jan on 17 Oct 2011
@Edric: Thanks for this summary. I upgraded from a Pentium-M to a dual-core some months ago, so my experiments with multi-threading are less abstract now. CUDA is the next step.

Sign in to comment.

park minah
park minah on 14 Oct 2011

1 vote

I think you'd better use parallel computing. It may speed up your processing.
Thomas
Thomas on 21 Oct 2011

1 vote

Use a GPU card and gpuArray. Our Tesla M2070-Q (448 cores, 6 GB) performs this in 0.02125 s.
I have performed a similar computation, (50000*50000)*100, in less than 2 seconds.


Asked on 14 Oct 2011
