Computational Complexity of matrix multiplication

22 views (last 30 days)
Jayant chouragade on 25 Sep 2020
Edited: James Tursa on 25 Sep 2020
Hi
Please suggest which of the following will have higher computational complexity:
  1. A* x C x C* x A
  2. A* x R x R* x A
where A = MxN complex-valued matrix, C = NxN complex-valued matrix, and R = NxN real-valued matrix;
x stands for matrix multiplication;
* stands for Hermitian transpose.
regards
jayant

Accepted Answer

James Tursa on 25 Sep 2020
Edited: James Tursa on 25 Sep 2020
This question is perhaps more involved than it looks on the surface, because by default MATLAB doesn't store an imaginary part for a real variable, and in some cases MATLAB can call BLAS symmetric matrix multiply routines in the background to do the job. The BLAS library is highly optimized for multi-threading and cache usage. So depending on whether the inputs are real or complex and on your version of MATLAB, there may be data copies involved that you didn't realize were happening. And depending on the order of operations, MATLAB may or may not be able to call those BLAS symmetric matrix multiply routines, which run in about 1/2 the time of the generic matrix multiply routines. E.g., for R2018a or later (interleaved complex memory model):
D = A' * C * C' * A
will be done as ((A' * C) * C') * A. Because of this order, MATLAB will not recognize the symmetry and will not make use of the BLAS symmetric matrix multiply routines. There are three generic matrix multiplies involved. Because of floating point effects, the result may not be strictly Hermitian either.
D = (A' * C) * (C' * A)
will be done as three generic matrix multiplies. Because of this order, MATLAB will likely not recognize that A' * C is the Hermitian transpose of C' * A, so you just get the generic matrix multiplies in the background. Because of floating point effects, the result may not be strictly Hermitian either.
ATC = A' * C
D = ATC * ATC'
will be done as a generic matrix multiply followed by a symmetric matrix multiply. MATLAB is able to recognize the symmetry of the second multiply in this case. This would be the best performance, and the result is guaranteed to be strictly Hermitian.
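A quick way to check the symmetry claims above is a short experiment (a sketch; the sizes are arbitrary, and ishermitian is a standard MATLAB function):
A = rand(2000,1000) + 1i*rand(2000,1000);
C = rand(2000,2000) + 1i*rand(2000,2000);
D1 = A' * C * C' * A;    % three generic multiplies
ATC = A' * C;
D3 = ATC * ATC';         % generic multiply + symmetric multiply
ishermitian(D1)          % typically false: last-bit asymmetry from rounding
ishermitian(D3)          % true: the symmetric routine guarantees it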
D = A' * R * R' * A
will be done as ((A' * R) * R') * A. Because of this order, MATLAB will not recognize the symmetry and will not make use of the BLAS symmetric matrix multiply routines. There are three generic matrix multiplies involved. On top of that, the R matrix will first have to be copied into interleaved complex format (imaginary part 0) before the complex matrix multiply routines can be called. So you get that additional burden, twice in fact, but this may be somewhat offset by the speed of doing a lot of 0 multiplies. Because of floating point effects, the result may not be strictly Hermitian either.
D = (A' * R) * (R' * A)
will be done as three generic matrix multiplies. Because of this order, MATLAB will likely not recognize that A' * R is the Hermitian transpose of R' * A, so you just get the generic matrix multiplies in the background. On top of that, the R matrix will first have to be copied into interleaved complex format (imaginary part 0) before the complex matrix multiply routines can be called. So you get that additional burden, twice in fact, but this may be somewhat offset by the speed of doing a lot of 0 multiplies. Because of floating point effects, the result may not be strictly Hermitian either.
ATR = A' * R
D = ATR * ATR'
will be done as a generic matrix multiply followed by a symmetric matrix multiply. MATLAB is able to recognize the symmetry of the second multiply in this case. On top of that, the R matrix will first have to be copied into interleaved complex format (imaginary part 0) before the complex matrix multiply routines can be called. So you get that additional burden, but only once, and this may be somewhat offset by the speed of doing a lot of 0 multiplies. This would be the best performance, and the result is guaranteed to be strictly Hermitian.
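If you end up with one of the orderings whose result is not strictly Hermitian and downstream code requires exact symmetry, a common workaround (not part of the answer above, just a standard trick) is to symmetrize explicitly:
D = A' * R * R' * A;   % may differ from D' in the last bits
D = (D + D')/2;        % exactly Hermitian in floating point
ishermitian(D)         % true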
Comparing using C vs using R, you are basically trading generic complex multiplies against a data copy plus multiplies in which 1/2 of the operations are multiplies by 0. The R case may or may not be faster because of this; it could easily be slower, and it will likely involve a temporary memory bump.
R2017b and earlier (separate complex memory model):
In these versions MATLAB stores the real and imaginary parts separately, so you do not get that extra data copy of R prior to the complex matrix multiplies, and you avoid all of the resulting 0 multiplies. The complex*real matrix multiply is actually done in pieces, and the overall operation takes about 1/2 the operations of a complex*complex matrix multiply. This is a big difference in what happens between R2017b and earlier vs R2018a and later.
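You can see where the factor of 1/2 comes from by writing the complex*real product out in pieces yourself (a sketch of the idea, not MATLAB's actual internal code):
A = rand(5,4) + 1i*rand(5,4);      % complex
R = rand(4,4);                     % real
D = real(A)*R + 1i*(imag(A)*R);    % two real multiplies, no imaginary-part work for R
norm(D - A*R, 'fro')               % ~0: same result up to rounding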
E.g., for R2018a or later interleaved memory model:
>> version
ans =
'9.8.0.1323502 (R2020a)'
>> A = rand(5000,4000)+rand(5000,4000)*i;
>> C = rand(5000,5000)+rand(5000,5000)*i;
>> R = rand(5000,5000);
>> tic;ATC=(A' * C); DS = ATC * ATC';toc
Elapsed time is 6.282376 seconds.
>> tic;ATR=(A' * R); DS = ATR * ATR';toc
Elapsed time is 6.475804 seconds.
And for R2017b or earlier separate memory model:
>> version
ans =
9.0.0.370719 (R2016a)
>> A = rand(5000,4000)+rand(5000,4000)*i;
>> C = rand(5000,5000)+rand(5000,5000)*i;
>> R = rand(5000,5000);
>> tic;ATC=(A' * C); DS = ATC * ATC';toc
Elapsed time is 7.422857 seconds.
>> tic;ATR=(A' * R); DS = ATR * ATR';toc
Elapsed time is 4.870730 seconds.
There is no advantage using R in the interleaved complex memory model (R2020a in this case) ... in fact I got a slight decrease in performance due to the temporary data copy. But using R in the separate complex memory model (R2016a in this case) has a clear advantage by avoiding the temporary data copy and a lot of those 0 multiplies.
  3 comments
James Tursa on 25 Sep 2020
Edited: James Tursa on 25 Sep 2020
No. The memory model for a particular MATLAB version is fixed and cannot be changed. If you are using R2018a or a later version, you will get the interleaved complex memory model, and that cannot be changed. If you are doing a lot of complex*real matrix multiply operations, the only thing that might help is to force the real matrices to be stored as complex ahead of time (see the complex( ) function) and eat the cost of the memory bump up front. Depending on the operations you are doing, this might avoid doing the same real-to-complex data copies over and over again downstream in your code.
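For example (a sketch of that suggestion; complex( ) is a documented MATLAB function):
A  = rand(5000,4000) + 1i*rand(5000,4000);
R  = rand(5000);          % real storage, no imaginary part allocated
Rc = complex(R);          % pay the real-to-complex copy once, up front
D  = A' * Rc * Rc' * A;   % later multiplies no longer convert R each time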
Walter Roberson on 25 Sep 2020
Constant multiples in the number of operations are irrelevant to computational complexity.
Computational complexity is only concerned with the dominant terms in the number of operations that would be required on an abstract machine. Computational complexity studies have shown that general matrix multiplication can be done in no more than O(n^2.373) time, and mathematicians suspect that there might just be an O(n^2) algorithm. But the O(n^2.373) algorithm is slower than regular O(n^3) algorithms for any matrix that is feasible to work with using current technology. One of the in-between algorithms (Strassen's, about O(n^2.81)) is apparently feasible with current technologies... but only for rather large matrices.
Real technologies have to work with CPU caching, and because of that a straightforward O(n^3) algorithm coded naively is in practice about 100 times slower than the algorithms used by BLAS, which use block multiplies that still take O(n^3) operations plus some extra fix-up operations. Block multiplication has a higher number of theoretical floating point operations than a straightforward algorithm, but it executes much faster on real-world computers.
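As an illustration of the blocking idea (a minimal sketch; a real BLAS tunes the tile size to the cache hierarchy and does this in compiled code, not in the interpreter):
n = 1024; b = 128;                       % b = tile size, an arbitrary choice here
X = rand(n); Y = rand(n); Z = zeros(n);
for ii = 1:b:n
    for jj = 1:b:n
        for kk = 1:b:n
            I = ii:ii+b-1; J = jj:jj+b-1; K = kk:kk+b-1;
            Z(I,J) = Z(I,J) + X(I,K)*Y(K,J);   % one cache-sized tile product
        end
    end
end
norm(Z - X*Y, 'fro')                     % agrees with the plain product up to rounding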
Theoretical floating point operation counts are not a great way to estimate real-world computation. Real-world CPUs have Single Instruction Multiple Data (SIMD) extensions for a number of floating point patterns and can carefully overlap stages of computation in a pipeline. You might, for example, get the first result after 3 clock cycles and one additional result per additional clock cycle, so 4 results come out in 6 cycles (1.5 cycles each on average) instead of the 3*4 = 12 cycles that sequential 3-cycle multiplies would need. It is also common for CPUs to have two floating point units that can work independently, so after those same 6 cycles you might have 8 results, even though the first ones did not arrive until the 3rd cycle. Computational complexity doesn't take any of that into account.


More Answers (1)

Walter Roberson on 25 Sep 2020
Edited: Walter Roberson on 25 Sep 2020
The computational complexity is the same. Multiplying two complex numbers takes a constant multiple of the operations needed for real values, and computational complexity does not care about constant multiples.
In theory you could have a hardware instruction that multiplied two complex numbers directly, and you would not worry about that implementation detail any more than you worry about the fact that digital multiplication is more work than digital addition.
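Concretely (a standard identity, not anything MATLAB-specific):
(a + b*1i) * (c + d*1i) = (a*c - b*d) + (a*d + b*c)*1i
Each complex multiply therefore costs 4 real multiplies and 2 real additions (3 multiplies with the Karatsuba-style rearrangement), a constant factor that does not depend on the matrix size N, which is why the big-O complexity is unchanged.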
