# Which one is faster: trace(a' * b) or sum(sum(a .* b', 2))?

4 views (last 30 days)
Xiaohan Du on 21 Jun 2018
Commented: John D'Errico on 21 Jun 2018
Hi all,
If I have 2 matrices with same size, I first perform SVD to each, then calculate the trace differently as follows:
clear; clc;
% this script tests what's the fastest way to compute trace with SVD
% vectors.
ma = rand(30000, 100); % large, rectangular matrices.
mb = rand(30000, 100);
[ua, sa, va] = svd(ma, 0);
[ub, sb, vb] = svd(mb, 0);
% case 1: trace(a' * b)
tic
for i = 1:1000
otptTr = trace((vb' * va) * sa' * (ua' * ub) * sb);
end
toc
% case 2: sum(sum(a.*b', 2))
tic
for i = 1:1000
otptSum = sum(sum(((vb' * va) * sa') .* ((ua' * ub) * sb)', 2));
end
toc
I'd like to know if these operations are repeated for many times, which one is faster? I thought case 2 would be faster, but my test results are:
Elapsed time is 8.242850 seconds.
Elapsed time is 8.267731 seconds
which is roughly the same.
Since trace only cares about the diagonal values, and the nth element in the diagonal is the scalar product of the nth row of A with the nth column of B, is there any ways faster by only computing the elements on the diagonal?
##### 2 CommentsShowHide 1 older comment
John D'Errico on 21 Jun 2018
Feel free to close the question again as you wish, as that would be your choice. I added an answer which explains the issues that you need to consider, and also showed how to use timeit, instead of tic and toc.

John D'Errico on 21 Jun 2018
Well, I could tell you, but then...
Seriously, I cannot tell you, because this is something I cannot know. Nor can anyone else really know, unless they have exactly the same computer that you have, set up the same way, same OS, etc.
You have already seen that it is quite close, almost insignificantly different. I would suggest that a better test is to start working with functions. LEARN TO USE FUNCTIONS, NOT SCRIPTS. Functions can be sometimes better optimized by MATLAB. MATLAB treats scripts differently than it does functions, since in the latter the parser can sometimes make optimizations that will speed up the code. Scripts don't get that treatment. And of course, these issues MAY change with every MATLAB release, or with possibly different versions of the BLAS, which could strongly impact your results, since this is all matrix related computation.
But that also gives you the capability to use timeit, instead of tic and toc. The timeit function tests these things much better, more accurately than tic and toc. Timeit worries about problems with warming up a function, making sure it is properly in the function cache. And of course, do NOT do things on the side while you test it, because then you take away processing capability from MATLAB. So avoid use of tic and toc if you can.
And, yes, I could do the same test for you, then use timeit. But I won't do so. Because my test would not be valid for you. I won't have the same CPU, the same version of MATLAB, possibly, the same OS release, the same amount of RAM, cache sizes, number of CPUs, etc. All these things MIGHT be pertinent.
Anyway, I might run the comparison like this:
ma = rand(30000, 100); % large, rectangular matrices.
mb = rand(30000, 100);
[ua, sa, va] = svd(ma, 0);
[ub, sb, vb] = svd(mb, 0);
% case 1: trace(a' * b)
timeit(@() trace((vb' * va) * sa' * (ua' * ub) * sb))
% case 2: sum(sum(a.*b', 2))
timeit(@() sum(sum(((vb' * va) * sa') .* ((ua' * ub) * sb)', 2)))
Again, be careful though. Since I do have multiple other processes running on my machine as I write this, along with my open web browser, I can get variable results. It looks like the second call may be slightly consistently faster, on MY machine, but by a difference that is on the order of 1% when I ran repeated tests.
That is not relevant though, and is why I will not even report the numbers that I get. RUN IT YOURSELF. Make sure the matrices are the same size that you will always have, because the BLAS will interact with cache sizes, with the number of CPUs that you have available. In fact, the exact size of an array will even be important for something as trivial as a matrix multiply.

Sayyed Ahmad on 21 Jun 2018
tic
for i=1:10000
end
toc
tic
for i=1:10000
end
toc
so you kann make a realistic Idea wich code is faster.
Xiaohan Du on 21 Jun 2018
this answer is useless, besides, why so many typos?