Adding values in a vector according to labels in another vector

I have two arrays of the same size E x 1; let's call them Values and Labels. Values holds real-valued data points, while Labels holds an integer-valued group label for each data point, ranging from 1 to N. In my application E can be on the order of 1e6 and N on the order of 1e5, so on average ~10 data points share the same group label.
I would like to generate the accumulated sum of values sharing the same group label. That is, I want to generate an N x 1 vector GroupSum where GroupSum(i) = sum(Values(Labels == i)). This will be done for many instances of (Values, Labels) combinations in an outer loop, so it is important to do this quickly. Therefore, I would like to find a faster alternative to
for i = 1:N
    GroupSum(i) = sum(Values(Labels == i));
end
Any suggestions would be appreciated.
Thanks in advance.
Murat

Accepted Answer

Hi,
From the current implementation I assume that the labels run from 1 to N with no value missing in between. The best-case scenario I can think of is to traverse the Values array exactly once, keeping an accumulating sum for each label. This way you save the overhead of logical indexing over all of Labels on every iteration. The code below may help.
E = 1e6;
N = 1e5;
Labels = randi([1 N],E,1);
Values = randi([1 100],E,1);
GroupSum = zeros(1,N);
for i = 1:E
    GroupSum(Labels(i)) = GroupSum(Labels(i)) + Values(i);
end
This avoids the (Labels == i) logical indexing entirely, which is where the time savings come from.
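For completeness, MATLAB's built-in accumarray performs exactly this grouped summation in a single call and is worth benchmarking against the loop; a minimal sketch, under the same assumption that the labels run from 1 to N:
% accumarray adds Values(i) into bin Labels(i); the [N 1] size argument
% guarantees an N x 1 result even if some label happens not to occur.
GroupSum = accumarray(Labels, Values, [N 1]);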

5 Comments

Hi Raunak,
Thank you very much, your help is much appreciated.
Your solution reduces the runtime by four orders of magnitude, from ~100 sec to ~0.01 sec, as tested below. This is despite running a for loop over E, which is 10 times larger than N. It is amazing that logical indexing is so slow.
I forgot to mention in my post that the vector Labels is static in my problem, i.e. it does not change from one run to the next, while the vector Values does change. Therefore, it is possible to pre-process Labels to eliminate the for loops altogether; see the bottom of the code below. However, this does not improve the speed over your proposed solution; in fact it is 2-4x slower, which is somewhat surprising to me.
Thanks again.
Murat
N = 1e5;
E = 1e6;
Values = randn(E, 1);
Labels = randi(N, E, 1);
GroupSum = zeros(N, 1);
GroupSum2 = zeros(N, 1);
tic;
for i = 1:N
    GroupSum(i) = sum(Values(Labels == i));
end
toc;
tic;
for i = 1:E
    GroupSum2(Labels(i)) = GroupSum2(Labels(i)) + Values(i);
end
toc;
% preprocessing of Labels to eliminate the for loops
d = max(groupcounts(Labels));   % largest group size
P = zeros(d, N);                % column j will hold the indices of label j
Values(E+1) = 0;                % sentinel: padded entries index here and add 0
for j = 1:N
    list = find(Labels == j);
    L = length(list);
    P(1:L, j) = list;
    if L < d
        P((L+1):d, j) = E+1;    % pad short columns with the sentinel index
    end
end
%end of preprocessing
tic;
GroupSum3 = sum(Values(P))';
toc;
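As a quick consistency check on the benchmark above (a hedged addition; the three methods sum in different orders, so allow for floating-point rounding):
% All three approaches compute the same grouped sums, up to rounding.
assert(max(abs(GroupSum - GroupSum2)) < 1e-9);
assert(max(abs(GroupSum - GroupSum3)) < 1e-9);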
Raunak Gupta on 13 Nov 2020
Edited: Raunak Gupta on 13 Nov 2020
Hi Murat,
My suggestion was based on the sole goal of minimizing how many elements are traversed. The problem with logical indexing is exactly that: each (Labels == i) traverses the entire vector, so the original method touches about 1e5 x 1e6 elements in total, whereas here the traversal is a single pass over 1e6 elements. That is why you see the four orders of magnitude improvement.
As far as storing the indexes is concerned, it can be an expensive approach because the indexes have to be stored in a rectangular matrix. Say the label distribution is a little skewed and one label occurs 100 times instead of 10-20 times: then the matrix storage becomes huge. And the same traversal problem is still there.
The method I suggested is a good fit here because all the labels from 1 to N are known to be present. Also, the traversal stays at 1e6 irrespective of how skewed the distribution of labels is.
One note: this kind of indexing is not cache friendly, because GroupSum is accessed at whatever position the current label dictates rather than sequentially. So if low-level algorithms were to be developed, they would still favor a more sequential access pattern, indexing from 1 to N.
Hope these insights give an idea of why the method I suggested works well.
Hi Raunak,
Thanks for the clear explanation of why the four orders of magnitude improvement occurs. I'd like to understand it better so that I can improve the execution time of my application: a decoder simulation that runs a similar (but more complex than a simple summation) computation sequentially about 1e7 times.
The vector Labels is known a priori due to an external design, i.e. it is not random. Therefore, the dimensions of the rectangular matrix P will not change and its row dimension d is guaranteed to be small, in the range 10-15. (By the way, I did not have this idea when I posted the original question, or else I would have posed it differently.)
What I don't understand is why the computation of GroupSum3 takes about 40-45 msec on my machine, while your solution of computing GroupSum2 in the for loop takes 10-20 msec. The P matrix in this experiment is 25 x 1e5, so the line GroupSum3 = sum(Values(P))' should be faster than the for loop over 1e6 values one by one. I suspect the memory access in forming the 25 x 1e5 matrix Values(P) is time-consuming, so your for loop, sequentially adding Values(i) to GroupSum2(Labels(i)) 1e6 times, wins.
There is additional structure in Labels in my problem, so I can break the sum down into a small number of sums via e.g. 3 matrices P1, P2, P3 of different row dimensions and gain an additional ~3x speedup, but that only barely catches up with your solution, which, as you say, works for any Labels.
Can the for loop be improved if Labels were known to have such structure? E.g., say a given label can occur only 5, 10 or 15 times, i.e. unique(groupcounts(Labels)) is a very small array of only 3 elements, unlike the case with randomly generated Labels.
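For concreteness, a minimal sketch of this count-based decomposition (variable names like P_list are illustrative, and it assumes every label from 1 to N occurs at least once): one padding-free index matrix is precomputed per distinct group size and reused on every run.
% One-time preprocessing (Labels is static across runs).
counts = groupcounts(Labels);               % occurrences of each label 1..N
sizes  = unique(counts).';                  % distinct group sizes, e.g. [5 10 15]
P_list   = cell(size(sizes));               % one index matrix per group size
lbl_list = cell(size(sizes));
for k = 1:numel(sizes)
    c = sizes(k);
    lbl_list{k} = find(counts == c);        % labels occurring exactly c times
    pos = find(ismember(Labels, lbl_list{k}));
    [~, ord] = sort(Labels(pos));           % group the c copies of each label
    P_list{k} = reshape(pos(ord), c, []);   % c x numel(lbl_list{k}), no padding
end
% Per-run summation: one vectorized column sum per distinct group size.
GroupSum4 = zeros(N, 1);
for k = 1:numel(sizes)
    GroupSum4(lbl_list{k}) = sum(Values(P_list{k}), 1).';
end
As noted above, even this padding-free variant only barely catches up with the single accumulation loop.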
Thanks again.
Murat
Hi Murat,
As I said, the time complexity here is determined by how many elements of the Values vector you need to traverse. Since the sum of the values is required, the whole Values vector has to be traversed. So even if we have a property that groups the labels by their frequencies, in the end Values still needs to be traversed in full.
Thus, as long as a sum is the final result, I don't see any further enhancement in terms of time complexity, because the solution I suggested already has linear time complexity.
I see your point. No matter how the bins are formed via Labels, in the end there are E addition operations to be performed, so the complexity cannot be below O(E), which is exactly what your for loop achieves.
This also explains why the matrix-based approach is about 2-4x slower: there are 2.5e6 (= d*N) elements in my P matrix, many of them mere padding. I had thought avoiding the for loops would be a big help in MATLAB, but apparently not in this case.
Thank you very much for all the help.
