78 views (last 30 days)

Show older comments

I tried to make the variables single to decrease the calculation time.

It's a gravitational interaction calculation between particles.

How can I vectorise this code to avoid so many loops?

Can I do it since the particles could be more than 1 million and there should be built matrixes 1million x1million cells?

Most time spent on these lines

d = [xj yj zj] - [xi yi zi]; % 3D distance as a vector

dr2 = d(1)^2+d(2)^2+d(3)^2; %3D distance as absolute value

AccelDouble = (d/sqrt(dr2))*((G*mj*(sign(mi)))/dr2); % acceleration increment

ar = ar + single(AccelDouble); % New acceleration

n = size(heavy,1);

AccelDouble = zeros(3,'double') ;

for i = 1:n

mi = heavy(i,1); % mass of first particle

if mi ~= 0 % Avoid calculate 0 mass objects

xi = heavy(i,2); % x position of first particle

yi = heavy(i,3); % y position of first particle

zi = heavy(i,4); % z position of first particle

ar = [0 0 0]; %3D

for j = 1:n % second count for reacting particles

if i ~= j % to avoid counting the same particles

mj = heavy(j,1); % mass of second particle (interacting)

if mj ~= 0 && mi ~= 0 % Avoid calculate 0 mass objects

xj = heavy(j,2); % x position of second particle

yj = heavy(j,3);

zj = heavy(j,4);

d = [xj yj zj] - [xi yi zi]; % 3D distance as a vector

dr2 = d(1)^2+d(2)^2+d(3)^2; %3D distance as absolute value

AccelDouble = (d/sqrt(dr2))*((G*mj*(sign(mi)))/dr2); % 3D acceleration increments

ar = ar + single(AccelDouble); % New 3D acceleration

end

end

end

for k = 1:3

heavy1(i,4+k) = heavy1(i,4+k) + ar(k)*timeStep; % 3D New Speed calculation

heavy1(i,k+1) = heavy1(i,k+1) + timeStep*heavy1(i,k+4); % 3D New posistion calculation

end

end

end

Walter Roberson
on 26 Feb 2021

Can I do it since the particles could be more than 1 million and there should be built matrixes 1million x1million cells?

An array that was 1e6 x 1e6 would take 1e12 times the space for an individual element. For single precision that would be 4e12 bytes, which is slightly more than 3.6 terabytes, using 2^40 bytes as the definition of terabytes here.

... for each array.

Does your system have access to multiple terabytes of memory?

but this sounds like something a GPU should be able to do well.

The largest commercially available GPU is the NVIDIA Titan V, with 12 GB of memory. Considering the need for multiple matrices, you probably would not be able to handle much more than 25000 x 25000 single precision on such a system.

Bjorn Gustavsson
on 19 Feb 2021

Newton says you can cut it in two if you take into account that the force from particle i on particle j is equal but directed in opposite direction of force from particle j on particle i. Also if proper pre-allocation is tricky just run the loops from n down to 1.

Finally if this is some kind of explicit Euler scheme I suggest you take a look at Størmer-Verlet schemes instead (or perhaps the 12th-10th-order-runge-kutta-nystrom-integrator that might also be an option) that ought to be more suited for equations of motion.

Jan
on 19 Feb 2021

Edited: Jan
on 19 Feb 2021

ar = [0 0 0];

This creates a double array.

ar = ar + single(AccelDouble);

Here AccelDouble is converted to a single at first, which must be converted back to a double, because it is added to a double. This is a waste of time.

Either convert all values to singles from the beginning. Or stay at doubles.

mi ~= 0 is excluded twice. If would be more efficient, to remove all m==0 before the loops.

"sign(mi)"? Do you mean "single(mi)"?

AccelDouble = zeros(3,'double') ;

This produces a [3 x 3] array. Because it is overwritten ineach iteration, this is not a pre-allocation, but a waste of time. It does not matter, that it is overwritten by a [1 x 3] array, but confusing only.

I assume, your code can be accelerated. What are the used sizes of the input?

The integration scheme based of delta(v) = delta(t)*a is very basic. There are integrations methods on a higher order, which allow to increase the step sizes, which reduces the run time.

[EDITED] Vectorizing the inner loop and using SINGLE for all data reduces the runtim from 1.64 sec to 0.06 sec for 2e3 objetcs:

function test

% Mass, position, speed

heavy = rand(2e3, 7) .* [1e9, 1e6, 1e6, 1e6, 1e3, 1e3, 1e3];

tic

heavy = Core1(heavy);

toc

heavyS = single(heavy)

tic

heavy = Core1(heavyS);

toc

end

function heavy = Core3(heavy)

timeStep = 0.1;

G = 6.67430e-11;

M = heavy(:, 1);

X = heavy(:, 2:4);

V = heavy(:, 5:7);

n = numel(M);

MG = M * G;

for i = 1:n

mi = M(i);

D = X - X(i, :);

r2 = sum(D .* D, 2);

F = (D ./ sqrt(r2)) .* (MG * mi ./ r2);

F(i, :) = 0;

Ft = sum(F, 1);

V(i, :) = V(i, :) + Ft * timeStep;

X(i, :) = X(i, :) + timeStep * V(i, :);

end

heavy(:, 2:4) = X;

heavy(:, 5:7) = V;

end

% Elapsed time is 1.880953 seconds. Original code

% Elapsed time is 0.140433 seconds. Vectorized, double

% Elapsed time is 0.066267 seconds. Vectorized, single

I've implemented this with the assumption that your "sign(mi)" was thought as "single(mi)".

But note this: In the current implementation the problem is O(n^2). Doubling the input size does increase the runtime by a factor 4. I've tested the code with 2e3 rows. For 1e6 rows this will be more than 4 hours per time step.

How many timesteps do you want to calculate? I assume, there is no way to perform this efficiently in Matlab with this simple apporach. Even if you convert this to a C-mex method and increase the speed by a factor 2 (a pure guess only!), 2 hours per time step is still far away from allowing a serious number of time steps.

Bjorn has mentioned, that the force matrix is symmetrical, but it is not trivial to exploit this for 1e6 elements without exhausting the the memory. Using a sophisticated intergator is essential to reduce the number of timesteps for a certain accuracy.

Professional simulations use e.g. the leap-frog algorithm. This simplifies the computation by combining a neighborhood of elements to their center of mass: If we simulate the trajectories of the stars in the galaxy, the distribution of other stars in a spatial segment far away is in first approximation the force of its center of mass containing the sum of the masses. This can reduce the computing time by a factor 100 without loosing too much accruracy.

Bjorn Gustavsson
on 26 Feb 2021

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!