# How can I speed up multiplication of multiple 3D matrices?

39 views (last 30 days)
Akash ganesh on 4 Jan 2021
Commented: Akash ganesh on 13 Jan 2021
Hello everyone,
I am performing arithmetic operations with large 3D matrices (10^5 x 2 x 4) inside a for loop with large number of iterations viz. it is of order ~ 5*10^5. It is taking close to 5 hours for executing my code. Upon profiling my code, it indicated that the lines evaluating the matrices px, py and pz inside my for loop (the equations which are immediately after the comment "For the three decoupled cases") are the ones that are taking too long to execute i.e. almost 5-10 times longer than other lines inside for loop. I was wondering if any of you could provide me some suggestions on how I can speed up these lines and reduce the overall execution time of my code. Please find attached my code below.
Thanks.
clear variables
rng('default');%Make the number generation repeatable
%---------------------------------------------------------------%
%Declaration of input parameters%
V_S = 20; %swimming velocity of bacteria in um/s%
Dr = 1;%Rotational diffusivity of bacteria in rad^2/s%
L = (3*10^4)/2; %Width in um%
H = 520/2; %Half the channel height in um (Channel height is 2H)%
qs = 1;%Aspect ratio of sphere%
qi = 2;%Aspect ratio for an intermediate beta%
qe = 10; %Aspect ratio of ellipsoidal rod%
Pes = V_S/(2*Dr*H); %Swimming Peclet%
%Time period of rotation of rod in a flow varies according to gamma_dot_M
%value%
if T > 1/Dr
T = 1/Dr; %Time period of rotation of a rod in a flow%
end
tau = T/50; %time step (we make use of the lowest tau i.e. the one for sphere)%
tottime = 5000;%Total time needed in secs%
ntime = ceil(tottime/tau);%Total number of time steps required to evaluate upto 5000 secs%
np = 10^5; %Number of particles%
if rem(ntime,10) == 0
st = ceil(ntime/10) + 1; %Total number of data to be saved%
else
st = ceil(ntime/10);
end
%---------------------------------------------------------%
%Calculating some constant values to be used inside for loop%
del = sqrt(2*Dr*tau);
delsqr = 2*Dr*tau;
Vstau = V_S*tau;
Beta = ones(np,1,3);
Beta(:,1,1) = ((qs^2-1)/(1+qs^2)).*Beta(:,1,1);%Jeffrey constant for sphere%
Beta(:,1,2) = ((qi^2-1)/(1+qi^2)).*Beta(:,1,2);%Jeffrey constant for q=2 particle%
Beta(:,1,3) = ((qe^2-1)/(1+qe^2)).*Beta(:,1,3);%Jeffrey constant for ellipsoidal rod%
%-----------------------------------------------------%
%Setting variables to be used inside the loop%
%Array for stroing positions and orientations for all particles%
x = zeros(np,2,4);
y = zeros(np,2,4);
z = zeros(np,2,4);
px = zeros(np,2,4);
py = zeros(np,2,4);
pz = zeros(np,2,4);
%Array for storing first moments and variances%
tim = zeros(st,1,4);
m1x = zeros(st,1,4);
m1y = zeros(st,1,4);
m1z = zeros(st,1,4);
varx = zeros(st,1,4);
vary = zeros(st,1,4);
varz = zeros(st,1,4);
%------------------------------------------------------------------------%
%Generation of random initial positions and angles of particles generated from Uniform distribution%
%For sphere%
x(:,1,1) = (L - (-L)).*rand(np,1) - L; %scaled to generate random positions between -L and L%
y(:,1,1) = zeros(np,1);%All particles starts at y = 0%
z(:,1,1) = (H - (-H)).*rand(np,1) - H; %scaled to generate random positions between -H and H%
theta(:,1) = (2*pi - 0).*rand(np,1) ; %scaled to generate random orientations between 2pi and 0%
phi(:,1) = (2*pi - 0).*rand(np,1); %scaled to generate random orientations between 2pi and 0%
%Assiging the same initial conditions for q = 2 particle%
x(:,1,2) = x(:,1,1);
y(:,1,2) = y(:,1,1);
z(:,1,2) = z(:,1,1);
%Assigning the same initial conditions to ellipsoidal rod%
x(:,1,3) = x(:,1,1);
y(:,1,3) = y(:,1,1);
z(:,1,3) = z(:,1,1);
%Assigning same for decoupled particle%
x(:,1,4) = x(:,1,1);
y(:,1,4) = y(:,1,1);
z(:,1,4) = z(:,1,1);
%Saving initial positions since all initial positions are same, saving one set is enough%
xint(:,1) = x(:,1,1);
yint(:,1) = y(:,1,1);
zint(:,1) = z(:,1,1);
%Initial orientation%
%For sphere%
px(:,1,1) = sin(theta(:,1)).*cos(phi(:,1));
py(:,1,1) = sin(theta(:,1)).*sin(phi(:,1));
pz(:,1,1) = cos(theta(:,1));
%Assiging the same for q = 2 particle%
px(:,1,2) = px(:,1,1);
py(:,1,2) = py(:,1,1);
pz(:,1,2) = pz(:,1,1);
%Assigning the same for ellipsoidal rod%
px(:,1,3) = px(:,1,1);
py(:,1,3) = py(:,1,1);
pz(:,1,3) = pz(:,1,1);
%Assigning the same for decoupled particle%
px(:,1,4) = px(:,1,1);
py(:,1,4) = py(:,1,1);
pz(:,1,4) = pz(:,1,1);
N = 0;
%loop for every time step%
for t = 2:ntime
%Added noise in the orientation direction (same noise used for sphere ellipsoidal rod and decoupled particle)%
dpx = del.*randn(np,1);
dpy = del.*randn(np,1);
dpz = del.*randn(np,1);
%Calculate the new position in x,y,z directions%
x(:,2,:) = x(:,1,:) + Vstau.*px(:,1,:);
y(:,2,:) = y(:,1,:) + Vstau.*py(:,1,:) + gammaloc.*(1 - (z(:,1,:)./H).^2);
z(:,2,:) = z(:,1,:) + Vstau.*pz(:,1,:);
%Calculate the new orientation and normalize%
%For the three coupled cases%
px(:,2,1:3) = px(:,1,1:3) - gammatau.*z(:,1,1:3).*Beta(:,1,:).*px(:,1,1:3).*py(:,1,1:3).*pz(:,1,1:3) - delsqr.*px(:,1,1:3) - py(:,1,1:3).*dpz + pz(:,1,1:3).*dpy;
py(:,2,1:3) = py(:,1,1:3) + gammatau2.*z(:,1,1:3).*pz(:,1,1:3).*(Beta(:,1,:).*(1- 2.*py(:,1,1:3).*py(:,1,1:3)) + 1) - delsqr.*py(:,1,1:3) - pz(:,1,1:3).*dpx + px(:,1,1:3).*dpz;
pz(:,2,1:3) = pz(:,1,1:3) - gammatau2.*z(:,1,1:3).*py(:,1,1:3).*(Beta(:,1,:).*(2.*pz(:,1,1:3).*pz(:,1,1:3) - 1) + 1) - delsqr.*pz(:,1,1:3) - px(:,1,1:3).*dpy + py(:,1,1:3).*dpx;
%For decoupled%
px(:,2,4) = px(:,1,4) - delsqr.*px(:,1,4) - py(:,1,4).*dpz + pz(:,1,4).*dpy;
py(:,2,4) = py(:,1,4) - delsqr.*py(:,1,4) - pz(:,1,4).*dpx + px(:,1,4).*dpz;
pz(:,2,4) = pz(:,1,4) - delsqr.*pz(:,1,4) - px(:,1,4).*dpy + py(:,1,4).*dpx;
%Scaling the orientation with its magnitude%
normpsqt = sqrt(px(:,2,:).*px(:,2,:) + py(:,2,:).*py(:,2,:) + pz(:,2,:).*pz(:,2,:));
px(:,2,:) = px(:,2,:)./normpsqt(:,1,:);
py(:,2,:) = py(:,2,:)./normpsqt(:,1,:);
pz(:,2,:) = pz(:,2,:)./normpsqt(:,1,:);
%The following sets the boundary condition of the particle%
%Incase the calculated z-position of the particle exceeds H or drops below
%-H, it stays at the boundary i.e. H or -H until it tumbles and
%reorients itself back into the channel same for x-position as well%
A = z(:,2,:);
B = x(:,2,:);
A(A>H) = H;
A(A<-H) = -H;
B(B>L) = L;
B(B<-L) = -L;
z(:,2,:) = A(:,1,:);
x(:,2,:) = B(:,1,:);
%Average over all particles at different times%
if rem(t,10)==0 || t==2
N = N+1;
tim(N,1) = (t-1)*tau; %record time elapsed%
%First moment and variance%
m1x(N,1,:) = sum(x(:,2,:) - xint(:,1))./np; %First moment along x-direction%%
m1y(N,1,:) = sum(y(:,2,:) - yint(:,1))./np; %First moment along y-direction%%
m1z(N,1,:) = sum(z(:,2,:) - zint(:,1))./np; %First moment along x-direction%%
varx(N,1,:) = sum((x(:,2,:) - xint(:,1) - m1x(N,1,:)).^2)./np; %Variance along x-direction%
vary(N,1,:) = sum((y(:,2,:) - yint(:,1) - m1y(N,1,:)).^2)./np; %Variance along y-direction%
varz(N,1,:) = sum((z(:,2,:) - zint(:,1) - m1z(N,1,:)).^2)./np; %Variance along z-direction%
end
%Reassinging the new position and orientation values to the old index%
x(:,1,:) = x(:,2,:);
y(:,1,:) = y(:,2,:);
z(:,1,:) = z(:,2,:);
px(:,1,:) = px(:,2,:);
py(:,1,:) = py(:,2,:);
pz(:,1,:) = pz(:,2,:);
end %end of time step loop%

SaiDileep Kola on 13 Jan 2021
Hi Akash,
I don't see any complex matrices in the implementation you can make use of NDFUN it is explained in the link provided and for improving further performance you can also refer to the procedure explained in a similar question answered here
Akash ganesh on 13 Jan 2021
Hello SaiDileep Kola,
Thank you so much for your response. Will look into it.
Thanks.