# Efficient use of matfile within parfor

2 views (last 30 days)
Angelo Cuzzola on 27 Apr 2020
Commented: Angelo Cuzzola on 5 May 2020
I have a very simple code that works pretty well when the size of the involved objects is manageable by the memory.
%%% Variables
% Z -> matrix of dimensions (r,T)
% V -> matrix of dimensions (r,r,T)
% C -> matrix of dimensions (n,r)
% nanY -> matrix of dimensions (n,T)
% y -> matrix of dimensions (n,T)
R_new_T = zeros(n,1);
parfor t=1:T
yC = diag(~nanY(:,t))*C;
R_new_T = R_new_T + (y(:,t)-yC*Z(:,t+1)).^2 ...
+ diag(yC*V(:,:,t+1)*yC') ...
+ Rvec.*nanY(:,t);
end
In practical application, that bunch of code has to run with the parameter n taking values around 80k leading quickly to an unfeasible memory load. The solution I figured out relies on matfiles, at the expenses of paralellizionation and general performance. This is the version with which I am able to handle it for large n.
yC = zeros(n,r);
save var_cyc.mat nanY y yC -v7.3;
v = matfile('var_cyc.mat', 'Writable', true);
R_new_T = zeros(n,1);
for t=1:T
v.yC = diag(~v.nanY(:,t))*C;
R_new_T = R_new_T + (v.y(:,t) - v.yC*Z(:,t+1)).^2 ...
+ diag(v.yC*V(:,:,t+1)*v.yC') ...
+ Rvec.*(v.nanY(:,t)) ;
end
Unfortunately this script cannot be paralellizable for classification problems related to the variable v. I'm wondering if there is a way to recover the initial parallelization structure using matlab file to handle huge file without incurring in 'out of memory errors'.
Thanks.

Edric Ellis on 28 Apr 2020
The problem is all to do with your assignment into v.yC. The parfor analysis cannot tell that this is safe and not order-dependent - it doesn't consider the fact that you're completely overwriting one field of the object. So, I think the simplest workaround is not to place yC in the matfile, and instead keep this as a loop temporary variable. In other words:
parfor t=1:T
yC = diag(~v.nanY(:,t))*C;
R_new_T = R_new_T + (v.y(:,t) - yC*Z(:,t+1)).^2 ...
+ diag(yC*V(:,:,t+1)*yC') ...
+ Rvec.*(v.nanY(:,t)) ;
end
This shouldn't cause any additional memory usage since each worker already needs a full copy of yC in memory in order to perform the other operations.

Angelo Cuzzola on 4 May 2020
To have a standalone code, you just need to simulate the initial objects.
Here it is:
% Initialize data at random
n = 60000;
r = 3;
T = 100;
Z = randn(r,T);
V = randn(r,r,T);
C = rand(n,r);
y = randn(n,T);
y(y==0) = NaN;
% Run the code
yC = zeros(n,r);
R_new_T = zeros(n,1);
save var_cyc.mat nanY y yC R_new_T -v7.3;
v = matfile('var_cyc.mat', 'Writable', true);
parfor t=1:T
v.yC = diag(~v.nanY(:,t))*C;
v.R_new_T = v.R_new_T + (v.y(:,t) - v.yC*Z(:,t+1)).^2 ...
+ diag(v.yC*V(:,:,t+1)*v.yC') ...
+ Rvec.*(v.nanY(:,t)) ;
end
Edric Ellis on 5 May 2020
This code is missing nanY and Rvec, so I added them, like this:
% Initialize data at random
n = 60000;
r = 3;
T = 100;
Z = randn(r,T);
V = randn(r,r,T);
C = rand(n,r);
y = randn(n,T);
y(y==0) = NaN;
nanY = isnan(y); % ??
Rvec = zeros(n,1); % ??
% Run the code
R_new_T = zeros(n,1);
save var_cyc.mat nanY y -v7.3;
clear y nanY
v = matfile('var_cyc.mat');
parfor t=1:(T-1)
nanYCol = v.nanY(:,t);
yC = nanYCol.*C;
R_new_T = R_new_T + (v.y(:,t) - yC*Z(:,t+1)).^2 ...
+ diag(yC*V(:,:,t+1)*yC') ...
+ Rvec.*nanYCol ;
end
This works, but it uses a lot of memory. The problem is essentially that each worker needs to make an n-by-n temporary. I was able to remove the first one of those by using yC = nanYCol .* C rather than forming the diagonal matrix. However, the second line of the computation of R_new_T also forms an n-by-n temporary only to extract the diagonal. It might be worth trying to refactor that so that it doesn't need the whole matrix.
Angelo Cuzzola on 5 May 2020
Thanks. It should work. Just for consistency yC definition need a correction
yC = (~nanYCol).*C;
The second line can refactored using only matrices of dimension (n,r)
sum((yC*V(:,:,t+1)).*yC,2)