Advice constructing large sparse Jacobian arrays

Question

Michael Brown el 1 de Oct. de 2023

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/2027949-advice-constructing-large-sparse-jacobian-arrays

Comentada: Sam Marshalik el 5 de Oct. de 2023

I am undertaking non-linear optimization of tensor b spline coefficients used to describe thermodynamic surfaces. Constructing the array of partial derivatives of the model predictions with respect to the model parameters (the Jacobian) is the most time consuming and memory limiting step. The arrays are large (>100,000 by >10,000) with the saving grace being that they are sparse (<2% non-zero). I have working code that strains a mac ultra (24 cores and 128 GB unified memory). With care in problem size choices things work as requested. However, I crash MATLAB and the macOS when I am insufficiently careful to not exceed memory limits.

As a practical matter, I watch MATLAB demand greater memory as it sets up the parallel loop and then release it while the loop executes. It is during the loop startup that MATLAB can run out of application memory.- the result is a crash giving: JavaScript Alert - https://127.0.0.1:31515 Message Service fatally disconnected. At this point, I have to force quit MATLAB or in some cases, restart the computer. I find that I can run a bigger problem if I submit it as a batch job rather than running the script in the command line.

At this point I am looking for advise on how to improve my code for greater speed with as little memory as possible required. Attached are code fragments that show what I have at this point with two versions - an ignorant parfor loop that lets MATLAB discover the non-zero array elements and an intellegent parfor loop that is aware of which data sites are sensitive to which model parameters. The second is slightly faster than the first but the speed enhancement is not as great as I expected.

There are also implicit questions in how MATLAB handles sparse arrays -

1. Do I save memory by type casting array indices as integers for the sparse array manipulations?

2. How small a floating point number is needed for MATLAB to label it zero for a sparse array?

3. I "invented" algorithms below to handle the parsing of array indices as needed for construction of the Jacobian. Perhaps there are cleaner quicker ways to accomplish this?

% Following are code fragments for non-linear optimization of tensor spline
% coefficients.  Two methods are shown for the construction of the
% Jacobian - the matrix of partial derivatives of the model at all data
% sites with respect to the spline parameters (the coefficents)
% Actual problems are associated with roughly 500,000 data sites and 15,000
% model parameters.  The code is running on a mac studio ultra with 24
% cores and 128 GB of ram. The number of workers has to be set to less than
% 24 in order to work within memory limits.  The typical time for running
% an optimization ranges from 10 to 30 minutes.  The majority of the time is
% consumed in the loop shown below.  Thus, the bottlenck in calculations is
% the time and memory requirements of this loop to determine the Jacobians
% The spline coefficients are in vector "coefs"
% The collocation arrays (the spline basis function values) are in structure "Cntrl"
% The function "model" calculates a number of properties that are returned in a structure as separate
% vectors: y1, y2, ....  
% 
% In the following we track one prediction of the model, 'velsq'. The actual code has to work with multiple predictions
% The value for the finite difference derivative increment, dG, must be set
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 1 => code that is oblivious to the pattern of sparseness of the
% collocation arrays:
% ndat is number of data sites
% nspl is number of spline coefficients (model parameters)
predct1=model(coefs,Cntrl); %model output using the base set of coefficients
nz=0.02*ndat*nspl; % experience shows that the Jacobian sparse array has less than 2% non-zero elements
% preallocate an array to hold the Jacobian for model predictions:
A=sparse([],[],[],ndat,nspl,nz);
% set up parallel pool constants:
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
%  Step through and perturb model parameter to determine finite difference
%  derivatives of data as a function of model parameters - the Jacobian
parfor (i=1:nspl,num_worker)
    coefs_tmp=C.Value;
    coefs_tmp(i)=coefs_tmp(i)+dG;
    predct2=model(coefs_tmp,D.Value);  
    A(:,i)=(predct2.velsq-E.Value.velsq)/dG;   % blind calculation of prediction differences 
    %for all data sites. Let MATLAB decide which differences produce zeros for the sparse array A
end
% on exiting this loop we have the Jacobian in sparse array "A".  This
% loop assumes no knowledge of the sparseness of the prediction differences
% and lets MATLAB make the call on what to place in the sparse array A
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 2 - Requires two steps - This is faster than the first example
% Below an effort is made to reduce numerical (hopefully memory as well) demands of the loop
% Step 1 
% determine indices for data impacted by each parameter (do this once in an optimization run and place in Cntrl structure)
[jg,ig]=find(Cntrl.G); % get indices of all non-zero elements of the collocation array
Indexes=[jg(:) ig(:)]; % put indices in a matrix to facilitate making a list
% The cell structure Aindx{i} contains indices for data sites impacted by parameter "i"
Aindx=cell(nspl,1); % alocate a cell variable to hold all the data site indices associated with each parameter
C = parallel.pool.Constant(Indexes);
% loop through all parameters and gather indices of the data impacted by
% each parameter - make int32 to reduce memory required
parfor i=1:nspl   
    tmp=int32(C.Value(C.Value(:,2)==i,1));
    Aindx{i}=tmp;
end
Cntrl.Aindx=Aindx; % put results in the Cntrl structure to make it available later.
% Step 2 
% Construct the Jacobian - New Jacobian calculated with every iteration of the optimization
predct1=model(coefs,Cntrl); %model output using the base set of coefficients evaluated at all data sites
sdat=cell(nspl,1); % preallocate a cell structure to hold non-zero Jacobian values
%  Step through and perturb model parameter to determine finite difference
%  derivatives of data as a function of model parameters - the Jacobian
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
parfor (i=1:nspl,num_worker)
    coefs_tmp=C.Value;
    parm_id=D.Value.Aindx{i};
    coefs_tmp(i)=coefs(i)+dG;
    predict2=model(coefs_tmp,D.Value,parm_id); % calculate model values only for data sites impacted by the ith parameter 
    tmp=(predict2.velsq-E.Value.velsq(parm_id))/dG; % calculate finite differences only for impacted data sites
    sdat{i}=tmp;   % the length of each sdat{i} cell is variable and about 2% or less of total number of data
end
% at this point we have all non-zero elements of the Jacobian in a cell
% structure "sdat" with nspl cells, each cell contains the Jacobian values for
% data impacted by that parameter at data sites given in Aindx.  These need to be unpacted and loaded
% into a sparse array
% Note that a cell structure rather than an array is used since the number
% of elements in each cell is variable
jtot=[];
itot=[];
stot=[];
parfor i=1:nspl
    jtot=[jtot;Cntrl.Aindx{i}];
    itot=[itot;i*ones(length(Cntrl.Aindx{i}),1)];
    stot=[stot;s{i}];
end
A=sparse(jtot,int32(itot),stot,ndat,nspl);
% End of Example
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The function that calculates model predictions:
function out=model(coefs,Cntrl,id)
% Function to return model predictions given b spline coefficients with data site
% basis functions as described in the structure "Cntrl".  Usage:
% out=model(coefs,Cntrl,id)
% where coefs are the spline parameters, Cntrl contains all the collocation
% information, id are indices of data sites influenced by a selected
% coefficient
if nargin==2
    id=1:length(Cntrl.G(:,1));
end
%calculate all the necessary spline determined predictions
G=Cntrl.G(id,:)*coefs;
Gp=Cntrl.Gp(id,:)*coefs;
Gpp = Cntrl.Gpp(id,:)*coefs;
G3p = Cntrl.G3p(id,:)*coefs;
G4p=Cntrl.G4p(id,:)*coefs;
G4t=Cntrl.G4t(id,:)*coefs;
Gtt =Cntrl.Gtt(id,:)*coefs;
Gpt=Cntrl.Gpt(id,:)*coefs;
G4p=Cntrl.G4p(id,:)*coefs;
Vex=Cntrl.Vex(id,:)*coefs;
% predictions are placed in output structure
out.G=G; % Gibbs energy
out.V=Gp; % specific volume is Gp
out.Gtt=Gtt; % specific heat is -Gtt*T
out.velsq=Gp.^2./(Gpt.^2./Gtt-Gpp);  % the combination of derivatives that gives the square of sound speeds
out.d2KdP2=G4p.*Gp.*Gpp.^(-2) + G3p./Gpp -2*G3p.^2.*Gp.*Gpp.^(-3); % the combination of derivatives giving the second derivative of the bulk modulus
out.G4t=G4t;% parameter to regularize - make "Gtt" smooth

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Michael Brown el 2 de Oct. de 2023

The mathematical statement of the optimization of parameters starts with the linear problem:

J*dp=devs

where J (the Jacobian) is the array of partial derivatives of the model with respect to parameters evaluated at data sites, dp is a vector of changes to the parameters and vector devs is the differences between model predictions and data. One inverts this to obtain revised (better) model parameters:

dp= inv(J)*devs The most straightforward matlab methods are: J\devs or least-square: (J'*J)\(J'*devs)

The idea is that solutions of a linear problem can iteratively lead to improvements of model predictions that are not linear with respect to model parameters. This is standard and underlies a whole body of "inverse theory" algorithms that is not the focus here. Many algorithms start with the need to construct the Jacobian which is, in my application, the most numerically intensive step.

Although the application is not important here, I am representing Gibbs energy with tensor b splines. Appropriate derivatives of G give the surfaces for individual phases as shown in the image above. With good representations, we predict those surfaces and their intersections. I published pioneering work here and further examples here

Michael Brown el 2 de Oct. de 2023

Also - all my publications come with matlab live scripts that demo the released versions of my codes which are also in GitHub repository linked to SeaFreeze.org -

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Walter Roberson el 1 de Oct. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/2027949-advice-constructing-large-sparse-jacobian-arrays#answer_1323194

Editada: Walter Roberson el 1 de Oct. de 2023

Abrir en MATLAB Online

1. Do I save memory by type casting array indices as integers for the sparse array manipulations?

No.

A1 = sparse(zeros(20,20));
A2 = sparse(zeros(20,20)); A2(20,20) = 1;
A3 = sparse(zeros(20,20)); A3(uint32(20),uint32(20)) = 2;
A4 = sparse(uint32(20), uint32(20), 3);
whos A1 A2 A3 A4
  Name       Size            Bytes  Class     Attributes

  A1        20x20              184  double    sparse    
  A2        20x20              184  double    sparse    
  A3        20x20              184  double    sparse    
  A4        20x20              184  double    sparse    

Observe that they are all exactly the same size (though one might suspect that the completely empty one might be smaller because it has no non-zero entries at all.)

2. How small a floating point number is needed for MATLAB to label it zero for a sparse array?

format long g
idx = -350:-300;
A = (10.^idx).';
S = sparse(A);
[firstnz, ~, val_there] = find(S, 1);
idx(firstnz)
ans = 
  -323
val_there
val_there = 
     9.88131291682493e-324
B = sparse([eps(0), 0, -0])
B = 
   (1,1)        4.94065645841247e-324
B
B = 
   (1,1)        4.94065645841247e-324

So the boundary is at the smallest representable number. sparse() is testing for exact zeros, treating negative 0 the same as 0.

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Answer 2

Matt J el 2 de Oct. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/2027949-advice-constructing-large-sparse-jacobian-arrays#answer_1323209

I am undertaking non-linear optimization of tensor b spline coefficients used to describe thermodynamic surfaces.

I suggest looking at Example2D.m in,

https://www.mathworks.com/matlabcentral/fileexchange/26292-regular-control-point-interpolation-matrix-with-boundary-conditions?s_tid=srchtitle

which demonstrates efficient ways of doing 2D tensorial spline coefficient estimation.

16 comentarios
Mostrar 14 comentarios más antiguosOcultar 14 comentarios más antiguos

Matt J el 2 de Oct. de 2023

Editada: Matt J el 2 de Oct. de 2023

Abrir en MATLAB Online

I still don't have a firm picture of the mathematics of the model. I'm assuming that you have a continuous function which, if it were simplified to 2D, would be parameterized something like the following, where x_ij are unknown spline coefficients and s(t) is a 1D spline basis function.

If we sample F(x,y) at discrete, integer locations x=m, y=n this gives,

which in matrix vector form is,

where S(m,i)=s(m-i) is a sparse matrix. The solution for X in MATLAB code would be,

X=S\F/S.'

The thing to notice is that even though F has O(N^2) elements, the sparse matrix S has only O(N) non-zero elements. Similarly, if you did this in 6D, F would be O(N^6) but the inversion would still involve only O(N) matrices.

In the case where you are taking partial derivatives of the splines, the above would modify to things like,

which in vector matrix form would now involve 2 different matrices,

but both S and S_y are still going to be O(N) sparse, so the same efficiencies apply.

Michael Brown el 4 de Oct. de 2023

You are stating the linear problem as I implement it - although I usually solve the least square version X=(B'B)\(B'F) where B is the matrix of basis functions and F is vector of data (flattend from the 3D array of x,y,z data points) and X are the unknown spline coefficients (flattend from the 3D array). And yes, the O(N) scaling of basis functions is the saving feature of the tensor problem. I premultiply the sparse basis function matrixes for each dimension which generates a sparse basis function matrix B that has dimensions of the number data by the product of the number of spline coefficients in each dimension. Partial derivatives of the representation each have their own basis function matrix. I predict all the needed partial derivatives Fy=By*X where Fy is a specified partial derivative of the representation at all data locations, By is the sparse matrix of basis functions associated with that partial derivative, and X is flattened from a 3D array to a vector (as are the data). The partial derivative predictions, in non-linear combinations, lead to predictions of thermodynamic properties. I do not see a mathematical path that allows the basis function matrixes for the partials to be non-linearly combined to allow a linear (in spline coefficients) solution for the thermodynamic properties.

Matt J el 4 de Oct. de 2023

Editada: Matt J el 4 de Oct. de 2023

Abrir en MATLAB Online

I premultiply the sparse basis function matrixes for each dimension which generates a sparse basis function matrix B that has dimensions of the number data by the product of the number of spline coefficients in each dimension.

You should not do this. Since the basis matrix for each dimension is O(N), combining them together through multiplication will give a total matrix that has O(N^d) elements where d is the number of dimensions. Conversly, the computational approach I showed you keeps the basis matrices for each dimension separate, for a total memory footprint that is O(d*N).

I do not see a mathematical path that allows the basis function matrixes for the partials to be non-linearly combined to allow a linear (in spline coefficients) solution for the thermodynamic properties.

The equation I showed before,

is equivalent to the Kronecker product expression,

Fy(:)=kron(Sy,S)*X(:)

Certain nonlinear operations, in particular the ones you seem to be using, are distributive across Kronecker products as long as the matrix dimensions align e.g.,

kron(A,B).^2 = kron(A.^2,B.^2)
kron(A,B)./ kron(C,D) = kron(A./C, B./D)
kron(A,B) / kron(C,D) = kron(A/C, B/D)

Therefore, the nonlinear operations will give a new separable basis matrix and my technique of working with separate matrices in each dimension should again be applicable. As a side note, though, I do not quite understand why you would want to element-wise divide sparse matrices, like you are doing in this line,

out.velsq=Gp.^2./(Gpt.^2./Gtt-Gpp);

since most of the resulting elements will be 0/0=NaN.

although I usually solve the least square version X=(B'B)\(B'F)

I don't see why you would do that, since this is equivalent to X=B\F and B'*B has a poorer condition number than B alone. However, since B-spline matrices are well-conditioned, maybe the impact is minimal here...

Michael Brown el 5 de Oct. de 2023

Abrir en MATLAB Online

The partials are inherent in the B and C matrixes of basis functions. If B is for the x corrdinate and C for the y coordinate, and a subscript means the derivative set of basis functions, then I calculate Dy=kron(Cy,B) and Gy=Dy*X (X being the coefficients and Gy(x,y) being the partial of G with y)

I do calculate D in both 2 and 3 dimensions. It is large but not impossible. I am not an applied mathematician. I looked at the form of the tensor multiplications, didn't know about "kron" and created my own function for 2 and 3 D (below). I've just compared my calculations with those using kron. Exactly the same.

The calculations of the inverse problem using D take a few seconds - not really a show stopper.

You have the 2D problem:

z=B*a*C.'

What is the 3D version of this?

function D=makeCmn(A,B,C)
% This function combines collocation matrixes A(i,j) B(k,l) C(p,q) for 
% three dimensions (associated with a 4D hypersurface tensor spline)
% into a single matrix of size i*k*p by j*l*q. The idea is that the 
% tensor spline calculation is given as:
%    sum sum sum (ABC*a)  where A,B, and C are collocation matrix  and a is
%    an array of coefficients. 
% This can be remapped into a single matrix multiply D*a(:)
%    where D is the matrix returned here.
%  If working in 3D pass only matrixes A and B to this function
%  If A and B are row vectors, makeCmn returns a
%  single row corresponding to a single "data" point.
% JMB 2011
% Below I take advantage of sparseness - 
[ai,aj]=size(A);
[bk,bl]=size(B);
%Determine whether there is another dimension to the problem
switch nargin
    case 3
        [cp,cq]=size(C);
    case 2
        C=1;
        cp=1;cq=1;
end
%Determine (1) which elements are non-zero (A,B,C are sparse) and (2) the sizes of the matrixes
nR= ai*bk*cp;  
nC= aj*bl*cq;
[ii,jj,sa]=find(A);
[kk,ll,sb]=find(B);
[pp,qq,sc]=find(C);
ni=length(ii);
nk=length(kk);
np=length(pp);
nmm=ni*nk*np;
% preallocate vectors
mm=zeros(nmm,1);
nn=zeros(nmm,1);
v=zeros(nmm,1);
%Do the work
for p=1:np
    for k=1:nk
        im= (kk(k)-1)*ai + bk*ai*(pp(p)-1);
        in= (ll(k)-1)*aj + bl*aj*(qq(p)-1);
        count=(k-1)*ni +ni*nk*(p-1);
        for i=1:ni 
            mm(count+i)= ii(i) + im;  
            nn(count+i)= jj(i) + in;
            v(count+i)=sa(i)*sb(k)*sc(p);
        end
    end
end
%Assemble the resultant sparse matrix
D=sparse(mm,nn,v,nR,nC);
end  

Matt J el 5 de Oct. de 2023

Editada: Matt J el 5 de Oct. de 2023

It appears that my makeCmn(A,B,C) is equivalent to your KronProd(A,B,C)

But it should be much faster and more memory-efficient, as demonstrated in my example.

We are back to trying to use the available associative properties of the tensors to formulate a non-linear combination of partial derivative basis functions as a linear problem.

I don't see how we are "back" there. The partial derivatives are small arrays and we've established that they can be efficiently computed using the same kind of Kronecker product manipulations that we've discussed at length.

Once you've computed the partial derivatives, you can manipulate them however you wish without any obvious computational challenges. However, if there's something I'm missing, you can elaborate on it in a new post, since your question in this one has been answered.

Sam Marshalik el 5 de Oct. de 2023

@Michael Brown: I am not sure if you have attempted to use Threads instead of Processes when running parpool (https://www.mathworks.com/help/parallel-computing/choose-between-thread-based-and-process-based-environments.html). If not, give it a shot, as threaded workers will be able to share memory space and will not need to duplicate variables (unless changes are made to them on the worker).

Iniciar sesión para comentar.

Advice constructing large sparse Jacobian arrays

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Respuesta aceptada

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Más respuestas (1)

16 comentarios
Mostrar 14 comentarios más antiguosOcultar 14 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

Advice constructing large sparse Jacobian arrays

4 comentarios Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

Respuesta aceptada

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Más respuestas (1)

16 comentarios Mostrar 14 comentarios más antiguosOcultar 14 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

4 comentarios
Mostrar 2 comentarios más antiguosOcultar 2 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

16 comentarios
Mostrar 14 comentarios más antiguosOcultar 14 comentarios más antiguos