Make curve fitting faster
Hi,
I would like to make my curve fitting faster by using GPU. Is it possible? If yes how?
Thanks
1 comment
Matt J
on 25 Mar 2019
Marek's problem description copied here for reference:
I have MRI data of the brain (each subject has approx. 0.5 million voxels) with 10 different parameters, and in each voxel I want to fit these 10 points with a specific biexponential curve (so that I obtain parametric maps of the brain). I tried to optimize the script before using parallel computing (1 brain ~4 hours), but I have only used parfor so far (~1.25 hours), and because I am still a beginner with MATLAB I was curious whether other users have experience with GPUs and can help me make this faster than parfor.
Thanks
Marek
Accepted Answer
More Answers (1)
Joss Knight
on 24 Nov 2018
Edited: Joss Knight
on 24 Nov 2018
0 votes
It does rather depend on what you're doing. The functions polyfit and interp1 work with gpuArray inputs.
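For reference, a minimal sketch of what Joss describes (the data and polynomial degree here are made up for illustration): both polyfit and interp1 accept gpuArray inputs, so the fit runs on the GPU and only the final result needs to come back.

```matlab
% Hypothetical example: polynomial fit entirely on the GPU.
x = gpuArray.linspace(0, 1, 1e6);                 % sample points on the GPU
y = 2*x.^3 - x + 0.5 + 0.01*randn(size(x), 'gpuArray'); % noisy data on the GPU
p = polyfit(x, y, 3);                             % fit runs on the GPU
yq = interp1(x, y, gpuArray.linspace(0, 1, 10));  % interp1 also accepts gpuArrays
p = gather(p);                                    % bring coefficients back to the CPU
```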
14 comments
Joss,
Similar to the OP, I'd be curious to know if there are plans for gpuArray support in the Optimization Toolbox solvers (both the curve fitters and others). It seems like it should be easy to do. I'm pretty sure all the numeric operations used internally by PCT algorithms are already defined for gpuArray.
Joss Knight
on 24 Nov 2018
I think you're right, although I'm not sure that curve fitting is commonly a performance bottleneck, is it? Can you give an example of a typical workflow you hope would benefit from GPU support?
People are always looking for more speed out of the Optimization Toolbox. This list of Answers posts and this present post are ready indications of that.
I think TMW already recognizes this. If curve fitting couldn't benefit significantly from parallel computing, then why would lsqcurvefit already offer a UseParallel option? Why do all of the Optimization Toolbox solvers offer this? There's no reason that I can see to offer CPU acceleration but not GPU acceleration.
Joss Knight
on 24 Nov 2018
Okay, I'm getting confused now; we seem to be talking about Optimization, not just Curve Fitting. I can see that in the case of lsqcurvefit it's the same thing.
Not everything that parallelizes well on multicore parallelizes on the GPU. The objective functions for optimization are often completely general (or user-specified), and typically non-scalar. Optimization often requires repeated independent evaluations of the objective per iteration, and so this can be done on a parallel pool. But usually we're talking dozens to hundreds of evaluations, not the millions of parallel operations that a GPU needs in order to give an advantage. And unlike a worker in a parallel pool, a GPU thread doesn't have its own MATLAB interpreter which it can use to execute any given objective.
Anyway, this is a simplification - obviously where there's need and opportunity we'll try to do more.
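As a toy illustration of the distinction Joss draws (this example is not from the thread; the function and sizes are made up): millions of independent element-wise evaluations map well onto GPU threads via arrayfun, whereas the dozens of general objective evaluations per optimization iteration suit parfor workers, each with its own interpreter.

```matlab
% GPU-friendly pattern: one lightweight thread per element, millions of them.
f = @(x) exp(-x.^2) .* cos(5*x);      % simple element-wise function
xs = gpuArray.linspace(-3, 3, 1e7);   % millions of independent inputs
vals = arrayfun(f, xs);               % evaluated in parallel on the GPU
% Pool-friendly pattern: a handful of arbitrary objective evaluations,
% each needing a full MATLAB interpreter, e.g.
%   parfor k = 1:nStarts, results(k) = fminsearch(myObjective, x0(:,k)); end
```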
Not everything that parallelizes well on multicore parallelizes on the GPU. The objective functions for optimization are often completely general (or user-specified), and typically non-scalar.
That's true, but there are usually opportunities to accelerate operations done inside and in between the user-specified objective function calls with gpuArray variables. The problem is that there are currently barriers to getting the full benefit from that. I've tried to expand on this a bit using the code below (an over-simplified example with a simple linear least squares objective).
The difficulty right now is that if I want to incorporate gpuArray functionality into my objective function (or constraints) in any of the Optimization Toolbox solvers, I have to transfer data to and from the GPU with every call to the objective. It would be better if I could just leave everything on the GPU throughout the optimization. Not only would that spare me the CPU/GPU data transfers, but the matrix algebra operations done by fmincon with the grad and HessCPU outputs between calls to objectiveLeastSquares would also be done on the GPU and would be faster as well.
A = gpuArray(A);
y = gpuArray(y);
HessGPU = 2*(A.'*A);   % constant Hessian of ||A*x-y||^2, kept on the GPU
fmincon(@(x) objectiveLeastSquares(x,A,y,HessGPU), x0, ____);

function [fval,grad,HessCPU] = objectiveLeastSquares(x,A,y,HessGPU)
    err = A*x - y;  % implicitly sends x to the GPU?
    fval = gather( norm(err)^2 );
    if nargout>1, grad = gather( 2*(A.'*err) ); end
    if nargout>2, HessCPU = gather(HessGPU); end
end
Marek Dostal
on 26 Nov 2018
Joss Knight
on 27 Nov 2018
Matt, you make a fair point and I will capture your request in our system.
Joss
Matt J
on 27 Nov 2018
Thanks, Joss.
Martin Ryba
on 26 Aug 2022
Moved: Matt J
on 26 Aug 2022
Hey Matt, one trick to avoid CPU/GPU transfers for a Jacobian multiply function: since these functions are often nested within your main routine, use variables (gpuArrays in your case) that are scoped outside the individual functions. I'm not using a GPU, but I have a very big optimization (~2M measurements and 10k unknowns), and I store some relevant large matrices as external arrays that are updated by the primary function and then used by the JM functions (pos and neg). I don't see why that couldn't work for gpuArrays as well.
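A minimal sketch of the pattern Martin describes (the names, sizes, and solver options here are hypothetical; lsqnonlin's JacobianMultiplyFcn uses the documented W = jmfun(Jinfo,Y,flag) interface): because the multiply function is nested, it can read a large gpuArray from the enclosing workspace rather than receiving it through the solver.

```matlab
function runFit()
    % Large constant matrix kept on the GPU, captured by the nested function.
    Jbig = gpuArray(rand(1000, 50));
    opts = optimoptions('lsqnonlin', ...
        'Algorithm', 'trust-region-reflective', ...
        'JacobianMultiplyFcn', @jmfun);
    % ... call lsqnonlin(@objfun, x0, [], [], opts) here ...

    function W = jmfun(Jinfo, Y, flag)  %#ok<INUSL>
        % Jbig is read from the enclosing function's scope, not passed in.
        if flag > 0
            W = gather(Jbig * gpuArray(Y));            % J*Y
        elseif flag < 0
            W = gather(Jbig' * gpuArray(Y));           % J'*Y
        else
            W = gather(Jbig' * (Jbig * gpuArray(Y)));  % J'*(J*Y)
        end
    end
end
```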
@Martin Ryba That is interesting. I can see how that would allow you, when using lsqnonlin, to leave constant data like your measurement array on the GPU. However, there still appear to be problems.
(1) The result of any call to the Jacobian multiply function will be an array equal in size to your measurements. There is no way to avoid transferring that result back to the CPU, because that's where lsqnonlin will be looking for it.
(2) It doesn't seem to apply gracefully to lsqcurvefit. When using lsqcurvefit, you want your measurements to reside in the ydata input array, and lsqcurvefit will not allow ydata or xdata to be gpuArrays. Not as big a problem as (1), but still irksome.
Martin Ryba
on 26 Aug 2022
The output array of the JMfun depends on Y and the flag. If you have N measurements and K model parameters, instead of returning the entire NxK Jacobian you're often returning either something 1xN, 2xN or KxK I think. It's been a while since I last ran mine in the debugger to get it working. Either way, it's often a more limited array than what you may be storing in the GPU.
Sure, I agree transferring an Nx1 vector is certainly better than transferring the complete Jacobian. However, as your case illustrates, N can still be quite large. In my world, N=60e6, K=40e6 wouldn't be that unusual.
Also, for people who don't supply an analytical Jacobian, relying instead on finite differences, it's not just a matter of a single transfer of an NxK Jacobian matrix. Each iteration will require K transfers of an Nx1 vector (which I suspect would be a lot worse).
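To illustrate the finite-difference point (this sketch is not from the thread; the residual, sizes, and step are invented): each of the K Jacobian columns needs its own objective evaluation, and each Nx1 result must come back from the GPU separately.

```matlab
% Hypothetical GPU-resident linear residual, r(x) = A*x - y.
N = 1e5; K = 50; h = 1e-6;
A = gpuArray.rand(N, K);
y = gpuArray.rand(N, 1);
myResidual = @(x) A*x - y;

x  = rand(K, 1);
r0 = gather(myResidual(gpuArray(x)));       % baseline residual: one transfer
J  = zeros(N, K);
for k = 1:K
    xk = x; xk(k) = xk(k) + h;
    rk = gather(myResidual(gpuArray(xk)));  % one Nx1 transfer per column
    J(:, k) = (rk - r0) / h;                % forward-difference column k
end
```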
Emiliano Rosso
on 10 Nov 2022
Can we modify the MATLAB code, or is it p-coded?
I mean, to step through the routine and eliminate errors, type mismatches, etc.
Thanks!
Bruno Luong
on 10 Nov 2022
Edited: Bruno Luong
on 10 Nov 2022
Just to chime in on the discussion about Matt's request to enable the optimization functions on the GPU.
I think the feature would be most interesting for people working on image (2D-3D) processing: denoising, deblurring, where the number of parameters is the number of pixels. The cost function is a sum of local operators, so it is very well suited to the GPU architecture.
If the data transfer between CPU and GPU is NOT required at each iteration or each gradient computation, the run time will benefit.
There might be a new class of optimization methods still being developed by researchers, but I think one must think about better exploiting the GPU architecture in the TMW Optimization Toolbox.


