# QR Decomposition on NVIDIA GPU Using cuSOLVER Libraries

This example shows how to create a standalone CUDA® executable that leverages the CUDA Solver library (cuSOLVER). The example uses a curve fitting application that mimics automatic lane tracking on a road to illustrate:

• Fitting an arbitrary-order polynomial to noisy data by using matrix QR factorization.

• Using the coder.LAPACKCallback class to provide the LAPACK library information for the code generator when generating standalone executables.

### Prerequisites

• CUDA enabled NVIDIA® GPU.

• NVIDIA CUDA toolkit and driver.

• LAPACK library that is optimized for your execution environment. For more information, see LAPACK vendors implementations. This example uses the mwlapack libraries that MATLAB® provides in matlabroot/extern/lib.

• Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware. For setting up the environment variables, see Setting Up the Prerequisite Products.

### Verify GPU Environment

To verify that the compilers and libraries necessary for running this example are set up correctly, use the coder.checkGpuInstall function.

envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

### Solve a Linear System by Using Matrix Factorization

In curve fitting applications, the objective is to estimate the coefficients of a low-order polynomial. The polynomial is then used as a model for observed noisy data, which in this example represents the lane boundary of the road ahead of a vehicle. For example, when using a quadratic polynomial, there are three coefficients ($a$, $b$, and $c$) to estimate:

$a{x}^{2}+bx+c$

The polynomial that fits best is defined as the one that minimizes the sum of the squared errors between itself and the noisy data. To solve this least-squares problem, you get and solve an overdetermined linear system. An explicit matrix inverse is not required to solve the system.

In this example, the unknowns are the coefficients of each term in the polynomial. Because the polynomial you use as a model always starts from the current position on the road, the constant term in the polynomial is assumed to be zero. Estimate the coefficients for the linear and higher-order terms. Set up a matrix equation Ax=y such that:

• $y$ contains the sensor outputs.

• $x$ contains the polynomial coefficients that we need to obtain.

• $A$ is a constant matrix related to the order of the polynomial and the locations of the sensors.

Solve the equation using the QR factorization of $A$:

$Ax=QRx=y$

and

$x=pinv\left(A\right)*y={R}^{-1}{Q}^{T}*y$

where pinv() represents pseudo-inverse. Given the matrix $A$, you can use the following code to implement a solution of this matrix equation. Factoring $A$ allows for an easier solution of the system.

[Q,R,P] = qr(A);
z = Q' * y;
x = R \ z;
yhat = A * x;

Use the linsolveQR function to solve the equation using the QR factorization of $A$.

type linsolveQR.m
function [yhat,x] = linsolveQR(A,y)
%#codegen

%   Copyright 2019 The MathWorks, Inc.

[Q,R,P] = qr(A);
z = Q' * y;
x = R \ z;
yhat = A * x;

end

### Signal Model for the Road

To test the algorithm, a continuously curving road model is used, that is, a sinusoid that is contaminated with additive noise. By varying the frequency of the sinusoid in the model, you can stress the algorithm by different amounts. This code simulates noisy sensor outputs using our road model:

% Duration - Distance that we look ahead
% N - Total number of sensors providing estimates of road boundary
% Ts - Sample interval
% FracPeriod - Fraction of period of sinusoid to match
% y - Contains the simulated sensor outputs
Duration = 2;
N = 25;
Ts = Duration / N;
FracPeriod = 0.5;
y = sin(2*pi* (0:N-1)' * (FracPeriod/N)) + sqrt(0.3) * randn(N,1);

Use this code to form the Vandermonde matrix $A$:

Npoly = 3;                  % Order of polynomial to use in fitting
v = (0:Ts:((N-1)*Ts))';
A = zeros(length(v), Npoly);
for i = Npoly : -1 : 1
A(:,i) = v.^i;
end

The Vandermonde matrix $A$ and sensor outputs matrix $y$ are passed as input parameters to the linsolveQR entry-point function. These inputs are written to comma-separated files and are read from the handwritten main qrmain.cu.

writematrix(reshape(A, 1, 75), 'inputA.csv');
writematrix(reshape(y, 1, 25), 'inputY.csv');

### Custom Callback Class for Standalone Code Generation

The qr function is only partially supported in the cuSOLVER library. In such cases, GPU Coder™ uses the LAPACK library for certain linear algebra function calls. LAPACK is an external software library for numeric linear algebra. For MEX targets, the code generator uses the LAPACK library included in MATLAB.

For standalone targets, you must define a custom coder.LAPACKCallback class that specifies the LAPACK libraries along with the header files to use for linear algebra calls in the generated code. In this example, the lapackCallback callback class specifies the paths to these libraries in updateBuildInfo method. You must modify this file with the library names and paths for the custom LAPACK installation on your computer.

type lapackCallback.m
classdef lapackCallback < coder.LAPACKCallback
%

%   Copyright 2019 The MathWorks, Inc.

methods (Static)
hn = 'lapacke.h';
end

function updateBuildInfo(buildInfo, buildctx)
[~, libExt] = buildctx.getStdLibInfo();

% Specify path to LAPACK library
if ispc
lapackLocation = [matlabroot,'\extern'];
libName = ['libmwlapack' libExt];
libPath = [lapackLocation,'\lib\win64\microsoft\'];
else
lapackLocation = [matlabroot];
libName = ['libmwlapack' libExt];
libPath = [lapackLocation,'/bin/glnxa64'];
end

end
end
end

### Standalone Code Generation

Generate a standalone executable by the specifying CustomLAPACKCallback property in the code configuration object and using a handwritten main qrmain.cu.

cfg = coder.gpuConfig('exe');
cfg.GpuConfig.EnableCUSOLVER = 1;
cfg.CustomLAPACKCallback = 'lapackCallback';
cfg.CustomSource = 'qrmain.cu';
cfg.CustomInclude = '.';
codegen -config cfg -args {A,y} linsolveQR -report
Code generation successful: View report

### Standalone Code Execution

When you execute the generated standalone executable, the outputs $yhat$ and $x$ are computed and written to comma-separated files. Read these outputs back in MATLAB and use the plot function to visualize the sensor data and fitted curve.

if ispc
system('linsolveQR.exe');
else
system('./linsolveQR');
end
figure
plot(v, y, 'k.', v, yhat, 'r')
axis([0 N*Ts -3 3]);
grid;