Model Loss Given Default

This example shows how to fit different types of models to loss given default (LGD) data. This example demonstrates the following approaches:

For all of these approaches, this example shows:

  • How to fit a model using training data where the LGD is a function of other variables or predictors.

  • How to predict on testing data.

The Model Comparison section contains a detailed comparison that includes visualizations and several prediction error metrics for of all models in this example.


LGD is one of the main parameters for credit risk analysis. Although there are different approaches to estimate credit loss reserves and credit capital, common methodologies require the estimation of probabilities of default (PD), loss given default (LGD), and exposure at default (EAD). The reserves and capital requirements are computed using formulas or simulations that use these parameters. For example, the loss reserves are usually estimated as the expected loss (EL), given by the following formula:


Practitioners have decades of experience modeling and forecasting PDs. However, the modeling of LGD (and also EAD) started much later. One reason is the relative scarcity of LGD data compared to PD data. Credit default data (for example, missed payments) is easier to collect and more readily available than are the losses ultimately incurred in the event of a default. When an account is moved to the recovery stage, the information can be transferred to a different system, loans can get consolidated, the recovery process may take a long time, and multiple costs are incurred during the process, some which are hard to track in detail. However, banks have stepped up their efforts to collect data that can be used for LGD modeling, in part due to regulations that require the estimation of these risk parameters, and the modeling of LGD (and EAD) is now widespread in industry.

This example uses simulated LGD data, but the workflow has been applied to real data sets to fit LGD models, predict LGD values, and compare models. The focus of this example is not to suggest a particular approach, but to show how these different models can be fit, how the models are used to predict LGD values, and how to compare the models. This example is also a starting point for variations and extensions of these models; for example, you may want to use more advanced classification and regression tools as part of a two-stage model.

The three predictors in this example are loan specific. However, you can use the approaches described in this example with data sets that include multiple predictors and even macroeconomic variables. Also, you can use models that include macroeconomic predictors for stress testing or lifetime LGD modeling to support regulatory requirements such as CCAR, IFRS 9, and CECL.

LGD Data Exploration

The data set in this example is simulated data that captures common features of LGD data. For example, a common feature is the distribution of LGD values, which has high frequencies at 0 (full recovery), and also many observations at 1 (no recovery at all). Another characteristic of LGD data is a significant amount of "noise" or "unexplained" data. You can visualize this "noise" in scatter plots of the response against the predictors, where the dots do not seem to follow a clear trend, and yet some underlying relationships can be detected. Also, it is common to get significant prediction errors for LGD models. Empirical studies show that LGD models have high prediction errors in general. For example, in [4] the authors report R-squared values ranging from 4% to 43% for a range of models across different portfolios. In this example, all approaches get R-squared values just under 10%. Moreover, finding useful predictors in practice may require important insights into the lending environment of a specific portfolio, for example, knowledge of the legal framework and the collection process [2]. The simulated data set includes only three predictors and these are variables frequently found in LGD models, namely, the loan-to-value ratio, the age of the loan, and whether the borrower lives in the property or if the borrower bought it for investment purposes.

Data preparation for LGD modeling is beyond the scope of this example. This example assumes the data has been previously prepared, since the focus of the example is on how to fit LGD models and how to use them for prediction. Data preparation for LGD modeling requires a significant amount of work in practice. Data preparation requires consolidation of account information, pulling data from multiple data sources, accounting for different costs and discount rates, and screening predictors [1] [2].

Load the data set from the LGDData.mat file. The data set is stored in the data table. It contains the three predictors and the LGD variable, which is the response variable.

Here is a preview of the data and the descriptions of the data set and the variables.

      LTV        Age         Type           LGD   
    _______    _______    ___________    _________

    0.89101    0.39716    residential     0.032659
    0.70176     2.0939    residential      0.43564
    0.72078     2.7948    residential    0.0064766
    0.37013      1.237    residential     0.007947
    0.36492     2.5818    residential            0
      0.796     1.5957    residential      0.14572
    0.60203     1.1599    residential     0.025688
    0.92005    0.50253    investment      0.063182
Loss given default (LGD) data. This is a simulated data set.
disp([data.Properties.VariableNames' data.Properties.VariableDescriptions'])
    {'LTV' }    {'Loan-to-Value (LTV) ratio at...'}
    {'Age' }    {'Age of the loan in years at ...'}
    {'Type'}    {'Type of property, either res...'}
    {'LGD' }    {'Loss given default'             }

LGD data commonly has values of 0 (no losses, full recovery) or 1 (no recovery at all). The distribution of values in between 0 and 1 takes different shapes depending on the portfolio type and other characteristics.

title('LGD Distribution')
xlabel('Observed LGD')

Explore the relationships between the predictors and the response. The Spearman correlation between the selected predictor and the LGD is displayed first. The Spearman correlation is one of the rank order statistics commonly used for LGD modeling [5].

SelectedPredictor = 'LTV';

fprintf('Spearman correlation between %s and LGD: %g',SelectedPredictor,corr(double(data.(SelectedPredictor)),data.LGD,'Type','Spearman'))
Spearman correlation between LTV and LGD: 0.271204
if isnumeric(data.(SelectedPredictor))
    X = [ones(height(data),1) data.(SelectedPredictor)];
    b = X\data.LGD;
    y = X*b;
    hold on
    ylim([0 1])
    hold off

For numeric predictors, there is a scatter plot of the LGD against the selected predictor values, with a superimposed linear fit. There is a significant amount of noise in the data, with points scattered all over the plot. This is a common situation for LGD data modeling. The density of the dots is sometimes different in different areas of the plot, suggesting relationships. The slope of the linear fit and the Spearman correlation give more information about the relationship between the selected predictor and the response.

Visually assessing the density of the points in a scatter plot might not be a reliable approach to understand the distribution of the data. To better understand the distribution of LGD values for different levels of a selected predictor, create a box plot.

% Choose the number of discretization levels for numeric predictors
NumLevels =3;
if isnumeric(data.(SelectedPredictor))
    PredictorEdges = linspace(min(data.(SelectedPredictor)),max(data.(SelectedPredictor)),NumLevels+1);
    PredictorDiscretized = discretize(data.(SelectedPredictor),PredictorEdges,'Categorical',string(PredictorEdges(2:end)));
    xlabel([SelectedPredictor ' Discretized'])

For categorical data, the box plot is straightforward since a small number of levels are already given. For numeric data, you can discretize the data first and then generate the box plot. Different box sizes and heights show that the distribution of LGD values changes for different predictor levels. A monotonic trend in the median (red horizontal line in the center of the boxes) shows a potential linear relationship between the predictor and the LGD (though possibly a mild relationship, due to the wide distributions).

Mean LGD Over Different Groups

The basic approach to predict LGD is to simply use the mean of the LGD data. Although this is a straightforward approach, easy to understand and use, the downside is that the mean is a constant value and this approach sheds no light on the sensitivity of LGD to other risk factors. In particular, the predictors in the data set are ignored.

To introduce sensitivity to predictors, the mean LGD values can be estimated over different groups or segments of the data, where the groups are defined using ranges of the predictor values. This approach is still a relatively straightforward approach, yet it can noticeably reduce the prediction error compared to a single mean LGD value for all observations.

To start, separate the data set into training and testing data. The same training and testing data sets are used for all approaches in this example.

NumObs = height(data);
% Reset the random stream state, for reproducibility
% Comment this line out to generate different data partitions each time the example is run
c = cvpartition(NumObs,'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);

In this example, the groups are defined using the three predictors. LTV is discretized into low and high levels. Age is discretized into young and old loans. Type already has two levels, namely, residential and investment. The groups are all the combinations of these values (for example, low LTV, young loan, residential, and so on).

The number of levels and the specific cut off points are for illustration purposes only, you can base other discretizations on different criteria. Also, using all predictors for the discretization may not be ideal when the data set contains many predictors. In some cases, using a single predictor, or a couple of predictors, may be enough to find useful groups with distinct mean LGD values. When the data includes macro information, the grouping may include a macro variable; for example, the mean LGD value should be different over recessions vs. economic expansions.

Compute the mean LGD over the eight data groups using the training data.

% Discretize LTV
LTVEdges = [0 0.5 max(data.LTV)];
data.LTVDiscretized = discretize(data.LTV,LTVEdges,'Categorical',{'low','high'});
% Discretize Age
AgeEdges = [0 2 max(data.Age)];
data.AgeDiscretized = discretize(data.Age,AgeEdges,'Categorical',{'young','old'});
% Find group means on training data
gs = groupsummary(data(TrainingInd,:),{'LTVDiscretized','AgeDiscretized','Type'},'mean','LGD');
    LTVDiscretized    AgeDiscretized       Type        GroupCount    mean_LGD
    ______________    ______________    ___________    __________    ________

         low              young         residential        163        0.12166
         low              young         investment          26       0.087331
         low              old           residential        175       0.021776
         low              old           investment          23        0.16379
         high             young         residential       1134        0.16489
         high             young         investment         257        0.25977
         high             old           residential        265       0.066068
         high             old           investment          50        0.11779

For prediction, the test data is mapped into the 8 groups, and then the corresponding group mean is set as the predicted LGD value.

LGDGroupTest = (data.LTVDiscretized(TestInd)=='high')*4 +...
    (data.AgeDiscretized(TestInd)=='old')*2 +...
    (data.Type(TestInd)=='investment') + 1;
LGDPredictedByGroupMeans = gs.mean_LGD(LGDGroupTest);

Store the observed LGD and the predicted LGD in a new table dataLGDPredicted. This table stores predicted LGD values for all other approaches in the example.

dataLGDPredicted = table;
dataLGDPredicted.Observed = data.LGD(TestInd);
dataLGDPredicted.GroupMeans = LGDPredictedByGroupMeans;
    Observed     GroupMeans
    _________    __________

    0.0064766     0.066068 
     0.007947      0.12166 
     0.063182      0.25977 
            0     0.066068 
      0.10904      0.16489 
            0      0.16489 
      0.89463      0.16489 
            0     0.021776 

The Model Comparison section has a more detailed comparison of all models that includes visualizations and prediction error metrics.

Simple Regression Model

A natural approach is to use a regression model to explicitly model a relationship between the LGD and some predictors. LGD data, however, is bounded in the unit interval, whereas the response variable for linear regression models is, in theory, unbounded.

To apply simple linear regression approaches, the LGD data can be transformed. A common transformation is the logit function, which leads to the following regression model:


LGD values of 0 or 1 cause the logit function to take infinite values, so the LGD data is typically truncated before applying the transformation.

data.LGDTruncated = data.LGD;
data.LGDTruncated(data.LGD==0) = 0.00001;
data.LGDTruncated(data.LGD==1) = 0.99999;
data.LGDLogit = log(data.LGDTruncated./(1-data.LGDTruncated));

Below is the histogram of the transformed LGD data that uses the logit function. The range of values spans positive and negative values, which is consistent with the linear regression requirements. The distribution still shows significant mass probability points at the ends of the distribution.

title('Logit Transformation of Truncated LGD Data')

Other transformations are suggested in the literature [1]. For example, instead of the logit function, the truncated LGD values can be mapped with the inverse standard normal distribution (similar to a probit model).

Fit the model using the training data.

mdlRegression = fitlm(data(TrainingInd,:),'LGDLogit ~ 1 + LTV + Age + Type'); 
Linear regression model:
    LGDLogit ~ 1 + LTV + Age + Type

Estimated Coefficients:
                       Estimate       SE        tStat       pValue  
                       ________    ________    _______    __________

    (Intercept)        -4.7549      0.36041    -13.193    3.0997e-38
    LTV                 2.8565      0.41777     6.8377    1.0531e-11
    Age                -1.5397     0.085716    -17.963    3.3172e-67
    Type_investment     1.4358       0.2475     5.8012     7.587e-09

Number of observations: 2093, Error degrees of freedom: 2089
Root Mean Squared Error: 4.24
R-squared: 0.206,  Adjusted R-Squared: 0.205
F-statistic vs. constant model: 181, p-value = 2.42e-104

The model coefficients match the findings in the exploratory data analysis, with a positive coefficient for LTV, a negative coefficient for Age, and a positive coefficient for investment properties in the Type variable.

The model predictions are on the transformed space, therefore the inverse logit transformation (also known as the logistic, or sigmoid function) must be applied to the model prediction to get a final predicted value for LGD.

LogitLGDPredicted = predict(mdlRegression,data(TestInd,:));
dataLGDPredicted.Regression = 1./(1+exp(-LogitLGDPredicted));
    Observed     GroupMeans    Regression
    _________    __________    __________

    0.0064766     0.066068     0.00091169
     0.007947      0.12166      0.0036758
     0.063182      0.25977        0.18774
            0     0.066068      0.0010877
      0.10904      0.16489       0.011213
            0      0.16489       0.041992
      0.89463      0.16489       0.052947
            0     0.021776     3.7188e-06

The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error metrics. In particular, the histogram of the predicted LGD values shows that the regression model predicts many LGD values near zero, even though the high probability near zero was not explicitly modeled.

Beta Regression Model

In a beta regression model for LGD, the model does not directly predict a single LGD value, it predicts an entire distribution of LGDs (given the predictor values). From that distribution, a value must be determined to predict a single LGD value for a loan, typically the mean of that distribution.

Technically, given the predictor values X1,X2,... and model coefficients b and c, you can:

  • Compute values for the parameters μ(mean) and ν(sometimes called the "sample size") of the beta distribution with the following formulas:


  • Compute values for αand β, the typical parameters of the beta distribution, with these formulas:



  • Evaluate the density function of the corresponding beta distribution for a given level of LGD, where Γ is the gamma function; see [1] for details:


For fitting the model, once the density function is evaluated, you can update the likelihood function and find the optimal coefficients with a maximum likelihood approach. See the Local Functions section where the maximum likelihood function hLogLikelihoodBeta is defined.

For prediction, once the model coefficients are fit, a prediction can be made, typically using the mean of the distribution, that is, the μ parameter, as the predicted LGD value.

Fit the beta regression model using training data with maximum likelihood. The maximization of the hLogLikelihoodBeta function is performed with the unconstrained solver fminunc from Optimization Toolbox™.

% Convert Type to numeric binary
TypeDummy = dummyvar(data.Type);
data.Type01 = TypeDummy(:,2);

ColumnNames = {'Intercept' 'LTV' 'Age' 'Type'};
NumCols = length(ColumnNames);

% Initial guess
x0 = 0.1*ones(2*NumCols,1);
% Predictors Matrix and LGD, training
% Use truncated LGD to avoid numerical issues at the boundaries
XTrain = [ones(sum(TrainingInd),1) data.LTV(TrainingInd) data.Age(TrainingInd) data.Type01(TrainingInd)];
LGDTrain = data.LGDTruncated(TrainingInd);
% Minimize negative likelihood
objFunctionBeta = @(x)(-hLogLikelihoodBeta(x,XTrain,LGDTrain));
[Estimate,~,~,~,~,Hessian] = fminunc(objFunctionBeta,x0);
Computing finite-difference Hessian using objective function.

Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.
ParameterNamesBeta = [strcat('Mu_',ColumnNames) strcat('Nu_',ColumnNames)];
ModelParametersBeta = array2table(Estimate,'RowNames',ParameterNamesBeta);
ModelParametersBeta.StdError = sqrt(diag(inv(Hessian)));
ModelParametersBeta.DF = height(data)*ones(length(Estimate),1);
ModelParametersBeta.T = Estimate./ModelParametersBeta.StdError;
ModelParametersBeta.PValue = 2*(1-tcdf(abs(ModelParametersBeta.T),ModelParametersBeta.DF));
ModelParametersBeta.ConfidenceInterval = Estimate+ModelParametersBeta.StdError*[-1.96 1.96];
                    Estimate    StdError     DF        T         PValue       ConfidenceInterval 
                    ________    ________    ____    _______    __________    ____________________

    Mu_Intercept     -1.3772     0.13201    3487    -10.433             0      -1.636     -1.1185
    Mu_LTV           0.60269     0.15087    3487     3.9947    6.6126e-05     0.30698      0.8984
    Mu_Age          -0.47464    0.040264    3487    -11.788             0    -0.55356    -0.39572
    Mu_Type          0.45371    0.085143    3487     5.3288    1.0515e-07     0.28683     0.62058
    Nu_Intercept    -0.16338     0.12591    3487    -1.2975       0.19453    -0.41017    0.083413
    Nu_LTV          0.055901     0.14719    3487    0.37979       0.70413    -0.23259      0.3444
    Nu_Age           0.22887    0.040335    3487     5.6743     1.506e-08     0.14982     0.30793
    Nu_Type         -0.14101    0.078154    3487    -1.8043      0.071277    -0.29419    0.012172

For prediction, recall that the beta regression links the predictors to an entire beta distribution. For example, suppose that a loan has an LTV of 0.7 and an Age of 1.1 years, and it is an investment property. The beta regression model gives us a prediction for the αand βparameters for this loan, and the model predicts that for this loan the range of possible LGD values follows the corresponding beta distribution.

XExample = [1 0.7 1.1 1];
MuExample = 1/(1+exp(-XExample*Estimate(1:NumCols)));
NuExample = exp(XExample*Estimate(NumCols+1:end));
AlphaExample = MuExample*NuExample;
BetaExample = (1-MuExample)*NuExample;

xDomain = 0.01:0.01:0.99;
pBeta = betapdf(xDomain,AlphaExample,BetaExample);
title('Predicted Distribution, Single Loan')
xlabel('Possible LGD')
ylabel('Predicted Density')

The shape of the distribution has the U-shaped pattern of the data. However, this is a predicted distribution of LGD values for a single loan. This distribution can be very useful for simulation purposes. However, to predict an LGD value for this loan, a method is required that goes from an entire distribution to a single value.

One way to predict would be to randomly draw a value from the previous distribution. This would tend to give predicted values towards the ends of the unit interval, and the overall shape of the distribution for a data set would match the U-shaped patter of observed LGD values. However, even if the shape of the distribution looked correct, a random draw from the distribution does not work well for prediction purposes. Two points with the same predictor values would have two different predicted LGDs, which is counterintuitive. Moreover, the prediction error at the observation level could be large, since many loans with small observed LGDs could get random predictions of large LGDs, and vice versa.

To reduce the prediction error at the individual level, the expected value of the beta distribution is typically used to predict. The distribution of predicted values with this approach does not have the expected U-shaped pattern because the mean value tends to be away from the boundaries of the unit interval. However, by using the mean of the beta distribution, all observations with the same predictor values get the same predicted LGDs. Moreover, the mean may not be close to values that are on the ends of the distribution, but the average error might be smaller compared to the random draws from the previous approach.

Predict using the mean of the beta distribution. Remember that the expected value of the distribution is the μparameter, so the mean value prediction is straightforward.

XTest = [ones(sum(TestInd),1) data.LTV(TestInd) data.Age(TestInd) data.Type01(TestInd)];
MuTest = 1./(1+exp(-XTest*Estimate(1:NumCols)));
dataLGDPredicted.Beta = MuTest;
    Observed     GroupMeans    Regression      Beta  
    _________    __________    __________    ________

    0.0064766     0.066068     0.00091169    0.093695
     0.007947      0.12166      0.0036758     0.14915
     0.063182      0.25977        0.18774     0.35262
            0     0.066068      0.0010877    0.096434
      0.10904      0.16489       0.011213     0.18858
            0      0.16489       0.041992      0.2595
      0.89463      0.16489       0.052947     0.26767
            0     0.021776     3.7188e-06    0.021315

The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error metrics. In particular, the histogram of the predicted LGD values shows that the beta regression approach does not produce a U-shaped distribution. However, this approach does have good performance under the other metrics reported.

Tobit Regression Model

Tobit or censored regression is designed for models where the response is bounded. The idea is that there is an underlying (latent) linear model but that the observed response values, in this case the LGD values, are truncated. For this example, model the 0 lower bound (left censored) using the model formula




The model parameters are all the βs and the standard deviation of the error σ. The parameters are estimated using maximum likelihood. See the Local Functions section, where the maximum likelihood function hLogLikelihoodTobitLeftCensored is defined. The formulas for the left and right censored version of the model are also included at the end of the Local Functions section.

Fit the Tobit regression model using training data and maximum likelihood. The minimization of the hLogLikelihoodTobitLeftCensored function is performed with the constrained solver fmincon from Optimization Toolbox™. The only constraint is on the standard deviation parameter such that σ>0.

x0 = 0.1*ones(NumCols+1,1);
lb = -Inf*ones(NumCols+1,1);
lb(end) = 1e-6; % Sigma must be positive
objFunctionTobit = @(x)(-hLogLikelihoodTobitLeftCensored(x,XTrain,data.LGD(TrainingInd)));
[Estimate,~,~,~,~,~,Hessian] = fmincon(objFunctionTobit,x0,[],[],[],[],lb);
Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than
the value of the step size tolerance and constraints are 
satisfied to within the value of the constraint tolerance.
Estimate = Estimate(:);
ModelParametersTobit = array2table(Estimate,...
    "RowNames",[ColumnNames {'Sigma'}],"VariableNames",{'Estimate'});
ModelParametersTobit.StdError = sqrt(diag(inv(Hessian)));
ModelParametersTobit.DF = height(data)*ones(length(Estimate),1);
ModelParametersTobit.T = Estimate./ModelParametersTobit.StdError;
ModelParametersTobit.PValue = 2*(1-tcdf(abs(ModelParametersTobit.T),ModelParametersTobit.DF));
ModelParametersTobit.ConfidenceInterval = Estimate+ModelParametersTobit.StdError*[-1.96 1.96];
                 Estimate     StdError      DF        T         PValue       ConfidenceInterval  
                 _________    _________    ____    _______    __________    _____________________

    Intercept     0.057338     0.020826    3487     2.7532      0.005933    0.016518     0.098157
    LTV            0.20031     0.031386    3487     6.3823    1.9748e-10      0.1388      0.26183
    Age          -0.094065    0.0059527    3487    -15.802             0    -0.10573    -0.082397
    Type           0.10072     0.017828    3487     5.6493    1.7396e-08    0.065775      0.13566
    Sigma          0.28835    0.0050518    3487     57.078             0     0.27845      0.29825

Perform prediction with test data. For prediction, one could think of applying the Tobit formula LGDi=max{0,Yi}=max{0,Xiβ+ϵi}directly, setting the noise term ϵi=0. However, because the noise term is inside the max operator, this would not match the actual expected value of the LGD under this model: E[LGDi]max{0,Xiβ}. When the expectation is taken, a different closed-form formula is obtained. In fact, there are two possibilities for the expected values commonly used to predict Tobit models, one is conditional, and one is unconditional. The hPredictTobitLeftCensored function implements the application of the direct formula max{0,Xiβ}, as well as, the conditional and unconditional expectations (using the optional third input). By default, the function performs prediction with the unconditional expectation. See the Local Functions section for details.

dataLGDPredicted.Tobit = hPredictTobitLeftCensored(Estimate,XTest);
    Observed     GroupMeans    Regression      Beta       Tobit  
    _________    __________    __________    ________    ________

    0.0064766     0.066068     0.00091169    0.093695    0.087026
     0.007947      0.12166      0.0036758     0.14915     0.12275
     0.063182      0.25977        0.18774     0.35262     0.31806
            0     0.066068      0.0010877    0.096434    0.092504
      0.10904      0.16489       0.011213     0.18858     0.16539
            0      0.16489       0.041992      0.2595     0.22152
      0.89463      0.16489       0.052947     0.26767     0.23471
            0     0.021776     3.7188e-06    0.021315    0.010055

The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error with different metrics. As with the beta regression, the histogram of the predicted LGD values for the Tobit model does not have a U-shaped distribution, but it ranks well compared to other models.

Two-Stage Model

Two-stage LGD models separate the case with no losses (LGD equal to 0) from the cases with actual losses (LGD greater than 0) and build two models. The stage 1 model is a classification model to predict whether the loan will have positive LGD. The stage 2 model a regression-type model to predict the actual LGD when the LGD is expected to be positive. The prediction is the expected value of the two combined models, which is the product of the probability of having a loss (stage 1 prediction) times the expected LGD value (stage 2 prediction).

In this example, a logistic regression model is used for the stage 1. Stage two is a regression on a logit transformation of the positive LGD data. More sophisticated models can be used for stage 1 and stage 2 models, see for example [4] or [6]. Also, another extension is to explicitly handle the LGD = 1 boundary. The stage 1 model would produce probabilities of observing an LGD of 0, an LGD of 1, and an LGD value strictly between 0 and 1. The stage 2 model would predict LGD values when the LGD is expected to be strictly between 0 and 1.

Fit the stage 1 model using the training data. The response variable is an indicator with a value of 1 if the observed LGD is positive, and 0 if the observed LGD is zero.

IndLGDPositive = data.LGD>0;
data.LGDPositive = IndLGDPositive;
      LTV        Age         Type           LGD       LTVDiscretized    AgeDiscretized    LGDTruncated    LGDLogit    Type01    LGDPositive
    _______    _______    ___________    _________    ______________    ______________    ____________    ________    ______    ___________

    0.89101    0.39716    residential     0.032659         high             young           0.032659       -3.3884      0          true    
    0.70176     2.0939    residential      0.43564         high             old              0.43564      -0.25887      0          true    
    0.72078     2.7948    residential    0.0064766         high             old            0.0064766       -5.0331      0          true    
    0.37013      1.237    residential     0.007947         low              young           0.007947        -4.827      0          true    
    0.36492     2.5818    residential            0         low              old                1e-05       -11.513      0          false   
      0.796     1.5957    residential      0.14572         high             young            0.14572       -1.7686      0          true    
    0.60203     1.1599    residential     0.025688         high             young           0.025688       -3.6357      0          true    
    0.92005    0.50253    investment      0.063182         high             young           0.063182       -2.6965      1          true    
mdl1 = fitglm(data(TrainingInd,:),"LGDPositive ~ 1 + LTV + Age + Type",'Distribution',"binomial");
Generalized linear regression model:
    logit(LGDPositive) ~ 1 + LTV + Age + Type
    Distribution = Binomial

Estimated Coefficients:
                       Estimate       SE        tStat       pValue  
                       ________    ________    _______    __________

    (Intercept)          1.3157     0.21193     6.2083    5.3551e-10
    LTV                  1.3159     0.25328     5.1954    2.0433e-07
    Age                -0.79597    0.053607    -14.848    7.1323e-50
    Type_investment     0.66784     0.17019     3.9241    8.7051e-05

2093 observations, 2089 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 404, p-value = 2.68e-87

An ROC plot for the stage 1 model is commonly reported using test data.

PredictedProbLGDPositive = predict(mdl1,data(TestInd,:));
[x,y,~,AUC] = perfcurve(data.LGDPositive(TestInd),PredictedProbLGDPositive,1);
title(sprintf('ROC Stage 1 Model (AUROC: %g)',AUC))

Fit the stage 2 model using only the training data with a positive LGD. This is the same type of model used earlier in the Regression section, however, this time it is fitted using only observations from the training data with positive LGDs.

dataLGDPositive = data(TrainingInd&IndLGDPositive,{'LTV','Age','Type','LGDTruncated'});
dataLGDPositive.LGDTransform = log(dataLGDPositive.LGDTruncated./(1-dataLGDPositive.LGDTruncated));

mdl2 = fitlm(dataLGDPositive,"LGDTransform ~ 1 + LTV + Age + Type");
Linear regression model:
    LGDTransform ~ 1 + LTV + Age + Type

Estimated Coefficients:
                       Estimate       SE        tStat       pValue  
                       ________    ________    _______    __________

    (Intercept)         -2.9083     0.27538    -10.561    3.2039e-25
    LTV                  1.3883     0.31838     4.3604     1.384e-05
    Age                -0.46116    0.081939    -5.6281    2.1608e-08
    Type_investment     0.78223     0.18096     4.3226    1.6407e-05

Number of observations: 1546, Error degrees of freedom: 1542
Root Mean Squared Error: 2.8
R-squared: 0.0521,  Adjusted R-Squared: 0.0503
F-statistic vs. constant model: 28.3, p-value = 8.73e-18

Perform prediction on the test data using the combined models for stage 1 and stage 2. The predicted LGD is the product of the probability of observing a positive LGD from the stage 1 model times the expected LGD value predicted by the stage 2 model.

% Predict in the transformed space and apply inverse logit to recover the
% LGD prediction for stage 2
PredictedLGDPositive = predict(mdl2,data(TestInd,:));
PredictedLGDPositive = 1./(1+exp(-PredictedLGDPositive));

% PredictedProbLGDPositive is computed before the ROC curve above
% Final LGD prediction is the product of stage 1 and stage 2 predictions
dataLGDPredicted.TwoStage = PredictedProbLGDPositive.*PredictedLGDPositive;

    Observed     GroupMeans    Regression      Beta       Tobit      TwoStage 
    _________    __________    __________    ________    ________    _________

    0.0064766     0.066068     0.00091169    0.093695    0.087026     0.020038
     0.007947      0.12166      0.0036758     0.14915     0.12275     0.034025
     0.063182      0.25977        0.18774     0.35262     0.31806       0.2388
            0     0.066068      0.0010877    0.096434    0.092504     0.022818
      0.10904      0.16489       0.011213     0.18858     0.16539     0.060072
            0      0.16489       0.041992      0.2595     0.22152     0.097685
      0.89463      0.16489       0.052947     0.26767     0.23471      0.11142
            0     0.021776     3.7188e-06    0.021315    0.010055    0.0003689

The Model Comparison section at the end of this example has a more detailed comparison of all models that includes visualizations and prediction error metrics. This approach also ranks well compared to other models and the histogram of the predicted LGD values shows high frequencies near 0.

Model Comparison

To evaluate the performance of LGD models, different metrics are commonly used. One metric is the R-squared of the linear fit regressing the observed LGD values on the predicted values. A second metric is some correlation or rank order statistic; this example uses the Spearman correlation. For prediction error, root mean squared error (RMSE) is a common metric. Also, a simple metric sometimes reported is the difference between the mean LGD value in the training data and the mean LGD value of the predictions. These four metrics are reported below, sorted by decreasing R-squared values.

ModelNames = dataLGDPredicted.Properties.VariableNames(2:end); % Remove 'Observed'
NumModels = length(ModelNames);

SampleMeanError = zeros(NumModels,1);
RSquared = zeros(NumModels,1);
Spearman = zeros(NumModels,1);
RMSE = zeros(NumModels,1);
lmAll = struct;

meanLGDTest = mean(dataLGDPredicted.Observed);

for ii=1:NumModels
    % R-squared, and store linear model fit for visualization section
    Formula = ['Observed ~ 1 + ' ModelNames{ii}];
    lmAll.(ModelNames{ii}) = fitlm(dataLGDPredicted,Formula);
    RSquared(ii) = lmAll.(ModelNames{ii}).Rsquared.Ordinary;
    % Spearman correlation
    Spearman(ii) = corr(dataLGDPredicted.Observed,dataLGDPredicted.(ModelNames{ii}),'type','Spearman');
    % Root mean square error
    RMSE(ii) = sqrt(mean((dataLGDPredicted.Observed-dataLGDPredicted.(ModelNames{ii})).^2));

    % Sample mean error
    SampleMeanError(ii) = mean(dataLGDPredicted.(ModelNames{ii}))-meanLGDTest;

PerformanceMetrics = table(RSquared,Spearman,RMSE,SampleMeanError,'RowNames',ModelNames);
PerformanceMetrics = sortrows(PerformanceMetrics,'RSquared','descend');
                  RSquared    Spearman     RMSE      SampleMeanError
                  ________    ________    _______    _______________

    TwoStage      0.090814    0.41987     0.24197       -0.060619   
    Tobit         0.085498    0.42224     0.23685        0.032766   
    Beta          0.080803    0.41557     0.24112        0.052396   
    Regression    0.070867    0.42152     0.25988        -0.10759   
    GroupMeans    0.041622    0.33807      0.2406       0.0078124   

For the particular training vs. test partition used in this example, the two-stage model has the highest R-squared, although for other partitions, Tobit has the highest R-squared value. Even though the group means approach does not have a high R-squared value, it typically has the smallest sample mean error (mean of predicted LGD values minus mean LGD in the test data). The group means are also competitive for the RMSE metric.

Report the model performance one approach at a time, including visualizations. Display the metrics for the selected model.

ModelSelected = "TwoStage";
                RSquared    Spearman     RMSE      SampleMeanError
                ________    ________    _______    _______________

    TwoStage    0.090814    0.41987     0.24197       -0.060619   

Plot the regression fit (observed LGD vs. predicted LGD), which is a common visual tool to assess the model performance. The R-squared reported above is the R-squared of this regression. The plot shows a significant amount of error for all models. A good predictive model would have the points located mostly along the diagonal, and not be scattered all over the unit square. However, the metrics above do show some differences in predictive performance for different models that can be important in practice.

xlim([0 1])
ylim([0 1])

Compare the histograms of the predicted and observed LGD values. For some models, the distribution of predicted values shows high frequencies near zero, similar to the U shape of the observed LGD distribution. However, matching the shape of the distribution does not mean high accuracy at the level of individual predictions; some models show better prediction error even though their histogram does not have a U shape.

LGDEdges = 0:0.1:1; % Ten bins to better show the distribution shape
y1 = histcounts(dataLGDPredicted.(ModelSelected),LGDEdges);
y2 = histcounts(dataLGDPredicted.Observed,LGDEdges);
bar((LGDEdges(1:end-1)+LGDEdges(2:end))/2,[y1; y2])
title(strcat(ModelSelected,' Model'))
grid on

Show the box plot of the observed LGD values for different ranges of predicted LGD values. A coarser discretization (five bins only) smooths some noise out and better summarizes the underlying relationship. Ideally, the median (red horizontal line in the middle) should have a monotonic trend and be clearly different from one level to the next. Tall boxes also mean that there is a significant amount of error around the predicted values, which in some cases may be due to very few observations in that level. For a good predictive model, the boxes should be short and be located near the diagonal as you move from one level to the next.

LGDEdges = linspace(min(dataLGDPredicted.(ModelSelected)),max(dataLGDPredicted.(ModelSelected)),6); % Five bins
LGDDiscretized = discretize(dataLGDPredicted.(ModelSelected),LGDEdges,'Categorical',string(LGDEdges(2:end)));
ylim([0 1])
title(strcat(ModelSelected,' Model'))
xlabel('Predicted LGD, Discretized')
ylabel('Observed LGD')


This example shows multiple approaches for LGD modeling and prediction. The workflow in this example can be adapted to further analyze the models discussed here or to implement and validate other modeling approaches. This example can be extended to perform a more thorough comparison of LGD models (see for example [3] and [4]).

The example can also be extended to perform a cross-validation analysis to either benchmark alternative models or to fine-tune hyperparameters. For example, better cut off points for the group means could be selected using cross-validation, or alternative transformations of the LGD response values (logit, probit) could be benchmarked to select the one with the best performance. This example can also be a starting point to perform a backtesting analysis using out-of-time data; see for example [5].


[1] Baesens, B., D. Rosch, and H. Scheule. Credit Risk Analytics. Wiley, 2016.

[2] Johnston Ross, E., and L. Shibut. "What Drives Loss Given Default? Evidence from Commercial Real Estate Loans at Failed Banks." Federal Deposit Insurance Corporation, Center for Financial Research, Working Paper 2015-03, March 2015.

[3] Li, P., X. Zhang, and X. Zhao. "Modeling Loss Given Default. "Federal Deposit Insurance Corporation, Center for Financial Research, Working Paper 2018-03, July 2018.

[4] Loterman, G., I. Brown, D. Martens, C. Mues, and B. Baesens. "Benchmarking Regression Algorithms for Loss Given Default Modeling." International Journal of Forecasting. Vol. 28, No.1, pp. 161–170, 2012.

[5] Loterman, G., M. Debruyne, K. Vanden Branden, T. Van Gestel, and C. Mues. "A Proposed Framework for Backtesting Loss Given Default Models." Journal of Risk Model Validation. Vol. 8, No. 1, pp. 69-90, March 2014.

[6] Tanoue, Y., and S. Yamashita. "Loss Given Default Estimation: A Two-Stage Model with Classification Tree-Based Boosting and Support Vector Logistic Regression." Journal of Risk. Vol. 21 No. 4, pp. 19-37, 2019.

Local Functions

Beta Log Likelihood

The log-likelihood function for a beta regression model is






The predictor matrix Xand the vector of observed LGD values come from training data with Nobservations. The density function for the beta distribution is given by (Γ is the Gamma function)


Given the data, the log-likelihood function is a function of the coefficient parameters b and c, even though the formulas above do not make this dependency explicitly to simplify the notation. The log-likelihood function is maximized by varying the b and c parameters. The distribution parameters α>0 and β> 0, and 0<μ<1 and ν>0 are intermediate transformations required to evaluate the log-likelihood function. For more information, see for example [1].

function f = hLogLikelihoodBeta(Params, X, y)

nCols = size(X,2);
b = Params(1:nCols);
c = Params(nCols+1:end);

% Linear predictors
yMu = X*b;
yNu = X*c;

mu = 1 ./ (1 + exp(-yMu));
nu = exp(yNu);

% Transform to standard parameterization
alpha = mu .* nu;
beta = (1-mu) .* nu;

% Negative log-likelihood
likelihood = betapdf(y,alpha,beta);
f = sum(log(likelihood));


Tobit log likelihood

The log-likelihood function is defined as




In other words, for the non-censored cases (LGD > 0), the normal density function ϕ(x;μ,σ)(normpdf) gives the likelihood, whereas for the censored cases (LGD = 0, or more generally LGD <= 0), it is the cumulative distribution function Φ(x;μ,σ)(normcdf) that defines the likelihood. For more information, see for example [1].

function llf = hLogLikelihoodTobitLeftCensored(Params,X,LGD)

Params = Params(:);
beta = Params(1:end-1);
sigma = Params(end);

Y = X*beta;

LeftCensored = LGD==0;
lf = zeros(size(Y));
lf(~LeftCensored) = normpdf(LGD(~LeftCensored),Y(~LeftCensored),sigma);
lf(LeftCensored) = normcdf(0,Y(LeftCensored),sigma);
llf = sum(log(lf));


Tobit prediction

The definition of the Tobit model has two cases: when the LGD is zero, and when it is positive. The unconditional expected value is then


The expectation in the first term is zero. Using the normality assumption on the error term, the second term can be manipulated to show that the unconditional expected value of the LGD is given by


Sometimes, the conditional expected value (conditional on LGD > 0) is used. The formula is almost the same, except for the probability factor.


The helper function hPredictTobitLeftCensored allows you to choose between unconditional or conditional expected values. It also supports the computation of the LGD value using the basic Tobit formula LGDi=max{0,Xiβ}.

function lgdPredicted = hPredictTobitLeftCensored(Params,X,Type)

if nargin < 3
    Type = 'unconditional';

Params = Params(:);
beta = Params(1:end-1);

if strcmpi(Type,'formula')
    lgdPredicted = max(0,X*beta);


    sigma = Params(end);
    YStd = X*beta/sigma;
    phiYStd = normpdf(YStd);
    PhiYStd = normcdf(YStd);
    if strcmpi(Type,'unconditional')
        lgdPredicted = PhiYStd.*(X*beta+sigma*(phiYStd./PhiYStd));
    elseif strcmpi(Type,'conditional')
        lgdPredicted = X*beta+sigma*(phiYStd./PhiYStd);



Tobit model with both the left and right boundaries

The likelihood function has a third case. In general, if the left boundary value is L (0 in this example) and the right boundary value is R (1 in this example), the likelihood function is given by:


For prediction, the derivations of the expected LGD value must consider the possibility that the LGD is on the right boundary. It can be shown that the unconditional expected value can be computed as follows:




The conditional expectation is Xiβ+σλi.