How does the "relativeEntropy" function work internally?

I'm trying to use the "relativeEntropy" function but I am getting unexpected results. How does this function work internally and what are the units of its output (bits, nats, etc.)?

Accepted Answer

The "relativeEntropy" function implements the equation for the one-dimensional case given just below equation (5.14) on page 176 (section 5.5.1) of the book: Theodoridis, Sergios, and Konstantinos Koutroumbas. Pattern Recognition, 2nd ed. Amsterdam; Boston: Academic Press, 2003.
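Written out (using μ_i and σ_i² for the mean and variance of class i, matching the code at the end of this answer), that equation is the symmetrized Kullback-Leibler divergence between two one-dimensional Gaussian densities:

$$
d_{12} = \frac{1}{2}\left(\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} - 2\right) + \frac{1}{2}\left(\mu_1 - \mu_2\right)^2\left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)
$$

This is the sum $\mathrm{KL}(p_1\,\|\,p_2) + \mathrm{KL}(p_2\,\|\,p_1)$ for two Gaussians: the $\log(\sigma_2/\sigma_1)$ terms of the two directed divergences cancel, leaving the expression above.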
The code snippet at the end of this answer demonstrates how "relativeEntropy" works internally by implementing this equation. A few things to note about the equation:
  1. As the natural logarithm was used in its derivation, the output has units of nats.
  2. In this equation, d_ij = d_ji. Hence, in this case, the calculation is symmetric.
  3. The entropy calculation assumes that the data in the input "X" follows a Gaussian distribution, as mentioned in the documentation:
Z = relativeEntropy(X,I) calculates the one-dimensional Kullback-Leibler divergence of two independent subsets of data set X that are grouped according to the logical labels in I. The relative entropy provides a metric for ranking features according to their ability to separate two classes of data, such as healthy and faulty machines. The entropy calculation assumes that the data in X follows a Gaussian distribution.
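To make points 1 and 2 above concrete, here is a small sketch that evaluates the equation directly, checks the symmetry, and converts the result from nats to bits. (Python is used here only so the check runs without any MATLAB toolboxes; the function name `sym_kl_gauss` is my own, not part of any library.)

```python
import math

def sym_kl_gauss(mu1, var1, mu2, var2):
    """Symmetric 1D KL divergence between two Gaussians; output in nats."""
    return (0.5 * (var2 / var1 + var1 / var2 - 2)
            + 0.5 * (mu1 - mu2) ** 2 * (1 / var1 + 1 / var2))

d12 = sym_kl_gauss(3, 4, 7, 25)   # means 3 and 7, variances 4 and 25
d21 = sym_kl_gauss(7, 25, 3, 4)   # same pair of Gaussians, arguments swapped

print(d12 == d21)         # True: d_12 = d_21, the measure is symmetric
print(d12 / math.log(2))  # dividing a value in nats by ln(2) gives bits
```

Swapping the two argument pairs leaves every term of the expression unchanged, so the two calls return identical floating-point values.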
In the code snippet below, we do the following:
  1. Given two pairs of means and variances, we sample 1000 measurements each from two different Gaussian probability density functions (PDFs). Let the two sets of samples from these two PDFs be X1 and X2 respectively.
  2. Compute the KL-divergence between X1 and X2 using the "relativeEntropy" function and store the result in "Z".
  3. Compute the KL-divergence between X1 and X2 by substituting the ground-truth means and variances into the equation above and store the result in "Z_hat1".
  4. Compute the KL-divergence between X1 and X2 by substituting sample estimates of the means and variances (as returned by MATLAB's "mean" and "var") into the equation above and store the result in "Z_hat2".
  5. Compare "Z" to "Z_hat2" to show that they are equal up to machine precision.
% number of samples
n = 1000;

% means and variances of X1 and X2
var1 = 4; var2 = 25;
mean1 = 3; mean2 = 7;

% sample X1 and X2 from Gaussian distributions
X1 = sqrt(var1) * randn(n,1) + mean1;
X2 = sqrt(var2) * randn(n,1) + mean2;

% Concatenate into one array. Note that the relativeEntropy function computes
% the 1D KL-divergence between the two label groups in I for each column of X
X = [X1; X2];
I = logical([ones(1,n), zeros(1,n)]);

% Compute the KL-divergence using the function
Z = relativeEntropy(X,I)

% Compute the KL-divergence from the equation for the one-dimensional case
% below equation (5.14) on page 176 (section 5.5.1) in the book:
% Theodoridis, Sergios, and Konstantinos Koutroumbas. Pattern Recognition,
% 2nd ed. Amsterdam; Boston: Academic Press, 2003.
Z_hat1 = 0.5 * ((var2 / var1) + (var1 / var2) - 2) + ...
         0.5 * (mean1 - mean2)^2 * ((1/var1) + (1/var2))

% Repeat the same computation from the book, but this time use estimates of
% the mean and variance rather than the ground-truth values
var1_hat = var(X1); var2_hat = var(X2);
mean1_hat = mean(X1); mean2_hat = mean(X2);
Z_hat2 = 0.5 * ((var2_hat / var1_hat) + (var1_hat / var2_hat) - 2) + ...
         0.5 * (mean1_hat - mean2_hat)^2 * ((1/var1_hat) + (1/var2_hat))

% Z and Z_hat2 agree to machine precision
assert(abs(Z - Z_hat2) < eps)
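For readers without access to the Predictive Maintenance Toolbox, the same experiment can be sketched in Python/NumPy. This reimplements only the closed-form equation, not "relativeEntropy" itself; the helper `sym_kl` is my own name, and `ddof=1` is passed to `var` to match MATLAB's default N-1 normalization.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n = 1000

# ground-truth means and variances of the two Gaussians
mean1, var1 = 3.0, 4.0
mean2, var2 = 7.0, 25.0

# sample X1 and X2 from the two Gaussian distributions
x1 = rng.normal(mean1, np.sqrt(var1), n)
x2 = rng.normal(mean2, np.sqrt(var2), n)

def sym_kl(mu1, v1, mu2, v2):
    """Symmetric 1D KL divergence between two Gaussians, in nats."""
    return (0.5 * (v2 / v1 + v1 / v2 - 2)
            + 0.5 * (mu1 - mu2) ** 2 * (1 / v1 + 1 / v2))

# divergence from the ground-truth parameters (analogue of Z_hat1)
z_hat1 = sym_kl(mean1, var1, mean2, var2)

# divergence from sample estimates (analogue of Z_hat2, and of what
# relativeEntropy computes from the labeled samples)
z_hat2 = sym_kl(x1.mean(), x1.var(ddof=1), x2.mean(), x2.var(ddof=1))

print(z_hat1, z_hat2)  # the two values agree to within sampling error
```

As in the MATLAB snippet, the estimate-based value approaches the ground-truth value as n grows.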

More Answers (0)

Version

R2024a
