How does fsrftest calculate the p-value?

11 visualizaciones (últimos 30 días)
Isaiah
Isaiah el 18 de Dic. de 2023
Editada: Ive J el 8 de En. de 2024
I am trying to understand how the fsrftest works in MATLAB. From the documentation, I understand that it uses an F-Test to test a null hypothesis and alternative hypothesis. Subsequently the p-value is used to determine the importance of the feature. From my understanding the p-value is also not compared with a significance level and as such this function does not actually reject/accept either hypothesis but rather just uses the p-value to rank features.
My question is regarding how is the p-value calculated? Is the process the same as ANOVA?

Respuesta aceptada

Ive J
Ive J el 8 de En. de 2024
Editada: Ive J el 8 de En. de 2024
At the end of doc you can see it uses -log(p) to rank features, so there is no significance level here. And yes, it's same as ANOVA (to be precise, it's a GLM), note that NumBins argument is used to bin continuous features.
n = 100; % sample size
data = table;
data.BMI = randi([18, 50], n, 1);
% bin BMI into two categories
med_bmi = median(data.BMI);
idx = data.BMI > med_bmi;
data.BMI(idx) = 1;
data.BMI(~idx) = 0;
data.Sex = randi([0, 1], n, 1);
data.Target = randn(n, 1);
mdl_bmi = fitlm(data(:, ["BMI", "Target"]))
mdl_bmi =
Linear regression model: Target ~ 1 + BMI Estimated Coefficients: Estimate SE tStat pValue _________ _______ ________ _______ (Intercept) 0.04267 0.13963 0.30559 0.76056 BMI -0.067441 0.19746 -0.34153 0.73343 Number of observations: 100, Error degrees of freedom: 98 Root Mean Squared Error: 0.987 R-squared: 0.00119, Adjusted R-Squared: -0.009 F-statistic vs. constant model: 0.117, p-value = 0.733
mdl_sex = fitlm(data(:, ["Sex", "Target"]))
mdl_sex =
Linear regression model: Target ~ 1 + Sex Estimated Coefficients: Estimate SE tStat pValue ________ _______ ________ _______ (Intercept) -0.10768 0.14984 -0.71864 0.47407 Sex 0.20462 0.19847 1.031 0.30509 Number of observations: 100, Error degrees of freedom: 98 Root Mean Squared Error: 0.983 R-squared: 0.0107, Adjusted R-Squared: 0.000635 F-statistic vs. constant model: 1.06, p-value = 0.305
[~, sc] = fsrftest(data, "Target", "NumBins", 2);
p = exp(-sc)
p = 1×2
0.7334 0.3051

Más respuestas (0)

Productos


Versión

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by