Outlier removal from a matrix
I removed the outliers from my dataset with the rmoutliers(A,'mean') command. As I understand it, this should remove data more than 3 standard deviations from the mean of each column. But when I plot a histogram of the z-scores of each column, some values are still as far as 6 standard deviations away. What do you suggest? Here is my code:
A = rmoutliers(table_data,'mean');
Zscores = zscore(A); %(A is a 50000*12 matrix)
figure
histogram(Zscores(:,2))
In the histogram, there are still some data as far as 6 standard deviations away.
1 comment
John D'Errico
on 11 Oct 2022
help rmoutliers
I had to go to the doc to check your claim. With the 'mean' option, rmoutliers does use 3 standard deviations from the mean as the cutoff, and it removes the entire row containing such an outlier. So that part is true. But rmoutliers is not a perfect tool, and any such tool can have problems if you push its limits.
x = [ones(1,5),1 + eps,10]
xhat = rmoutliers(x)
xhat == 1
Called without a method, rmoutliers uses its default 'median' rule, which flags anything more than 3 scaled median absolute deviations (MAD) from the median. Here the median is 1 and, since most of the elements are exactly 1, the MAD is exactly zero. So the 10 is removed, but 1+eps is ALSO more than 3 scaled MAD out, and it is removed as an outlier too. The point is, if you try hard enough, you can always cause any such adaptive tool to exhibit strange behavior.
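One way to check which elements get flagged, and under which rule, is to call isoutlier on the same vector. This is only a sketch: it assumes isoutlier applies the same detection rules as rmoutliers, and the scaling constant c is the factor the documentation gives for the scaled MAD. The variable names are made up for illustration.

```matlab
% Same vector as in the example above.
x = [ones(1,5), 1 + eps, 10];

% Default rule: more than 3 scaled MAD from the median.
c = -1/(sqrt(2)*erfcinv(3/2));               % scaling factor, about 1.4826
scaledMAD = c * median(abs(x - median(x)));  % spread estimate used by 'median'
tfMedian = isoutlier(x);                     % default method is 'median'

% 'mean' rule: more than 3 standard deviations from the mean.
tfMean = isoutlier(x, 'mean');

disp(scaledMAD)
disp(tfMedian)
disp(tfMean)
```

If the scaled MAD comes out as zero, any element that differs from the median at all counts as an outlier under the default rule. The 'mean' rule behaves very differently on this vector, because the 10 inflates the standard deviation it is being measured against.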
But if you want to know what happened, then you need to provide your data. Otherwise, anything is just a wild guess.
Attach it to a comment (not as an answer), in a .mat file.
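Without the data this is still a guess, but one effect is worth ruling out. rmoutliers(...,'mean') measures distance using the mean and standard deviation of the data before removal, while zscore(A) recomputes both on the cleaned data. Once the extreme rows are gone the standard deviation shrinks, so points that were inside 3 sigma before can sit well outside 3 sigma afterwards. The sketch below uses a synthetic, hypothetical column (mostly standard normal plus a small wide-spread component), not your data, and a fixed seed only so that it is repeatable.

```matlab
rng(0);                                   % fixed seed, illustration only

% Hypothetical column: 50000 well-behaved values plus 500 wide-spread ones.
A = [randn(50000,1); 50*randn(500,1)];

% Single pass of the 'mean' rule; removed marks the rows that were dropped.
[B, removed] = rmoutliers(A, 'mean');

% z-scores of the kept points, measured against the ORIGINAL mean and std.
zBefore = (A - mean(A)) ./ std(A);
zKept = zBefore(~removed);

% z-scores of the same points, measured against the mean and std AFTER removal.
zAfter = zscore(B);

fprintf('std before removal: %g, after removal: %g\n', std(A), std(B));
fprintf('max |z| of kept points, original statistics:   %g\n', max(abs(zKept)));
fprintf('max |z| of kept points, recomputed statistics: %g\n', max(abs(zAfter)));
```

If this is what is happening in your data, a histogram of zscore(A) will show values beyond 3 even though rmoutliers did what the doc says. Calling rmoutliers repeatedly until nothing more is removed, or using the 'median' method, which is less sensitive to the outliers themselves, are possible ways to deal with it. Which is appropriate depends on the data.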
Answers (0)