Finding outliers in a dataset

Salma fathi on 2 Aug 2022
Answered: Cris LaPierre on 2 Aug 2022
Hello, the image shows plots of my dataset. I am trying to remove the outliers from the dataset so that I can later use it to train a machine learning model.
However, rmoutliers apparently treats a lot of important data points as outliers. Is there another approach I could follow to get rid of the outliers?
The top plot is the whole dataset, and the bottom plot is the result after removing the outliers using the following lines:
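% Remove rows with a value more than 3 standard deviations from its column mean
nonOutliers = rmoutliers(Matrix3, 'mean');
figure(3); tiledlayout(2,1);
nexttile;
scatter(Matrix3(:,1), Matrix3(:,2), 1);          % whole dataset
nexttile;
scatter(nonOutliers(:,1), nonOutliers(:,2), 1);  % after outlier removal
ylim([0 1e13])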
1 Comment
Monica Roberts on 2 Aug 2022
One thing to consider is, what do you consider outliers when you look at the graph? Right now, MATLAB doesn't seem to be considering the X-values when calculating outliers. You may want to consider splitting your data into chunks and passing it into rmoutliers. I'd start at where the data shoots up and group every ~200 values of x, pass those chunks into rmoutliers, and see what happens.
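A minimal sketch of that chunking idea, under stated assumptions: the synthetic Matrix3 below is a hypothetical stand-in for the real data (column 1 is x, column 2 is y), the three spikes are injected only for illustration, and the chunk size of 200 rows is taken from the rough figure above rather than tuned.

```matlab
% Hypothetical stand-in for the real Matrix3: x in column 1, y in column 2.
rng(1);
x = (1:1000)';
y = 1e9 * exp(x / 200) .* (1 + 0.05 * randn(size(x)));
y([150 550 950]) = y([150 550 950]) * 20;    % inject three obvious spikes
Matrix3 = sortrows([x y], 1);                % make sure rows are ordered by x

chunkSize = 200;                             % assumed chunk length (rows)
n = size(Matrix3, 1);
numChunks = ceil(n / chunkSize);
cleanChunks = cell(numChunks, 1);

for k = 1:numChunks
    idx = (k - 1) * chunkSize + 1 : min(k * chunkSize, n);
    % Outliers are now judged against the mean and std of this chunk only
    cleanChunks{k} = rmoutliers(Matrix3(idx, :), 'mean');
end

nonOutliers = vertcat(cleanChunks{:});
```

Because each chunk has its own mean and standard deviation, a point is compared only with its neighbours in x, not with the whole dataset.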
There are also other parameters you can pass into rmoutliers. For instance, maybe "mean" isn't the best method of detecting outliers for this dataset. Have you tried the others? The 'movmean' or 'movmedian' methods, for instance, might do the chunking I've described.
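As a rough illustration of the moving-window option, here is a sketch using isoutlier with 'movmedian' on the y column only. The synthetic Matrix3, the injected spikes, and the 200-point window are all assumptions made for the example; the window length would need tuning on the real data.

```matlab
% Hypothetical stand-in for the real Matrix3: x in column 1, y in column 2.
rng(0);
x = (1:2000)';
y = 1e9 * exp(x / 250) .* (1 + 0.05 * randn(size(x)));
y([300 900 1500]) = y([300 900 1500]) * 20;  % inject three obvious spikes
Matrix3 = [x y];

% Sort by x so that a moving window covers neighbouring x-values.
sorted = sortrows(Matrix3, 1);

% Flag y-values more than 3 local scaled MADs from the median
% of a sliding window of 200 elements (assumed window length).
isOut = isoutlier(sorted(:,2), 'movmedian', 200);
nonOutliers = sorted(~isOut, :);

% Quick visual check: kept points in blue, flagged points in red.
scatter(sorted(~isOut,1), sorted(~isOut,2), 1, 'b'); hold on;
scatter(sorted(isOut,1), sorted(isOut,2), 10, 'r'); hold off;
```

The same method name and window can be passed to rmoutliers directly; isoutlier is used here only so that the flagged points can be plotted before anything is deleted.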

Answers (1)

Cris LaPierre on 2 Aug 2022
If you process your data in a live script, consider interactively exploring different ways to detect and remove outliers using the Clean Outlier Data live task.
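For a rough scripted analogue of that kind of exploration, one could loop over several detection methods and compare how many points each one flags. This is only a sketch: the synthetic Matrix3 is a hypothetical stand-in for the real data, and the variable names are made up for the example.

```matlab
% Hypothetical stand-in for the real Matrix3: x in column 1, y in column 2.
rng(2);
x = (1:1000)';
y = 1e9 * exp(x / 200) .* (1 + 0.05 * randn(size(x)));
Matrix3 = [x y];

% Global detection methods accepted by isoutlier and rmoutliers
detectMethods = {'median', 'mean', 'quartiles', 'grubbs', 'gesd'};
numFlagged = zeros(size(detectMethods));

for k = 1:numel(detectMethods)
    % Count how many y-values each method would treat as outliers
    numFlagged(k) = nnz(isoutlier(Matrix3(:,2), detectMethods{k}));
    fprintf('%-10s flags %d of %d points\n', ...
        detectMethods{k}, numFlagged(k), size(Matrix3, 1));
end
```

A method that flags a large share of points you consider valid is probably a poor fit for the data.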

Release: R2022a