How can I remove outliers in my data using Cook's Distance?

Fatemah Ebrahim on 29 Jun 2020
Edited: Fatemah Ebrahim on 29 Jun 2020
I have a large dataset: six '.xlsx' files with roughly 400,000 rows each. I want to use Cook's distance to identify outliers in the fourth column of each file and then delete the corresponding rows. How would I do that?
  2 Comments
Fatemah Ebrahim on 29 Jun 2020
Edited: Fatemah Ebrahim on 29 Jun 2020
Hi! I'm running the code they used on one of the '.xlsx' files, like so:
X = A_t; % where this is a datetime value
Y = Adata(:,4); % where we are pulling the fourth column of the table
mdl = fitlm(X,Y);
plotDiagnostics(mdl,'cookd')
find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))
And I am getting this error:
Error using classreg.regr.TermsRegression/handleDataArgs (line 550)
Predictor variables must be numeric vectors, numeric matrices, or
categorical vectors.
Error in LinearModel.fit (line 1184)
[X,y,haveDataset,otherArgs] =
LinearModel.handleDataArgs(X,varargin{:});
Error in fitlm (line 121)
model = LinearModel.fit(X,varargin{:});
Please let me know if you have any idea how to address this error. There does not seem to be much information on it. Thanks!
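One possible way around this error, offered as a sketch rather than a definitive fix: the message says fitlm only accepts numeric or categorical predictors, so the datetime values in A_t can be converted to elapsed time (here, days since the first sample) before fitting. The sketch also assumes Adata is a table, in which case Adata(:,4) returns a one-column table and curly braces are needed to get the numeric values. The data below is synthetic and made up purely for illustration; the names flagged, AdataClean, and A_tClean are hypothetical, and the 3*mean threshold is simply the one used in the code above.

```matlab
% Synthetic stand-in for the real data (values are invented for illustration;
% only the names A_t and Adata come from the post).
rng(0);                                      % reproducible random numbers
A_t   = datetime(2020,1,1) + hours(0:199)';  % datetime column vector
Adata = array2table(randn(200,4));           % assumption: Adata is a table
Adata{50,4} = 25;                            % plant one obvious outlier

% fitlm rejects datetime predictors, so convert to elapsed days (numeric).
X = days(A_t - A_t(1));
Y = Adata{:,4};                              % braces return a numeric vector

mdl   = fitlm(X, Y);
cookd = mdl.Diagnostics.CooksDistance;

% Same rule as in the post: flag points above 3 times the mean Cook's distance.
flagged = cookd > 3*mean(cookd, 'omitnan');

% Drop the flagged rows from the table and from the matching time vector.
AdataClean = Adata(~flagged, :);
A_tClean   = A_t(~flagged);
```

For the six files, the same steps could presumably be wrapped in a loop that loads each file with readtable. Whether a straight-line fit against time is a sensible model for the fourth column is a separate question, since Cook's distance only measures influence relative to the model that was fitted.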


Answers (0)
