Data normalization using robust scaling

Hello all, I am trying to implement "Robust Scaling" but I am confused. Should I use the "all" argument for the "median" and "iqr" functions?
Thanks for the help.
DataSet = readtable('Datasets/Test.csv');
DataSet = table2array(DataSet); % Rows: 7195 x Columns: 22
RScaling = (DataSet - median(DataSet))./iqr(DataSet)

Accepted Answer

Voss on 4 Jun 2024

1 vote

If you want to normalize all columns the same way (i.e., using the median and inter-quartile range of the entire data set), then use "all".
If you want to normalize each column separately (i.e., using each column's own median and inter-quartile range), then do not use "all". In this case, it's best to set the dim argument to 1, which explicitly requests the median and iqr by column, so that a data set with only one row is still handled correctly.
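The two options above can be sketched as follows. This is a minimal illustration that assumes a small made-up matrix X; the variable names are hypothetical and not from the original post.

```matlab
% Hypothetical 4x3 data, for illustration only
X = [1 10 100; 2 20 200; 3 30 300; 4 40 400];

% Per-column robust scaling: each column uses its own median and IQR.
% dim = 1 is given explicitly so a one-row input is not pooled by accident.
perColumn = (X - median(X, 1)) ./ iqr(X, 1);

% Global robust scaling: a single median and a single IQR computed
% from all elements of X
pooledAll = (X - median(X, "all")) ./ iqr(X, "all");
```

With per-column scaling, columns on very different scales end up comparable. With "all", the original differences in scale between columns are preserved.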

4 Comments

MB on 4 Jun 2024
Edited: MB on 4 Jun 2024
Thank you for your answer. So, I can normalize each column separately or all columns together. I want to explore the effects of various normalization techniques on clustering. I've experimented with the methods defined in the "normalize" function without specifying the "dim" argument. If I understand correctly, this normalizes each column separately. "If A is a matrix, then normalize operates on each column of A separately."
RScaling = (DataSet - median(DataSet, 1))./iqr(DataSet, 1)
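For comparison, normalize also offers a "medianiqr" method, which centers by the median and scales by the IQR. The sketch below assumes made-up data and hypothetical variable names. The two results would be expected to agree, but that is worth checking in your own release rather than taking for granted.

```matlab
% Hypothetical data, for illustration only
X = [1 10 100; 2 20 200; 3 30 300; 4 40 400; 50 60 700];

% Built-in robust scaling, one median and one IQR per column (dim = 1)
viaNormalize = normalize(X, 1, "medianiqr");

% Manual formula from this thread, for comparison
manual = (X - median(X, 1)) ./ iqr(X, 1);

% Largest absolute difference between the two results
maxDiff = max(abs(viaNormalize - manual), [], "all");
```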
Voss on 4 Jun 2024
Edited: Voss on 4 Jun 2024
You're welcome!
"If I understand correctly, this normalizes each column separately. "If A is a matrix, then normalize operates on each column of A separately.""
That's right. For a matrix that's not a vector, the default dim is 1, so you don't have to specify it (but it doesn't hurt to specify it). However, if you ever had the situation where your data set had one row, then you would need to specify dim as 1 if you want to normalize by column. Therefore, it's a good idea to always include the dim as 1. That's all I was suggesting.
Example: Matrix:
data = [1 2 3; 4 5 6] % non-vector matrix
data = 2x3
     1     2     3
     4     5     6
normalize(data) % normalize each column
ans = 2x3
   -0.7071   -0.7071   -0.7071
    0.7071    0.7071    0.7071
normalize(data,1) % same
ans = 2x3
   -0.7071   -0.7071   -0.7071
    0.7071    0.7071    0.7071
normalize(data,2) % normalize each row
ans = 2x3
    -1     0     1
    -1     0     1
Row vector (matrix with one row):
data = [1 2 3] % row vector
data = 1x3
1 2 3
normalize(data) % without the dim specified, this normalizes all together this time
ans = 1x3
-1 0 1
normalize(data,1) % normalize each column
ans = 1x3
NaN NaN NaN
normalize(data,2) % normalize each row (same as all together in this case)
ans = 1x3
-1 0 1
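The same one-row pitfall applies to the manual robust-scaling formula with median and iqr. A minimal sketch, using hypothetical variable names:

```matlab
x = [1 2 3];   % row vector, i.e. a data set with only one row

% Without dim, median and iqr work along the first non-singleton
% dimension, so the three values are pooled into one sample
pooled = (x - median(x)) ./ iqr(x);

% With dim = 1, each column holds a single value: its median is the
% value itself and its IQR is 0, so every entry becomes 0/0 = NaN
byColumn = (x - median(x, 1)) ./ iqr(x, 1);
```

The NaN result for byColumn is the honest answer here: a column with a single observation has no spread to scale by. Specifying dim makes that visible instead of silently switching to row-wise scaling.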
MB on 4 Jun 2024
Many thanks.
Voss on 4 Jun 2024
You're welcome!


Release

R2024a

Asked: MB on 4 Jun 2024

Commented: 4 Jun 2024
