fitcsvm with identical variables gives different result on different machines

Question

fireattack on 18 Sep 2016

Direct link to this question:

https://es.mathworks.com/matlabcentral/answers/303538-fitcsvm-with-identical-variables-gives-different-result-on-different-machines

Commented: kc on 19 Jun 2019

I ran into a weird problem that invalidated my experiment data: I can't reproduce the same results on two computers. After some testing, I found that it is caused by the fitcsvm function.

I put together simple steps to reproduce it, so you can give it a try if you're curious. It requires the Statistics and Machine Learning Toolbox.

Download the data from this MAT file: https://drive.google.com/file/d/0B-nVQqvDdrrIaVZUSElKUlVzTU0/view?usp=sharing
Then run the code below, which is a simplified example derived from my research:

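clear all
rng(90);                      % fix the seed, just in case
load('bugtestdata.mat')       % loads the variables inputs and outputs
SVMModel = fitcsvm(inputs,outputs,'KernelFunction','rbf',...
    'OutlierFraction',0.2,...
    'BoxConstraint',10,'ClassNames',[0,1]);
disp(rand(1))                 % sanity check on the RNG state
disp(SVMModel.NumIterations)  % number of solver iterations
disp(SVMModel.Bias)           % bias term of the trained model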

Note: from what I can tell, fitcsvm (at least with my inputs) doesn't depend on the random seed. Just in case, I added rng(90) before the call. I tested it, and it has no effect on this bug.

With this simple code, I get 2 different results across 5 computers in total (all of them running 64-bit MATLAB).

Result no.1:

0.1531
258
0.2385

Can be reproduced on:

My laptop: OS: Microsoft Windows 7 Ultimate; Matlab: R2016a
My uni's supercomputer: OS: Linux 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 18:30:56 UTC 2016 x86_64; Matlab: R2016a
My uni's lab computer: OS: Win 10; R2016a

Result no.2:

0.1531
349
0.1921

Can be reproduced on:

A virtual desktop provided by my university: OS: Microsoft Windows 8.1 Enterprise; Matlab: R2016a
My desktop computer, OS: Win 7; Matlab: R2016a / R2016b

As you can see, the split looks totally random: two of my personal computers run the same OS, yet they give different answers.

All the MATLAB installations use an academic license.

Any help would be much appreciated.

3 comments

fireattack on 19 Sep 2016

Edited: fireattack on 19 Sep 2016

From what I can tell, they all have Intel CPUs.

My laptop: i5-2410M (no discrete GPU)
My desktop: i5-4570 + GeForce GTX 660
My school's virtual desktop: Xeon E5-2690 v3 (no discrete GPU)
My lab computer: i7 something
Supercomputer: Intel Xeon 2.5GHz E5-2670 v2
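
When comparing machines like this, it may help to record the software environment alongside the hardware. The snippet below is a minimal sketch that collects a few values on each machine so they can be compared side by side. Whether any of these values actually explains the difference is an assumption, not something confirmed in this thread; the struct name `info` is only for illustration.

```matlab
% Collect a small environment fingerprint to compare across machines.
% Which of these (if any) matters for fitcsvm is an assumption.
info = struct();
info.release = version;              % MATLAB version string
info.arch    = computer;             % platform identifier, e.g. PCWIN64 or GLNXA64
info.blas    = version('-blas');     % BLAS library in use
info.lapack  = version('-lapack');   % LAPACK library in use
info.threads = maxNumCompThreads;    % number of computational threads
disp(info)
```

Running this on each of the five machines and diffing the output would show whether the two result groups line up with any of these fields.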

Walter Roberson on 19 Sep 2016

I have tried a couple of different configurations here, native or virtual machines; so far I have only seen Result #1. I am loading up a Windows 8 virtual machine now to test on.


Answer 1

Ilya on 20 Sep 2016

Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/303538-fitcsvm-with-identical-variables-gives-different-result-on-different-machines#answer_235403

My guess is that gradients for two or more observations become equal within floating-point accuracy during optimization. The solver then picks one observation for update in one setup and another observation in another setup. From that point on, the optimization paths are different.
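
To make the tie-breaking idea concrete, here is a toy illustration in plain MATLAB. It is not the fitcsvm solver, just a sketch of the mechanism: two quantities that are equal in exact arithmetic can differ in the last bit depending on the order of the floating-point operations, and a "pick the largest" step then selects a different observation. The variable names are hypothetical.

```matlab
% Two "gradients" that are both 0.6 in exact arithmetic
gA = (0.1 + 0.2) + 0.3;   % summed left to right
gB = 0.1 + (0.2 + 0.3);   % same terms, different grouping
fprintf('gA = %.17g\ngB = %.17g\nequal: %d\n', gA, gB, gA == gB);

% Setup 1 computes observation 1 as gA and observation 2 as gB;
% setup 2 happens to round them the other way around.
[~, pickSetup1] = max([gA, gB]);   % index of the observation chosen in setup 1
[~, pickSetup2] = max([gB, gA]);   % index of the observation chosen in setup 2
fprintf('setup 1 picks observation %d, setup 2 picks observation %d\n', ...
    pickSetup1, pickSetup2);
```

Once the two setups update different observations, every later iteration starts from a different state, which would be consistent with the different NumIterations and Bias values reported in the question.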

Such problems arise when you have discrete predictors. Your predictors 3 to 6 have 4 distinct values each. If you add a small amount of white noise to your predictors, I suspect the results returned in all configurations are going to be identical (or almost identical).
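
One possible way to add such noise is sketched below. It uses synthetic data in place of bugtestdata.mat, and the noise amplitude `noiseLevel` is a hypothetical value that would need tuning for real data. The noise is scaled by each predictor's standard deviation so that small-valued and large-valued predictors are perturbed by the same relative amount. `bsxfun` is used so the code also runs on R2016a.

```matlab
rng(1);                                   % same seed on every machine
n = 200;
X = [randn(n,2), randi(4,n,4)];           % columns 3 to 6 take only 4 distinct values

noiseLevel = 1e-6;                        % hypothetical relative amplitude
colStd = std(X,0,1);                      % per-predictor standard deviation
jitter = bsxfun(@times, randn(size(X)), noiseLevel*colStd);
Xnoisy = X + jitter;                      % predictors with ties broken

fprintf('Distinct values in column 3: %d before, %d after\n', ...
    numel(unique(X(:,3))), numel(unique(Xnoisy(:,3))));
fprintf('Largest change relative to the column std: %.2g\n', ...
    max(max(abs(bsxfun(@rdivide, jitter, colStd)))));
```

The jittered matrix has to be identical on every machine, so either fix the seed as above or generate it once and distribute it in a MAT file.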

Standardizing the data would likely improve learning as well since the standard deviations for predictors 3 and 6 differ by two orders of magnitude.
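
Standardization can be requested directly through the 'Standardize' name-value pair of fitcsvm. The sketch below mirrors the call from the question with that option added; the data is synthetic (an assumption standing in for bugtestdata.mat), with one predictor on a scale about 100 times larger than the others.

```matlab
rng(90);
n = 200;
% Synthetic stand-in for the real data: the last predictor has a much larger scale
X = [randn(n,2), randi(4,n,3), 100*randi(4,n,1)];
y = double(X(:,1) + 0.01*X(:,6) > 2.5);   % hypothetical 0/1 labels

SVMModel = fitcsvm(X,y,'KernelFunction','rbf',...
    'OutlierFraction',0.2,...
    'BoxConstraint',10,'ClassNames',[0,1],...
    'Standardize',true);                  % center and scale each predictor

disp(SVMModel.Mu)      % per-predictor means used for centering
disp(SVMModel.Sigma)   % per-predictor standard deviations used for scaling
```

With 'Standardize' set to true, the model stores the means and standard deviations in Mu and Sigma and applies the same transformation to new data in predict, so no manual preprocessing is needed.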

2 comments

fireattack on 21 Sep 2016

Edited: fireattack on 21 Sep 2016

Thank you very much!

Standardizing the data helps a lot. With it, I get exactly the same answer on every machine, both for this example and for my actual experiment data. Not to mention it's much faster to train :D

I think I originally chose not to use "standardize" because it seemed to give slightly lower accuracy in my prediction testing, but in hindsight it's totally worth it.

I tried the white noise solution as well. I wasn't sure how to implement it properly, so I ended up just adding a random number within ±2% to each data point. On its own, without standardizing, it still sometimes returns different results on different machines. I'll investigate it further later.

If no better answer comes along, I'll mark this one as accepted later. Thanks again!

kc on 19 Jun 2019

Can SVM give different results on MATLAB 2015 and MATLAB 2018? Please advise.
