Violinplot extending beyond data range

Angie on 28 Nov 2024 at 13:56
Commented: William Rose about 18 hours ago
Hello everyone,
I’m using the violinplot function in MATLAB to create violin plots for some datasets. I am specifying the position and the data as follows:
violinplot(3, data2(5:end));
However, I’ve encountered an issue. The violin plot extends to negative values even though all my data values are positive. For another dataset, I observed a similar problem: the violin plot includes values that are negative or larger than the maximum values in my data.
I’ve read that this might be caused by the kernel density estimation (KDE) method used by violinplot to calculate and visualize the data's probability density. KDE smooths the data distribution and can sometimes produce density values outside the actual range of the data.
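To make the KDE point concrete, here is a minimal sketch of a Gaussian kernel estimate evaluated by hand at a single point below the smallest sample. The sample values, the bandwidth `h`, and the query point `x0` are made up for illustration; they are not from the dataset in the question, and this is not necessarily how violinplot computes its density internally.

```matlab
% Hand-rolled Gaussian KDE at one point (illustrative values, not from the post).
ydata = [0.1 0.2 0.4 0.7 0.9];   % made-up sample; every value is positive
h  = 0.15;                        % made-up bandwidth
x0 = -0.1;                        % query point below min(ydata)

% Density estimate = average of normal kernels centered on the data points.
fx0 = mean(exp(-0.5*((x0 - ydata)/h).^2)) / (h*sqrt(2*pi));

fprintf('Estimated density at x = %.1f: %.4f\n', x0, fx0);
```

Because each normal kernel is positive everywhere, the estimate at `x0` is greater than zero even though no sample is negative, which is why the violin can spill past the data.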
I’m unsure how to resolve this issue and would greatly appreciate any advice or suggestions.
Thank you!
Angie

Accepted Answer

William Rose on 28 Nov 2024 at 15:56
Edited: William Rose on 28 Nov 2024 at 16:01
[Edit: add ylim() so that all 3 plots have same y-axis range.]
You can vary the bandwidth, the kernel function, or both. In the examples below, the data are uniformly distributed on (0,1), which is close to a worst case if you don't want the violin to extend to negative values. The violins still extend beyond the data in these examples, but the options control how far they extend. Experiment to see whether you like the results. Depending on your data, you may not be able to keep the violin from going negative.
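ydata = rand(100,1);   % 100 samples, uniformly distributed on (0,1)
figure;
% Panel 1: default violin, with violinplot choosing the density settings
subplot(131)
violinplot(ydata);
title('Default Violinplot'); ylim([-.5,1.5])
% Panel 2: density from kde with an explicit bandwidth of 0.05
[f1,xf1] = kde(ydata,Bandwidth=0.05);
subplot(132)
violinplot(EvaluationPoints=xf1,DensityValues=f1)
title('Bandwidth=0.05'); ylim([-.5,1.5])
% Panel 3: density from kde with a box kernel, which has finite width
[f2,xf2] = kde(ydata,Kernel="box");
subplot(133)
violinplot(EvaluationPoints=xf2,DensityValues=f2)
title('Box Kernel'); ylim([-.5,1.5])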
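If the violin must not extend past the data at all, one crude workaround is to clip the estimated density to the observed range and rescale it before plotting. The sketch below is not part of the answer above; it reuses the same kde and violinplot calls, and the names `keep`, `xc`, and `fc` are introduced here only for illustration. Clipping piles the removed tail mass back onto the remaining curve, so treat the result as a display choice rather than a better density estimate. The kde documentation also describes a Support option that may be a cleaner route; check it for your release.

```matlab
ydata = rand(100,1);                  % same kind of test data as above
[f,xf] = kde(ydata);                  % default kernel density estimate

% Keep only the evaluation points that lie inside the observed data range.
keep = xf >= min(ydata) & xf <= max(ydata);
xc = xf(keep);
fc = f(keep);

% Rescale so the clipped curve integrates to 1 again (trapezoidal rule).
fc = fc / trapz(xc,fc);

figure;
violinplot(EvaluationPoints=xc,DensityValues=fc)
title('Density clipped to data range'); ylim([-.5,1.5])
```

The clipped violin ends abruptly at or just inside the smallest and largest samples, which may or may not suit your figure.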
4 Comments
Angie about 19 hours ago
Thank you very much! As a pdf obtained with a kernel distribution extends beyond the most extreme data points in my dataset, which is something I want to avoid, I was considering using other distributions instead. Your examples have been very helpful.
William Rose about 16 hours ago
@Angie, you're welcome.

Release

R2024b
