Plotting negative values in boxplot

I am trying to make a box plot of data in 6 different categories. Some of my data points are negative, and when I call the boxplot function it cuts off the y-axis at 0, so I cannot get a good visual of the negative values. I am using MATLAB R2019a. Any insight on this would be appreciated:
flow_rates_w = NUM_w(1:183,24:29)
boxplot(flow_rates_w)

10 comments

I don't see any such effect here; show us the exact code used and attach a sample dataset that causes the problem there.
You can always change ylim manually:
ylm=ylim;
ylim([-ylm(2) ylm(2)])
will set the lower limit to the negative of the current upper limit; the limits can be set to whatever range is needed.
Adam Danz
Adam Danz on 3 Aug 2022
Edited: Adam Danz on 3 Aug 2022
boxplot handles negative values.
data = randi(11,100,5)-6
data = 100×5
    -2     0     5    -5     3
     5     0    -1    -1     4
     0     5     1     4     2
    -5     0     1     1     0
    -4    -2    -5     2     1
    -3    -4    -1     3    -3
    -2    -3     1    -4    -3
     0    -2     2    -2     1
    -3     2    -1    -2     0
     3    -4     5     4     0
boxplot(data)
yline(0)
Perhaps your negative values are not large enough in magnitude to appear. For example, the values below are all positive (roughly in the range 0 to 10) except for three that are set to -0.005.
data2 = randg(.25,100,5);
data2([80,220,350]) = -0.005;
min(data2,[],'all')
ans = -0.0050
figure
boxplot(data2)
Investigate the frequency and magnitude of negative values in your data to get a sense of what the plot should look like.
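One way to run that check, as a minimal sketch: count the negative entries per column and compare the most negative value with the overall span of the data. The matrix demo below is made-up illustration data, not the poster's; substitute your own matrix (for example flow_rates_w).

```matlab
% Hypothetical demo data (assumption): 3 categories, a few tiny negatives.
demo = [ 1.2e5  -0.14   3.0e3
         4.0e4   2.5e5 -1.99
         0       1.0e4  7.0e2
         9.0e4  -0.10   5.0e2 ];

nNeg      = sum(demo < 0, 1);               % number of negative values per column
minPerCol = min(demo, [], 1);               % smallest value in each column
spanAll   = max(demo(:)) - min(demo(:));    % overall span of the data
relSize   = abs(min(demo(:))) / spanAll;    % largest negative relative to the span

fprintf('negatives per column: %s\n', mat2str(nNeg));
fprintf('largest negative is %.2g of the full span\n', relSize);
```

If relSize is far smaller than one pixel divided by the axes height in pixels, the negative part of the data cannot be visible at full scale.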
dpb
dpb on 3 Aug 2022
Good point, @Adam Danz. @Marguerite Lorenzo, NB the axis limit isn't identically zero; it is something below 0 but not as low as -0.5E5, or that tick mark would have been drawn. It appears to be fairly close to that, though.
jessupj
jessupj on 3 Aug 2022
Edited: jessupj on 3 Aug 2022
the values span multiple orders of magnitude; this looks like a good time to use logarithms
dpb
dpb on 3 Aug 2022
Edited: dpb on 3 Aug 2022
Except that you can't show negative numbers on a log axis. I already checked that nothing has been added to the boxplot function to deal with such a case; it acts the same as any other axes in that regard. I believe there's a FEX (File Exchange) submission that reflects an axis around 0 with a transform on the values to avoid the discontinuity at 0. It's not mathematically correct, since the interval around the labelled "0" location squeezes everything below the lowest plotted decade into the space of a single decade, but it can be a useful visualization tool for very widely dispersed data that is both positive and negative. But boxplot can't make use of that trick: the negative data will simply not be shown if you set the axes YScale to 'log'.
Thank you all for your answers and comments. Below is my code and attached is the data sheet I am working with, and here are some further attempts based on your ideas.
[NUM_w,TXT_w,RAW_w] = xlsread("data_boxplot")
flow_rates_w = NUM_w(1:183,24:29)
boxplot(flow_rates_w)
Attempt 1 - I tried setting the y-axis limits manually, but the plot has the same cut-off at 0:
ylim([-1E5 4E5]) % attempt 1
Attempt 2 - I tried using a log scale. The plot seems to be more visually telling, but the log scale does indeed prevent me from visualizing the negative range of my data:
set(gca, 'YScale', 'log') % attempt 2
It is true that looking at the data, the negative values are quite small in terms of order of magnitude, so maybe boxplot is not the best plotting tool for this. It would be great to be able to have the layout of the boxes from the log scale but find a way to include the negative range... Please let me know what you think would be a better way to visualize this data as a box plot, or other statistical plot/graph, I am open to ideas. Thank you in advance!
I'm not quite sure what you're expecting to see. Your data has a large range and very small negative values, so it's expected that the range, indicated by the whiskers, will end at or very close to 0; so close that you can't see it. Viewers who know how to read box plots and see the range of the y-axis should understand that there are limits to how precise the visualization can be. The part of the whisker below 0 is sub-pixel in height and cannot be shown.
[NUM_w,TXT_w,RAW_w] = xlsread('data_boxplot.xlsx');
flow_rates_w = NUM_w(1:183,:);
range(flow_rates_w)
ans = 1×6
   1.0e+05 *
    1.1400    3.4600    2.8500    0.0070    1.4000    0.5502
min(flow_rates_w)
ans = 1×6
0 -0.1449 -0.1400 -1.9966 0.2000 -0.8423
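A rough back-of-the-envelope check of the sub-pixel claim, using the largest range and the most negative minimum printed above. The axes height of 400 pixels is an assumed, typical value, not something measured from the poster's figure.

```matlab
yRange       = 3.46e5;    % largest column range from the output above
minVal       = -1.9966;   % most negative value from the output above
axesHeightPx = 400;       % assumption: drawable height of the axes in pixels

% Height, in pixels, that the distance from 0 down to minVal would occupy
% if the y-axis spanned roughly yRange.
pxBelowZero = abs(minVal) / yRange * axesHeightPx;
fprintf('%.4f pixels below zero\n', pxBelowZero);
```

With these numbers the result is a small fraction of one pixel, which is why nothing can be drawn below the zero line at full scale.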
Are the negative values important? Are you worried about viewers thinking that the min values are 0?
What about histograms?
figure
tiledlayout(6,1,'TileSpacing','compact','Padding','compact')
positiveBins = linspace(0, max(flow_rates_w,[],'all'),20);
bins = [positiveBins(1)-positiveBins(2), positiveBins];
for i = 1:6
    nexttile
    histogram(flow_rates_w(:,i),bins)
end
dpb
dpb on 4 Aug 2022
Edited: dpb on 4 Aug 2022
Suggestions for TMW/@Adam Danz to consider...
I would suggest setting the lower ylim to -0.5E5; then there will be a tick mark and label that make it clear the axis really isn't terminated at 0. In similar cases, I think the default range should land on even tick values and let the user tighten the range if they wish? Or just label the bottom axis value (although there's no tick there, so that would be added text)?
"The whisker length is sub-pixel in height and cannot be shown."
I wonder if it would be better to "cheat" in the other direction and show that pixel even if it is somewhat exaggerated???
Marguerite Lorenzo
Marguerite Lorenzo on 4 Aug 2022
Hi, thank you for your response. I think the histogram wouldn't be a bad way to look at the data, but I am hoping to somehow visualize some of the statistics that the boxplot function offers (i.e. mean, 25th and 75th percentile). The negative values, although small, have their importance. From the boxplot in log scale (screenshot in my previous comment) it appears that for categories 2 and 6 the 25th percentiles fall in the negative range, and it would be nice to be able to see those values.
"it appears that for categories 2 and 6 the 25th percentiles fall in the negative range, and it would be nice to be able to see those values."
Zoom the y axis --
ymn=min(flow_rates_w,[],'all');
yup=1E5;
ylim([ymn yup])
adjust as desired.
You could use tiledlayout and present the full-scale plot on one and the detail on a second -- having a builtin inset function would be handy for such things...
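If the goal is to read off the quartiles themselves, they can also be computed directly and reported next to the plot. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (which boxplot already requires) and using made-up data in place of flow_rates_w. Note that boxplot draws the median as its centre line, not the mean, so the mean is computed separately here.

```matlab
% Hypothetical stand-in (assumption) for flow_rates_w: 2 categories.
x = [ -2  10
      -1  20
       0  30
       1  40
       2  50 ];

q  = quantile(x, [0.25 0.50 0.75]);   % rows: 25th, 50th, 75th percentile per column
mu = mean(x, 1);                      % column means

for k = 1:size(x, 2)
    fprintf('category %d: p25 = %g, median = %g, p75 = %g, mean = %g\n', ...
            k, q(1,k), q(2,k), q(3,k), mu(k));
end
```

A negative 25th percentile shows up here as a number even when it is too small to see on the full-scale plot.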


Answers (1)

You could add a second axes that zooms into the small, negative values.
[NUM_w,TXT_w,RAW_w] = xlsread("data_boxplot");
flow_rates_w = NUM_w(1:183,:);
fig = figure();
tcl = tiledlayout(fig,4,1);
ax1 = nexttile(tcl,[3,1]);
boxplot(ax1,flow_rates_w);
Warning: boxplot might not be displayed properly in the tiled chart layout.
ax2 = nexttile(tcl);
boxplot(ax2,flow_rates_w); % or copyobj
Warning: boxplot might not be displayed properly in the tiled chart layout.
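% zoom in to the small negative values, symmetric about 0
gmin = min(flow_rates_w,[],'all') * 1.1; % most negative value plus a 10% margin
ylim(ax2,abs(gmin).*[-1,1])
yline(ax2,0,':','Color',[.6 .6 .6]) % reference line at 0 on the zoomed axes
linkaxes([ax1,ax2],'x')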

2 comments

Marguerite Lorenzo
Marguerite Lorenzo on 4 Aug 2022
I like the idea of having a second zoomed in plot, thank you. Would there be a way to do something like this in log scale?
dpb
dpb on 4 Aug 2022
Only if you use the aforementioned "trick" of plotting abs(x) and then manually relabelling. The logarithm of a negative value simply isn't real, so you can't avoid that problem.
There is the FEX submission <sym_log> that shows how to do this for ordinary data; doing the same for a boxplot would mean writing something similar that draws the various pieces for the negative data. One could possibly extract the necessary pieces from the original boxplot, although I think much of the content is hidden. I've not done any poking at the internals to see.
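The transform-and-relabel idea can be sketched roughly as follows. This is not the sym_log FEX code; the names symlog and symexp, the linear threshold c, and the demo data are all assumptions made for illustration. Because boxplot computes its statistics on the transformed values, interpolated quartiles only approximately match the raw-data quartiles, and the whisker and outlier rules are applied in transformed units, so treat the result as a visual aid rather than an exact box plot.

```matlab
% Sign-preserving log transform: roughly linear for |x| << c, logarithmic for |x| >> c.
c      = 1;                                        % assumed linear threshold
symlog = @(x) sign(x) .* log10(1 + abs(x) ./ c);
symexp = @(y) sign(y) .* c .* (10.^abs(y) - 1);    % inverse transform

% Hypothetical demo data (assumption): wide positive range, small negatives.
demo = [ -1.99  0.2   3.4e5
         -0.14  5     1.0e3
          0     80    2.0e4
          12    900   6.0e1 ];

t = symlog(demo);                                  % transformed values to plot

figure
boxplot(t)                                         % needs Statistics and Machine Learning Toolbox
tickVals = [-10 -1 0 1 10 1e2 1e3 1e4 1e5 1e6];    % tick labels in original units
set(gca, 'YTick', symlog(tickVals), ...
         'YTickLabel', arrayfun(@(v) sprintf('%g', v), tickVals, 'UniformOutput', false))
ylabel('value (symmetric log scale)')
```

The threshold c controls how much room the region around 0 gets; a value near the magnitude of the smallest negatives of interest is a reasonable starting point.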


Asked: 3 Aug 2022

Commented: dpb on 4 Aug 2022
