Editor's Note: This file was selected as MATLAB Central Pick of the Week
NOTE: this function is now available from the IoSR Matlab Toolbox as iosr.statistics.boxPlot.

Alternative box plot function for Matlab with many options. These options include:
 Variable sample sizes (via the tab2box() function).
 Show box sample size.
 Scaled or uniform box spacing.
 Box width scaled by sample size.
 Overlay scatter plots of underlying data.
 Overlay the mean of the data.
 Overlay additional percentiles, and attach labels to them.
 Hierarchical Xlabeling and support for multidimensional data.
 Notched boxes.
 Vertical lines to separate groups.
 Automated construction of a legend.
 Set box limits as percentiles.
 Set whisker extent via various methods.
 Use of weighted quantiles.
 Creation of violin plots.
Christopher Hummersone (2020). Alternative box plot (https://github.com/IoSRSurrey/MatlabToolbox), GitHub. Retrieved .
3.2.1.0  Function now natively supports subgroups, handles NaNs more robustly, and returns sample size(s). A few other minor tweaks and doc changes. 

3.2.1.0  Added acknowledgement. 

3.2.1.0  Updated description. 

3.2.1.0  Moved function in to updated toolbox. 

3.2.1.0  Updated version number. 

3.1.1.0  Moved file to Github. 

3.1.1.0  Added percentile options for box and whisker extent. Modified help to clarify scatter plotting. 

3.1.0.0  Added 'theme' property to allow multiple display properties to be changed simultaneously. Various bug fixes. 

3.0.2.0  More bug fixes. Forced legend alpha to match box alpha. 

3.0.1.0  Added ability to specify colors as a colormap function. Changed some size properties to no longer be specifiable for each group. Bug fixes. 

3.0.0.0  Reimplemented function as a class (with a new name). Added property to automate construction of a legend. 

2.0.8.0  Added meanSize property. 

2.0.7.0  Corrected calculation of mean when data include NaN. Updated picture. 

2.0.6.0  Added options for displaying the means. 

2.0.5.0  Added trap for MarkerEdgeAlpha (scatterAlpha), which older versions of Matlab do not support. 

2.0.4.0  Added layer and transparency options. 

2.0.3.0  Fixed automatic placement of boxes. 

2.0.2.0  Corrected bug where x offset was erroneous if y was not hierarchical. 

2.0.1.0  Small tweaks to improve robustness. 

2.0.0.0  Numerous changes to interface in order to support multidimensional data and improve hierarchical data labelling. Added scaleWidth option to scale box widths according to sample size. 

1.6.3.0  Added xshaping to outliers and tweaked offset distribution. 

1.6.2.0  Added x shaping for overlayed scatter plots whereby the random x offset is related to the data distribution. Fixed bug whereby xSeparator handle would be sought even if xSeparator was not specified. 

1.6.1.0  Shuffled a few things around so that the creation order defines the layering, rather than relying on uistack(). Added linkprop() and listener to keep axes in sync and to change xseparator if ylim changes. 

1.6.0.0  Forgot to include tab2box. 

1.6.0.0  Updated image. 

1.6.0.0  Added options to: overlay a scatter plot of the underlying data, display the sample size in each box, add xgroup separators, and add hierarchical labelling. Shuffled outputs slightly. Thanks to Arnold for the suggestions. 

1.5.1.0  Fixed bug where x is a cell array of strings. 

1.5.0.0  Remove NaN from x data (and corresponding y data). 

1.4.1.0  Updated documentation. 

1.4.0.0  Improved robustness to small or empty samples. Add tab2box function for arranging tabular into the format accepted by box_plot. 

1.3.0.0  Function now natively supports subgroups, handles NaNs more robustly, and returns sample size(s). A few other minor tweaks and doc changes. 

1.2.0.0  Moved quantile calculation to new function. 

1.1.0.0  Changed/corrected quantile estimation algorithm. Details in help text. 
Inspired by: notBoxPlot, Hierarchically grouped boxplot
Jordan Lui (view profile)
What version of Matlab is required for this? I'm using an older version of Matlab and cannot run installer because websave() method is not available.
mimi ada (view profile)
Ikke89 (view profile)
I am trying to group my boxes with the GROUPLABELS option. I struggle to understand from the documentation how to do this properly. I have tried all kind of combinations of cell arrays, but I keep getting the error:
Error using iosr.statistics.boxPlot/drawStyle (line 1850)
The GROUPLABELS option should be a cell vector; the Nth element should contain a vector of length SIZE(Y,N+2)
Could somebody help me with a simple example? E.g. how would I group a 10x4 array (4 boxes) into two groups, so that the adjacent boxes are in the same group?
Thank you very much in advance!
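A minimal sketch of one way to arrange such data, going only by the rule quoted in the error message (the Nth element of GROUPLABELS has length SIZE(Y,N+2)). It assumes that the second dimension of y indexes the x-groups, that the third dimension indexes the boxes within each x-group, and that the 'style' property must be set to 'hierarchy' for the labels to be drawn; the label strings are hypothetical. The plotting call is guarded so the reshaping can be checked without the toolbox.

```matlab
y = rand(10, 4);                               % 10 observations, 4 boxes

% Rearrange to 10 x 2 x 2 so that y3(:, i, j) is box j of x-group i,
% i.e. columns 1-2 share the first x-group and columns 3-4 the second.
y3 = permute(reshape(y, 10, 2, 2), [1 3 2]);
assert(isequal(y3(:, 1, 2), y(:, 2)));
assert(isequal(y3(:, 2, 1), y(:, 3)));

% One grouping dimension beyond x, so GROUPLABELS has one element,
% and that element has length size(y3, 3) == 2.
groupLabels = {{'a', 'b'}};                    % hypothetical labels

% Assumption: 'style' = 'hierarchy' is what makes the labels appear.
if exist('iosr.statistics.boxPlot', 'class') == 8
    iosr.statistics.boxPlot({'Group 1', 'Group 2'}, y3, ...
        'style', 'hierarchy', 'groupLabels', groupLabels);
end
```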
Dominic Yan (view profile)
Hi Christopher,
I have a trivial question here: How can I plot different colors for multiple boxes in one plot?
Jochen2 (view profile)
I had to change the following in boxPlot.m for 2019a:
obj.handles.samplesTxt(subidx{:}) = text(double(obj.xticks(subidx{2})+gOffset+halfboxwidth-xoffset),...
double(obj.statistics.PU(subidx{:})-yoffset),...
num2str(sum(~isnan(obj.y(subidxAll{:})))),...
'horizontalalignment','right','verticalalignment','top');
(the text command requires doubles for x and y)
Also I changed the following in kernelDensity.m :
if numel(x) < 1
d = 0;
bw = 0;
xd = 0;
return
end
(this allows you to showScatter even if one box only has one data point)
Jochen2 (view profile)
Regarding my former comment: It should be
if numel(x) < 1
d = 0;
bw = 0;
xd = 0;
return
end
(this allows you to showScatter even if one box only has one data point)
Aiyush (view profile)
How do I change the font size of the resulting legend? I tried retrieving the legend object from the handle using
h=findobj(gcf,'Type','Legend');
set(h,'FontSize',10)
but this won't work when there are multiple legends on the figure.
Ist (view profile)
Nan (view profile)
Michael Marquis (view profile)
Adriana Murraças (view profile)
Christopher Hummersone (view profile)
Hi Arnold. I no longer have access to MATLAB, so I’m afraid I won’t be doing this anytime soon!
arnold (view profile)
Hi Christopher,
you might want to consider adopting tables, especially categorical input for tab2box instead of just cell arrays out of convenience.
regards
Arnold
Christopher Hummersone (view profile)
Yes, you can see here how the different methods lead to different outcomes: https://uk.mathworks.com/matlabcentral/mlcdownloads/downloads/submissions/46555/versions/10/screenshot.jpg
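A minimal sketch of how two quantile definitions can disagree on the same sample, using the R-5 and R-8 index rules as they are commonly defined (R-5: h = n*p + 1/2; R-8: h = (n + 1/3)*p + 1/3). The data and variable names are hypothetical, and this is plain MATLAB rather than the toolbox's own quantile code.

```matlab
x = sort([1 2 3 4 10]);      % hypothetical sample
n = numel(x);
p = 0.25;                    % 25th percentile

% Both rules interpolate linearly between order statistics;
% they differ only in the fractional index h they interpolate at.
clampIdx = @(h) min(max(h, 1), n);
hR5 = clampIdx(n*p + 1/2);
hR8 = clampIdx((n + 1/3)*p + 1/3);

qR5 = interp1(1:n, x, hR5);  % lower quartile under R-5
qR8 = interp1(1:n, x, hR8);  % lower quartile under R-8

fprintf('R-5: %.4f, R-8: %.4f\n', qR5, qR8);
```

With small samples the gap between methods is usually most visible at the quartiles and whisker limits.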
Julian Rüdiger (view profile)
Thanks Christopher for the fast response. Is the difference expected due to the R8, R5 calculation methods? I did compare the results by using R5 with your function as well. However, I found a workaround for my application by fixing the unweighted index in quantile.m at line 179: q(m,n) = Qp(xSorted,huw). This worked to give the same median in the plots as using the matlab function.
Christopher Hummersone (view profile)
Hi Julian. I don’t currently have access to MATLAB, so I’m afraid my ability to help is somewhat limited. However, note that it is expected that the median produced by these functions is different to the median MATLAB calculates. This is described in the iosr.statistics.quantile help.
Julian Rüdiger (view profile)
Hi Christopher,
thanks for this awesome function. I got an issue with the median of an array using your function, which doesn't result in the same number as using the matlab median function or the matlab boxplot itself. I was trying to find the source for this inconsistency:
"iosr.statistics.boxPlot" uses "iosr.statistics.statsPlot" to calculate the median, which uses "iosr.statistics.quantile(obj.y,.5,[],obj.method,obj.weights)" so far so good. Using "iosr.statistics.quantile" without weights defined gives the same result as the matlab function. But if I use "iosr.statistics.quantile" with the weights that are defined in "iosr.statistics.statsPlot", namely "obj.weights = ones(size(obj.y))", a different result is calculated.
Have you experienced this issue yet?
Thank you!
Cheers,
Julian
Christopher Hummersone (view profile)
Hi Arne. Not directly, no, but you can use the iosr.statistics.tab2box function to put the data in to boxPlot’s required format.
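A minimal sketch of the kind of rearrangement involved, written in plain MATLAB so that it does not depend on the exact tab2box signature: observations with a grouping variable are spread into one NaN-padded column per group. The data are hypothetical, and the final call assumes that boxPlot accepts a numeric x vector and ignores NaN padding.

```matlab
vals = [5.1 4.9 6.2 5.8 6.0 7.1 6.9]';   % hypothetical observations
g    = [1 1 2 2 2 3 3]';                 % grouping variable, as in boxplot(x,g)

[groups, ~, idx] = unique(g);            % idx maps each observation to its group
counts = accumarray(idx, 1);             % sample size per group

% One column per group, padded with NaN where a group has fewer samples.
y = NaN(max(counts), numel(groups));
for k = 1:numel(groups)
    y(1:counts(k), k) = vals(idx == k);
end

% Assumption: boxPlot(x, y) with one x value per column of y.
if exist('iosr.statistics.boxPlot', 'class') == 8
    iosr.statistics.boxPlot(groups', y);
end
```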
Arne Graul (view profile)
Hello Christopher,
is there a way to plot multiple boxes in one plot, using a grouping variable just like in the Matlab boxplot function? (boxplot(x,g))
Thanks a lot
Christopher Hummersone (view profile)
See the solution here: https://github.com/IoSRSurrey/MatlabToolbox/commit/11f8077e6870a961e3106e371f149f838af397f2#commitcomment23722979
Laurens (view profile)
Very good toolbox! Is there a possibility to add rotation to group labels?
Till (view profile)
Very helpful! Your work is appreciated.
Christopher Hummersone (view profile)
Thanks for that, Filntisis. I've uploaded a fix to GitHub.
Filntisis Panagiotis (view profile)
Hi Christopher:
test = [2,3,2,2,2,3,2,1,2,2,3,2,2,3,2,3,2,2,2,4,3,4,2,5,2,4,1,2,2,1,2,4,2,3,3,4,2,2,2,2,1,3,1,2,2,2,3,4,1,1]';
iosr.statistics.boxPlot(test);
Here the upper limit is 4.5, however when I check the object outliers they are empty (and they are not plotted).
I tried to check a bit (not that familiar with matlab programming) and at line 1896 of boxPlot.m the statsPlot.calculateStats is called and the outliers are populated, however, at lines 1904-1905 outliers are overwritten to 0 again.
Christopher Hummersone (view profile)
Hi Filntisis. If the outliers array is empty then are you sure that your data has outliers?! The outliers are calculated in the statsPlot base class. Can you post a minimal example that produces the error?
Filntisis Panagiotis (view profile)
Hi Christopher! Thanks for this. I am trying to use the function but the outliers do not show up. If I check the statistics object the outliers are empty, and I looked at the source and I can't find where the outliers are calculated. The outliers show only if I set the ShowScatter option to true, along with all the other data points. (The limit is set to 1.5IQR). Thanks!
Christopher Hummersone (view profile)
Use the 'limit' property to specify the whisker extent.
Aditya Nanda (view profile)
Thanks Christopher, that explains it. There's another issue. The whiskers are supposed to represent the maximum and minimum data. But on manually checking the values, I found that, for some cases, the whiskers were less than the maximum data or more than the minimum data. How is this possible?
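A minimal sketch of why a whisker can stop short of the extreme values, assuming the conventional 1.5*IQR rule mentioned elsewhere in this thread: whiskers reach only the furthest data points inside the fences, and anything beyond is treated as an outlier. The data are hypothetical and the quartiles use the R-8 index rule; this is plain MATLAB, not the toolbox's own code.

```matlab
y = sort([1 2 2 3 3 3 4 4 12]);          % hypothetical sample with one extreme value
n = numel(y);

% Quartiles by linear interpolation at the R-8 fractional index.
quart = @(p) interp1(1:n, y, (n + 1/3)*p + 1/3);
q1 = quart(0.25);
q3 = quart(0.75);
iqrVal = q3 - q1;

lowerFence = q1 - 1.5*iqrVal;
upperFence = q3 + 1.5*iqrVal;

% Whiskers extend to the most extreme data points inside the fences.
whiskerLo = min(y(y >= lowerFence));
whiskerHi = max(y(y <= upperFence));
outliers  = y(y < lowerFence | y > upperFence);

fprintf('max(y) = %g, upper whisker = %g\n', max(y), whiskerHi);
```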
Christopher Hummersone (view profile)
Hi Aditya. Thanks for the feedback. The reason the means were not what you were expecting was because boxPlot did not calculate weighted means. I've since modified it so that it does; the fix is live on GitHub. Cheers.
Aditya Nanda (view profile)
I figured out how to plot the mean. So that is all set. But the mean value does not match what I have on record.
I am using quadrature points (like Gauss-Hermite), so I have y_i and corresponding weights w_i. I calculate the mean as \sum_i y_i w_i and it does not match the mean plotted by the BoxPlot. How is this possible?
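A minimal sketch for checking a weighted mean by hand. The plain sum \sum_i y_i w_i equals the weighted mean only when the weights sum to one; otherwise it needs dividing by the total weight. The values and weights below are hypothetical.

```matlab
y = [1 2 4];                 % hypothetical values
w = [1 4 1];                 % hypothetical weights (not normalised)

rawSum = sum(w .* y);        % plain sum of w_i * y_i
wMean  = rawSum / sum(w);    % weighted mean

fprintf('sum(w.*y) = %g, weighted mean = %.4f\n', rawSum, wMean);
```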
Aditya Nanda (view profile)
Hi Christopher, Amazing work on this set of files. There is something I need help with. Now, the standard syntax for the Weighted Box plot (iosr.statistics.boxPlot(x,y,'weights',weights)) is to plot the median, the 25th percentile, the 75th percentile and the outliers.
I am interested in plotting the mean (not the median), and just the 25th percentile and the 75th percentile. How do I do this? I tried changing the name-value pairs in classdef (CaseInsensitiveProperties = true) boxPlot < iosr.statistics.statsPlot
but that did not help. The mean never shows up. I tried to change the medianColor to white so that its invisible but the median always shows up.
Christopher Hummersone (view profile)
Hi Roland. Thanks for the feedback and suggestion. I've implemented your suggestion as an optional first argument in the constructor (keeping it consistent with other Matlab functions). The change has been committed to the GitHub repo and should be pulled on to the FX within 24 hours. Thanks!
Roland (view profile)
Hi Christopher,
great work. your box plot looks much better than the official version of the statistics toolbox. however, i have a kindly feature request. can you provide an optional property in the constructor method for feeding an axis handle from outside? or is there already any workaround to come up with an "own" axis handle?
best, roland
Christopher Hummersone (view profile)
Hi Arnold,
It took me a little while to even show the legend title (for me, the 'visible' property of the legend title object was 'off' by default).
Anyway, I think this speaks to known problems with the legend title in HG2. See: http://undocumentedmatlab.com/blog/plotlegendtitle. So I have no other ideas, or any fixes, I'm afraid.
Chris
arnold (view profile)
Hi Christopher,
I had another look at the legend. Since Matlab had finally given the possibility to set a title to a legend I tried to do just that but the title is always off (position wise), in the lower left corner of the entirety of the legend (at least I found it to be reproducible where it is situated).
Do you have any idea why that would be?
I found the Matlab ability to add a title to the figure quite useful and robust so I'm wondering what's going on here.
kind regards
Arnold
cai onion (view profile)
Hi Chris,
Thanks for your suggestion. I have revised the script "boxPlot", but I got further errors, as follows:
Undefined function 'histcounts' for input arguments of type 'double'.
Error in iosr.statistics.boxPlot/xOffset (line 2328)
[N,~,bin] = histcounts(y); % create a histogram
Error in iosr.statistics.boxPlot/drawOutliers (line 1482)
xScatter = X + (0.8.*halfboxwidth.*obj.xOffset(obj.statistics.outliers{subidx{:}}));
Error in iosr.statistics.boxPlot/draw (line 1299)
obj.drawOutliers(subidx);
Error in iosr.statistics.boxPlot (line 658)
obj.draw('all');
Error in Boxplot4Index (line 23)
h11 = iosr.statistics.boxPlot(NewCC,...
It seemed that this tool was not suitable for the older version of Matlab.
Christopher Hummersone (view profile)
Hi Cai,
Unfortunately the matlab.mixin.SetGet class was introduced in R2014b. To make boxPlot work on earlier versions, try modifying line 1, replacing "matlab.mixin.SetGet" with "handle". boxPlot should work OK, but you won't be able to use set(...) and get(...) syntaxes to change boxPlot properties.
Chris
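A minimal sketch of a pre-flight check for older releases, based on the errors reported in this thread (matlab.mixin.SetGet, histcounts, and websave missing). It assumes R2014b corresponds to MATLAB version 8.4; the variable names are hypothetical.

```matlab
% True on releases older than R2014b (assumed to be version 8.4).
tooOld = verLessThan('matlab', '8.4');

% Check each feature directly rather than relying on the version alone.
hasSetGet     = exist('matlab.mixin.SetGet', 'class') == 8;
hasHistcounts = exist('histcounts') ~= 0; %#ok<EXIST>
hasWebsave    = exist('websave') ~= 0;    %#ok<EXIST>

if tooOld || ~(hasSetGet && hasHistcounts && hasWebsave)
    warning('boxPlotCheck:oldRelease', ...
        'This MATLAB release lacks features that the thread reports boxPlot or its installer using.');
end
```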
cai onion (view profile)
Dear Christopher,
I tried to use this tool to make a boxplot figure, but it did not work. The error was as follows:
Error using iosr.statistics.boxPlot
The specified superclass 'matlab.mixin.SetGet' contains a parse error or cannot be found on MATLAB's search
path, possibly shadowed by another file with the same name.
Error in Boxplot4Index (line 23)
iosr.statistics.boxPlot({'A','B','C'},NewCC,...
I did not know how to solve it. I was using a win7 and Version 2013a. However this script could be run successfully in win 7 and matlab 2015b. Thanks.
Christopher Hummersone (view profile)
No problem at all (sorry if I sounded defensive).
When boxPlot was a function, it was called box_plot. I renamed it to boxPlot because I reimplemented it as a class, and the two coexisted locally for a short time. Still, in every other context Matlab correctly distinguishes between "boxplot" and "boxPlot", just not with the keyboard/mouse shortcuts.
arnold (view profile)
thanks Chris,
my lack of knowledge and usage on the notches caused that question, now looking at it I get it, it just looked like a glitch in the way the boxes were being set up.
as for the aliasing, of course I did not mean to criticize, just state my surprise as ever since HG2 I didn't consider this could cause a problem anymore. Using a tool like export_fig negates such things anyways.
As for the toolbox issue, it's ok. When the reinstall didn't help I thought I'm just too stupid to get it. Did you not at one point change the name from box_plot to boxPlot? I guess having a unique name could remedy the problem but I could be wrong.
Christopher Hummersone (view profile)
Hi Arnold,
Notch:
The strange shapes you see are because the notch extends beyond the IQR (boxPlot should warn you about this). The notch is analogous to a confidence interval; its height is ±(1.58*IQR)/sqrt(N). So it will generally be large if you have a small sample size (you can see a similar effect here: https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_boxplot_sect012.htm; scroll down to "NOTCHES"). So this behaviour is expected.
Aliasing:
This is something I have no control over; it's determined by whatever functions/drivers you use to create the image file. Do you use export_fig (http://www.mathworks.com/matlabcentral/fileexchange/23629exportfig)? It's a fantastic tool. Among its MANY options is control over the degree of antialiasing.
Toolbox:
I understand your problem. I typed 'import iosr.statistics.*' and F1 (etc) on 'boxPlot' no longer gives help related to iosr.statistics.boxPlot. Yet the 'which' function and other runtime name resolution operations run as expected. I'm afraid this is a shortcoming in Matlab. Perhaps consider filing a bug report? But there is nothing I can do about it I'm afraid.
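A minimal sketch of the notch formula quoted above, ±(1.58*IQR)/sqrt(N), showing how the notch can reach past the box when the sample is small. The quartile values are hypothetical and chosen so that the median sits mid-box.

```matlab
q1 = 1; med = 2; q3 = 3;               % hypothetical quartiles, so IQR = 2
iqrVal = q3 - q1;

% Half-height of the notch around the median for sample size N.
notchHalf = @(N) 1.58 * iqrVal ./ sqrt(N);

for N = [5 20 100]
    h = notchHalf(N);
    exceedsBox = (med + h > q3) || (med - h < q1);
    fprintf('N = %3d: notch = +/-%.3f, extends beyond box: %d\n', N, h, exceedsBox);
end
```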
arnold (view profile)
I see (the problem with legend and hatchfill2). I didn't think it through since I almost always use legendflex.
Notch:
I never used it but it doesn't seem to work for me. I see oddly shaped boxes, I'll send you a screenshot.
Also, I get heavy aliasing at the angled box edges, something I thought Mathworks prevented with their HG2 graphics system.
Toolbox
I have to ask again about the toolbox construct. I reinstalled matlab yesterday just to fix this but it didn't work. I'm on 2016a as you but hitting F1 or CTRL+D to get help or open boxPlot opens the original Mathworks function for me in either case (having installed iosr+ or having imported it at session start).
Christopher Hummersone (view profile)
Hi Arnold.
I've been thinking about this, but I can't think of a good interface without adding every hatchfill2 option to boxPlot. But it is easy enough to use with boxPlot, e.g.:
import iosr.statistics.boxPlot;
h = boxPlot(rand(100,3,3));
hatchfill2(h.handles.box(:,:,1),'single');
will add a hatch to the first box in each xgroup.
Of course this won't update the legend, which is a known shortcoming of hatchfill2. There is a discussion about hatchfill2 and legends on the hatchfill2 FX page.
Chris
arnold (view profile)
Hi Chris,
to continue ideas.... implementing hatchfill2 might improve readability of the diagrams.
http://www.mathworks.com/matlabcentral/fileexchange/53593hatchfill2
I use it quite often for publications/diagrams but haven't tried using it with boxPlot.
I'll never understand why matlab does not have native hatch support.
As far as I know at least legendflex supports it in legends.
arnold (view profile)
Hi Chris,
I see, I somehow overlooked that option, thanks! I'm just too unfamiliar with matlab toolboxes and how to handle them properly otherwise I might have picked up on it myself.
Thanks for the kind words, I'd like to think I gave some useful hints/feedback. Starting at an early point there has been no reason whatsoever anymore to go and use the integrated boxplot function, it is just too limited and cumbersome compared to yours. This sadly applies to a lot of the data visualization features like no proper XY errorbars etc. shame, but for most decent plots (besides boxplots now) one still has to go and copy all data to something like Origin instead.
If I were Mathworks, I'd ask you to have this integrated.
later
Arnold
Christopher Hummersone (view profile)
Hi Arnold,
I've already had this debate with another user: https://github.com/IoSRSurrey/MatlabToolbox/issues/1. As I mentioned in the discussion, 'Typing "import iosr.statistics.*" will prevent you from having to rewrite any code.'
As it mentions in the readme "Basic installation only requires you to add the install directory to the Matlab search path." So you don't need to run iosr.install, just add the install path to your Matlab path. The rest of the tools in the package might not be relevant to you, but they are to myself and colleagues. Since they're only text files, you can ignore them; they take up very little space on your hard disk.
Getting function help (and opening the function for editing) works as normal for me, but I am using R2016a, so I'm afraid I can't help if you're using an earlier version. Typing "help iosr" should present you with a full list of the toolbox contents. In that list, each file should have a clickable link that displays help for that file.
Best,
Chris
P.S. Getting "Pick of the Week" was great. But many of the features of boxPlot were your ideas, so huge thanks for that.
arnold (view profile)
Dear Christopher,
I was offline for quite a while but saw this made it to 'picks of the week', very nice. Alternative boxplot has come a long way.
One thing that confuses me a bit is why you have packed so many different tools into one toolbox. I understand you have put in a lot of work into all these other tools but I don't see how they are related for most users. The installer created another folder called 'SOFA_API'... I have no interest in installing that when I am looking for the 'alternative boxplot'
Bottom line, a ton of stuff that people might not want seems to be added even to the path. I'm not sure everybody thinks this is more tidy than before. iosr.statistics.boxplot is not more readable than box_plot, which it was some version ago.
One thing I usually use all the time in the editor is mark a function name and use CTRL+D or F1 to either open the function or get the help in order to figure out syntax questions etc. Now, with the function being iosr.statistics.boxplot one can't do that anymore. The only way I see now is more cumbersome, manually digging through the folders and finding that function again.
Is there an easier way to open the function and/or help that I am unaware of?
For now I'll stick with the older version because I don't want to change syntax everywhere just now. Maybe I can figure out an easier way to strip the toolbox of everything but the boxplot, quantile2 and tab2box functions.
Christopher Hummersone (view profile)
Many thanks, Sean. I've uploaded a fix to GitHub.
Sean de Wolski (view profile)
You have a bug on line 842
assert(isnumeric(obj.addPrctilesTxtSize) && isscalar(obj.addPrctilesTxtSize), '''ADDPRCTILESTXTSIZE'' should be a numeric scalar')
It's checking the property and not the new setting: val.
Christopher Hummersone (view profile)
Hi again Arnold. I think I found and fixed your bug.
I've also been trying, on and off, to implement your sorting suggestion. But after a number of attempts, I've decided that it's too difficult to implement, because the input can have an arbitrary number of dimensions, each of an unknown size.
If you, or anyone else, wants to have a go, feel free to fork the repo!
Warwick (view profile)
Christopher,
I got it to work properly. Probably I had a glitch in my set paths.
Great contribution. Thanks
Christopher Hummersone (view profile)
Hi Warwick.
I also have Matlab's boxplot function, and use this version with no problems. Note that Matlab class/function calls are case-sensitive, so you should be able to call this class as 'boxPlot' with no problems. If you can't, then I can only assume that there is an issue with your path (or it may be an OS thing, as I'm using a Mac). If you still can't get it to work, could you please post the error text, and perhaps a minimal working example? Thanks.
Chris
Warwick (view profile)
Christopher
This looks pretty useful. I like the option to have whiskers at [5,95] percentiles, for example, rather than 1.5*IQR which is not applicable to skewed data. However, because I already have Matlab's boxplot (no caps), your boxPlot is not recognised when I call it even though I have put it into a 'set path'. Or, at least, Matlab by default chooses its own boxplot function. Is there a workaround to this? How would I rename it to boxPlotCH, say? I'm using a Mac and Version 2016a if that is important.
Christopher Hummersone (view profile)
Hi Arnold,
I'm afraid I've been unable to recreate the error. Can you please download the latest version, and post a minimal working example that reproduces the error?
Thanks.
arnold (view profile)
Hi Christopher,
I was playing with the new features when I discovered a bug I can't really make out. It might have something to do with the overhaul of the handles you did? When using samplesize 'true' this is the result:
Reference to nonexistent field 'groupsTxt'.
Error in boxPlot/drawGlobalGraphics (line 1672)
set(obj.handles.groupsTxt,'FontSize',obj.groupLabelFontSize);
Error in boxPlot/draw (line 1322)
obj.drawGlobalGraphics();
Error in boxPlot (line 656)
obj.draw('all');
I'm on the current version 2016a, don't have an installation of 2015 up and running to cross check it there.
Christopher Hummersone (view profile)
Hi Royi,
3) The code is now on GitHub.
1) I've added an issue on to the GitHub repo for the percentile enhancement.
2) I have already implemented a 'showOutlier' option.
Thanks!
Royi Avital (view profile)
Hi Christopher,
I wrote my response in context of my previous comment and your reply:
1. It would be great to have a marker for the 5% and 95% percentiles. Just like there is the median, the mean, and 25% / 75%, add another label for 5% / 95% (maybe optional by default).
2. I think the outliers and the scatter should (And underneath are) 2 separate scatter series. What I would like to have is the opacity and visibility property of those 2 be exposed using the methods of your function. Something like hBoxPlot.scatterVisible = false(); hBoxPlot.outlierVisible = true();.
3. Putting your code on GitHub is a great idea in my opinion.
Thank You.
Christopher Hummersone (view profile)
Hi Royi,
I'm not sure what you mean. Can you post a link to an example?
Thanks,
Chris
Royi Avital (view profile)
Hi Christophe,
1. I think you should add new queue for 5% and 95%. It can be a Star or any other marker, just to be able to put the 5% and 95% point on the graph (In addition to what we have now).
2. I think you just need to make the outliers and the data have the property "Visible" exposed in your methods, that would be perfect and easy to do on your hand (Also the opacity).
3. Putting it on GitHub is a great idea!
Thank You!
Christopher Hummersone (view profile)
Hi Arnold. Perhaps I should put the file on Github and let you (or others) tinker ;)
I'll put the weighting and sorting options on my [very long] todo list. Do you have any useful references on implementing a weighting algorithm?
arnold (view profile)
Hi Christopher,
I just now had another idea which should be easy to implement yet very useful visually:
An option to sort the groups so that the boxes within one xgroup are ascending/descending.
Of course the similar functionality for the xgroups would also make sense.
This should make it a lot easier to read the data, especially if there are a lot of groups
This contribution is coming along very nicely due to your tireless commitment. Well done!
You should probably set up a donation link. :)
arnold (view profile)
exactly,
I know it is not very common in most plotting tools but it is the scientifically 'more correct' approach if one has measurement errors. Obviously people need to know what they're doing with errors and PDFs anyways for things to be 'correct' but take a guess how many people use boxplots for 3 datapoints ^^.
As I said, most scientist then go and do scatter plots with one point each + error bars (either all data points or just replace all by one using proper error propagation).
R does indeed have it. Matlab is quite cumbersome with all of that since no total least square for x errors is supported i.e.
Anyways, back on target:
Depending on how much effort you want to put into it, there are several approaches. I would start with the first and easiest:
Stick to normally distributed errors. As you suggested, the user has the option to give an extra array/vector of 'errors', one for each value or datum as you call it. Usually for most that'll be the standard deviation of the measurement i.e.. You can then calculate the weights (1./sigma.^2) for each group and using this get a weighted mean and confidence interval based on those errors.
Off the top of my head I have no clue about a weighted median though it seems defined on wikipedia i.e.
I think it would go along nicely with box_plot but I could also see it as a standalone contribution on the fileexchange, also using tab2box for grouping and then doing a proper weighted mean with (asymmetrical) error bars in x and y some time in the future maybe.
Again I think the smaller effort of including it here assuming normally distributed errors/stdevs would serve 90% of the demand.
It's really a shame that Mathworks don't seem to bother about these things :)
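A minimal sketch of the first approach described above: normally distributed errors, weights of 1./sigma.^2, and a weighted mean with a confidence interval. The measurements and standard deviations are hypothetical, and the 95% interval uses the standard inverse-variance result (standard error = sqrt(1/sum(w))) with a normal multiplier of 1.96.

```matlab
y     = [10.2 9.8 10.5 10.0];    % hypothetical measurements in one group
sigma = [0.2 0.1 0.4 0.2];       % hypothetical standard deviation of each measurement

w     = 1 ./ sigma.^2;           % inverse-variance weights
wMean = sum(w .* y) / sum(w);    % weighted mean
wSE   = sqrt(1 / sum(w));        % standard error of the weighted mean
ci95  = wMean + [-1 1] * 1.96 * wSE;

fprintf('weighted mean = %.3f, 95%% CI = [%.3f, %.3f]\n', wMean, ci95(1), ci95(2));
```

Measurements with smaller sigma pull the mean towards themselves, which is the behaviour the comment asks for.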
Christopher Hummersone (view profile)
Roy, thanks for your feedback. I'll upload a modified function soon. To be clear, do you mean that the box should be 5–95%, or the whiskers? Also, the documentation was poorly phrased. What I meant was that the scatter plot does not include the outliers (as you observe, they are plotted separately). I'm not particularly keen on adding an option that would make the plot misleading. A suitable alternative (to me) would be to set 'limit' to 'none'.
Arnold, that's an interesting idea. It's something I've not come across before, at least in Matlab, but I see R has one or two libraries for weighted box plots. So would the idea be that you specify an additional array, the same size as Y, that determines the weight associated with each datum?
arnold (view profile)
Hi Christopher,
what about adding functionality to display weighted boxplots? That would also be really useful and quite unique as no matlab function I know supports it. I could most definitely use it and I'm sure I'm not the only one.
As of now I always go and calculate weighted means and confidence intervals using the matlab fit function (or one by myself) and then usually plot as a scatter plot or errorbar.
regards
Arnold
Royi Avital (view profile)
Great submission.
Few notes:
1. Could you add the option for 5% and 95% bars (In addition to 25% and 75%)?
2. The documentation states that when the option to show scatter data is on the outliers will be removed though it doesn't happen. Could you just give a different "On / OFF" to each (Can be done manually using the handlers, yet better use the classes).
Thank You!
Christopher Hummersone (view profile)
@arnold I added the 'theme' property I mentioned. Hopefully
>> set(h,'theme','colorall')
(h is the boxPlot handle) is somewhere close to what you described...?
I know the interface is more complicated, because of the support for an arbitrary number of dimensions in Y. Here are a couple of tips.
1) You no longer need to precisely specify any colormaps; you can just specify a function handle, for example,
>> set(h,'boxColor',@parula)
This functionality is described towards the end of the help.
2) Getting and setting boxPlot properties is asymmetrical, especially for group properties. For example,
>> h.boxColor = 'none';
>> h.boxColor
ans =
'none'
'none'
'none'
So you could try capturing a property value, e.g.
>> bc = get(h,'boxColor');
modifying its value, and returning the modified value to the box plot, e.g.
>> set(h,'boxColor', bc);
arnold (view profile)
true... maybe using the same color but just darken is more intuitive.
I just don't seem to get the necessary input format for the colormap of the boxes/groups. How do I just assign a standard colormap like parula, hot, etc??
This matlab typical approach by giving it a matrix of RGB vectors doesn't work
'boxColor', parula(size(g{1},1))

Ah, forget it. It's documented in the classdef but wasn't really clear in the box_plot function.
I find it a bit cumbersome, since I sometimes leave out the darkest color of parula by making the colormap slightly larger than I need: for instance, parula(6) for 5 groups, and then I use the lightest 5.
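A minimal sketch of that trimming trick in plain Matlab. The variable names are illustrative. The function-handle form at the end assumes that 'boxColor' accepts any handle that maps a count n to an n-by-3 RGB matrix, as @parula does; that is an assumption, not something checked against the class.

```matlab
nGroups = 5;

% Ask for one colour more than needed, then drop the first (darkest) row
cmap = parula(nGroups + 1);
cmap = cmap(2:end, :);             % nGroups-by-3, lightest colours only

% Same idea wrapped as a function handle (hypothetical usage with boxColor)
dropFirst   = @(m) m(2:end, :);
lightParula = @(n) dropFirst(parula(n + 1));
```

If the handle form is accepted, `set(h,'boxColor',lightParula)` would avoid building the matrix by hand.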
Christopher Hummersone (view profile)
@arnold would that not make the scatter plots difficult to see if the box is the same colour?! Or perhaps you mean the outliers? The trick with this is providing a consistent and easy-to-use interface. I'm thinking about adding a 'theme' option that automates various display options (especially colour). For example, one theme option could be 'colorboxes' which would give you something similar to the screenshot attached here; 'grayboxes' would do something similar but in grayscale; 'colorlines' would have unfilled boxes and change all of the line colours; etc. I'd happily add any suggested themes.
Fabian Schrumpf (view profile)
Very useful and versatile function. Lets you adjust a lot of settings that the built-in function boxplot() doesn't give you access to. Thanks for sharing.
arnold (view profile)
Have you thought of optionally colouring the scatter plots with the same colour as the corresponding box instead of just gray?
Christopher Hummersone (view profile)
Thanks for the feedback, José. Actually showing the mean was something I was thinking about. It's an easy addition, so I've just uploaded a revised version. The outputs have been modified to include the mean data and a handle to the markers used to plot the means (see updated documentation).
José Ignacio Orlando (view profile)
Awesome contribution!
Is it possible to plot the mean values in the same plot?
Christopher Hummersone (view profile)
Hi Arnold,
I did upgrade to R2015b, but I'm fairly certain the updated function doesn't require any functionality that's new to R2015b.
Unfortunately I had to change quite a bit about the function interface in order to support the new functionality, which does make some things harder to do than they were.
If data.groups has one column, then you can use:
[~,h] = box_plot(...);
legend(squeeze(h.boxes(:,1,:)),g{1},'location','best')
(g has as many cells as data.groups has columns.)
As for boxColor, it and other parameters should be specified as a cell array of size G-by-I-by-J (check out the help text for more info).
The interface is more cumbersome now, I accept that, so I'd appreciate any insight you can offer that would make it less cumbersome (whilst retaining the flexibility of allowing data.groups to be of arbitrary size). At the moment I'm thinking of further developing the function into a class and providing methods to handle things like legends and plotting parameters.
Chris
arnold (view profile)
did you change matlab versions in between?
I could upgrade to 2015b but haven't gotten around to it yet. (I'm still at 2014b)
arnold (view profile)
Hi,
I'm having a couple of issues with the current version, coming from revision 381.
[y,x,g] = tab2box(data.x, data.y, data.groups);
now results in g being a cell array with everything in the first cell, i.e. a 5x1 list. So the legend which used to work doesn't anymore:
legend(squeeze(hb(:,1,:)),g, 'location', 'best');
results in:
Error using legend (line 120)
Invalid argument. Type 'help legend' for more information.
The GroupColoring of the boxes used to work like:
'boxColor', parula(size(g,1))
but also, because of g ending up as a cell structure with everything in the first cell, it doesn't anymore.
How to apply 'boxColor' now? Using 'boxcolor', parula(5) results in:
Error using box_plot (line 318)
BOX_PLOT needs propertyName/propertyValue pairs
arnold (view profile)
I see you're still busy adding features and bugfixing, well, I can recommend two or three more things. Don't take it the wrong way, I'm just suggesting what I edit in my plots and what my experience with catchy scientific plotting tells me.
- Background scatter plot (with jitter in x) behind each box: it's sometimes nice to see the actual data points too, and many modern tools like Origin and JMP offer this option. Overlaying a scatter plot (randomize the x-position according to the width of the boxes for a nicer look instead of a strict x-position, and set the color according to the box group) achieves just that. I did it manually in some scripts, and it looks great.
- An option to display the number of data points that make up the box, at the top right (e.g.) of each box. Like the prior suggestion, this helps to understand the quality of the statistics behind the boxes, since many people ignorantly use boxplots with far too little data. We've done this in our institute for years and I think it enhances the speed & quality of data interpretation.
- When the x-groups are non-scalar, vertical separator lines between the groups usually improve readability.
- The ultimate box-plotting tool would be the combination of your contribution here with "hierarchical box plot" (also found on the File Exchange).
... as I said, I don't say you have to include any of this ;)
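The first two suggestions above can be sketched in plain Matlab, independent of box_plot. Everything here is illustrative: the data are random, and the names xPos and boxWidth are hypothetical stand-ins for the box positions and widths.

```matlab
rng(0);                            % reproducible hypothetical data
y = randn(40, 3);                  % one column per box
xPos = 1:size(y, 2);               % assumed box centres
boxWidth = 0.4;                    % assumed box width

figure; hold on;
for k = 1:numel(xPos)
    yk = y(~isnan(y(:, k)), k);    % ignore NaN padding, if any
    % Uniform jitter confined to the width of the box
    jitter = (rand(size(yk)) - 0.5) * boxWidth;
    scatter(xPos(k) + jitter, yk, 12, [0.6 0.6 0.6], 'filled');
    % Sample size at the top right of where the box would sit
    text(xPos(k) + boxWidth/2, max(yk), sprintf('n=%d', numel(yk)), ...
        'VerticalAlignment', 'bottom');
end
hold off;
```

Drawing the scatter first and the boxes afterwards keeps the points in the background.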
Christopher Hummersone (view profile)
Thanks again for your feedback and suggestions, Arnold. At your suggestion I've added an xSpacing option, that I hope is satisfying. I've also fixed both functions to remove NaNs in x and corresponding y data. Again, I hope that fixes the problem. As for logarithmic spacing, there is undoubtedly a solution, but I think it will make things unnecessarily complicated. Instead, I suggest using the new xSpacing option. The former boxWidthMode evolved into the boxSpacing option, and would not have been helpful in this case.
arnold (view profile)
yes, that fixed the bug! Thank you!
Some more suggestions. I miss the option to use an X vector with numbers for labelling but NOT spacing the axis accordingly. When X contains strings it goes and just evenly spaces the categories. It would be nice if evenly spaced xticks/groups would be possible for numbered X vectors as well. Maybe you just go and introduce the option 'xSpacing', 'scalar' or 'even'
Regarding this, I also found another bug. When X contains numbers AND NaNs (which happens with some of my datasets), line 390 in box_plot throws an error. Adding the line x(isnan(x)) = [] fixes it. I'm not sure if it's generally applicable.
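A minimal sketch of a more general variant of that workaround, which drops the matching columns of y together with the NaN entries of x so the two stay aligned. It assumes one column of y per element of x; the data are hypothetical.

```matlab
x = [1 2 NaN 4];                   % hypothetical x values, one per box
y = rand(10, 4);                   % hypothetical data, one column per x value

keep = ~isnan(x);                  % logical mask of usable x entries
x = x(keep);                       % remove NaNs from x ...
y = y(:, keep);                    % ... and the corresponding columns of y
```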
A minor thing... When using scalar x-axis spacing with a numeric X vector, the box widths get messed up when applying a logarithmic scale. You did have a 'boxwidthmode' which is no longer supported; did that have something to do with it?
I have no elegant solution for this in mind.
Thx very much!
Christopher Hummersone (view profile)
Ah. Of course the example works for me! There was a bug in quantile2 that I fixed a while back, but forgot to upload the file to the FX. Please download the most recent version:
http://uk.mathworks.com/matlabcentral/fileexchange/46555quantilecalculation
Hopefully that should help. I've also updated the box_plot documentation to explicitly mention tab2box.
arnold (view profile)
Hi Christopher,
Interesting function of yours (tab2box). Looks like that, combined with box_plot.m could be what I've been looking for ... but for me (R2014b) the example given by you in the file itself (line 3557) doesn't even work. Throws the same error as it does for my own data table:
Attempted to access x2(1); index out of bounds because numel(x2)=0.
Error in quantile2 (line 156)
q(m,n) = x2(1);
Error in box_plot (line 266)
Z.median = quantile2(y,.5,[],options.method); % median
Christopher Hummersone (view profile)
Hi Arnold,
Thanks for your feedback and comment. Just to make sure I understand correctly, you're suggesting that the Y input should facilitate plotting samples of different sizes? So the columns could be of different lengths? What if Y could be a cell array?
The tab2box function implicitly offers this functionality, since it will pad Y with NaN when samples are unequal in size. But this relies on the data being in tabular form initially.
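As a rough illustration of the NaN-padding idea (not tab2box's actual implementation), samples of unequal size held in a cell array can be packed into a single matrix like this. The variable names and data are hypothetical.

```matlab
% Hypothetical samples of unequal size, one cell per box
samples = {[1 2 3 4], [5 6], [7 8 9]};

n = cellfun(@numel, samples);      % size of each sample
Y = NaN(max(n), numel(samples));   % start with an all-NaN matrix
for k = 1:numel(samples)
    Y(1:n(k), k) = samples{k}(:);  % fill the top of column k, leave the rest NaN
end
```

The vector n also gives the per-box sample sizes, which are otherwise lost once the data are padded.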
arnold (view profile)
If you used structures, you could have different x-values and sizes for each set. This would make a great addition, since Matlab and other submissions here have never supported this.
Or you could just introduce a group vector: X an i x 1 vector for the x-positions, Y an m x i matrix for the data, and g an i x 1 vector for the groups....
arnold (view profile)
Hi Christopher,
great function, thx.
It would be great if you added the ability to use different sizes for the groups which would not have to be integrated into one 3d matrix.
Christopher Hummersone (view profile)
There's example code in the file help...
Juan Deaton (view profile)
Can you attach the code example for the figure you have at the top?
Christopher Hummersone (view profile)
@Alberto you mean the box colour? Use the 'boxColor' option and set it, for example, to [1 1 1; .5 .5 .5] (assuming you have two boxes per ytick). Setting parameters for each group is described towards the bottom of the help text.
Alberto (view profile)
Nice function, it works as promised. I miss the option (or I don't know how to do it) to fill the notch with a specified color, as in the example figure.