MathWorks were very helpful in resolving this problem.
Any library that uses thread-local storage (TLS) takes a DTV slot when it is loaded, and only a small number of these slots are available. Libraries built with the initial-exec TLS model require a slot and fail to load if none is free. Libraries using other TLS models can still load even when no slot is free. This explains the asymmetry: calling the OpenMP code after calling 'doc' in a fresh MATLAB session fails, but if the OpenMP MEX file is called first, 'doc' works fine.
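To find out which libraries fall into the initial-exec category, one option is to look for the STATIC_TLS flag that the linker records in a shared object's dynamic section. This is only a sketch: it assumes binutils' readelf is installed, and the helper name has_static_tls is made up for illustration.

```shell
#!/bin/sh
# Sketch: report which shared libraries were linked with initial-exec TLS.
# Assumes binutils' readelf is on PATH; has_static_tls is a hypothetical name.

# Reads `readelf -d` output on stdin and succeeds if any line
# (normally the FLAGS entry) mentions STATIC_TLS.
has_static_tls() {
    grep -q 'STATIC_TLS'
}

# Print one line per library path given on the command line.
# With no arguments the loop body never runs.
for lib in "$@"; do
    if readelf -d "$lib" | has_static_tls; then
        echo "$lib: STATIC_TLS set (initial-exec TLS)"
    else
        echo "$lib: no STATIC_TLS flag"
    fi
done
```

Saved as a script, it could be run with the path to libgomp.so and any other library the MEX file pulls in. Libraries reported with STATIC_TLS are the candidates for preloading.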
So the simplest solution is to LD_PRELOAD any initial-exec libraries that may be required. In my case:

    LD_PRELOAD=/usr/lib/gcc/x86_64-redhat-linux/4.4.4/libgomp.so matlab

The preloaded libraries then all get a slot, and the libraries that would otherwise have filled those slots are unaffected, because they can load without one.
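If more than one library has to be preloaded, or LD_PRELOAD is already set for other reasons, the value can be assembled by a small helper. This is a minimal sketch; join_preload is a hypothetical name, and which libraries to list is an assumption that depends on the local system.

```shell
#!/bin/sh
# Sketch: build an LD_PRELOAD value from several library paths.
# join_preload is a hypothetical helper, not part of MATLAB or glibc.

# Joins its arguments with ':' (one of the separators LD_PRELOAD accepts),
# skipping empty arguments so an unset variable does not leave a stray ':'.
join_preload() {
    out=""
    for p in "$@"; do
        [ -n "$p" ] || continue
        out="${out:+$out:}$p"
    done
    printf '%s\n' "$out"
}
```

A launcher could then start MATLAB with something like `LD_PRELOAD=$(join_preload /usr/lib/gcc/x86_64-redhat-linux/4.4.4/libgomp.so "${LD_PRELOAD:-}") matlab`, which puts libgomp first and keeps whatever was already being preloaded.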
Another solution was to switch to the Intel compiler, so that the generated MEX file can load the libiomp5 that is distributed with MATLAB.