From MATLAB R2015a onwards, turning the profiler on forces the GPU to run synchronously so that the reported timings are realistic.
My guess is that without the profiler, your second call to bsxfun is issued before the first has finished (this is possible because it does not need the output from the previous loop iteration). MATLAB then tries to allocate space for an output array of the appropriate size, but there isn't enough GPU memory available for two arrays of that size at once.
With the profiler on, the first call to bsxfun has completed and its space has been freed by the time the second call needs it.
You can confirm this is the case by calling wait(gpuDevice) inside your loop to prevent this execution overlap.
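
As a rough sketch of that check (your actual code isn't shown, so the sizes, the variable names a, b, c, g, n and the @times operation are all placeholders I made up), the loop would look something like this:

```matlab
% Hypothetical stand-in for the loop in the question; sizes, names and the
% @times operation are assumptions, not taken from the original code.
g = gpuDevice;                   % handle to the currently selected GPU
n = 4000;                        % assumed size, pick one that reproduces the error
a = rand(n, 1, 'gpuArray');      % column vector stored on the GPU
b = rand(1, n, 'gpuArray');      % row vector stored on the GPU

for k = 1:5
    c = bsxfun(@times, a, b);    % n-by-n result, queued on the GPU asynchronously
    wait(g);                     % block until the GPU has finished all queued work
end
```

If the out-of-memory error goes away with the wait call in place and comes back when you remove it, that points to the overlapping execution described above.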