You need to examine what the function returns with its default parameters (bin edges, observed counts O, expected counts E):
edges: [-0.47 -0.28 -0.19 -0.10 -0.01 0.08 0.17 0.26 0.36 0.45]
O: [50.00 211.00 833.00 1856.00 2133.00 1461.00 554.00 89.00 13.00]
E: [35.88 225.21 855.97 1818.58 2162.62 1439.93 536.40 111.61 13.80]
NB: the test statistic has only 6 degrees of freedom (9 bins, less 1, less 2 for the estimated mean and standard deviation).
However, it does show, to some degree, what the normality plot shows: the LH tail is "heavy", with more observations at the lower extreme than expected (50 vs. 35.88 in the first bin), and, correspondingly, the RH end is a little light (89 vs. 111.61 in the eighth bin). With the coarse binning, this is enough to reject the hypothesis at the default significance level.
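To see where the rejection comes from, here is a minimal Python sketch that recomputes the Pearson statistic from the O and E values posted above and prints each bin's contribution. It assumes two parameters (mean and standard deviation) were estimated from the data, so dof = bins - 1 - 2; the helper `chi2_sf_even` is hypothetical and uses the closed-form chi-square tail probability, which is valid only for an even number of degrees of freedom.

```python
import math

# Observed and expected bin counts copied from the output above
O = [50, 211, 833, 1856, 2133, 1461, 554, 89, 13]
E = [35.88, 225.21, 855.97, 1818.58, 2162.62, 1439.93, 536.40, 111.61, 13.80]

def chi2_sf_even(x, dof):
    """P(X > x) for a chi-square variate; closed form valid only for even dof."""
    if dof % 2:
        raise ValueError("closed form requires an even number of dof")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))

# Per-bin contributions (O - E)^2 / E
contrib = [(o - e) ** 2 / e for o, e in zip(O, E)]
stat = sum(contrib)

# Assumption: mean and std dev were estimated from the data (2 parameters)
dof = len(O) - 1 - 2
p = chi2_sf_even(stat, dof)

for i, c in enumerate(contrib, start=1):
    print(f"bin {i}: {c:6.3f}")
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.3f}")
```

This gives chi2 = 13.76 with 6 dof and p = 0.032, above the 5% critical value of about 12.59. Bins 1 and 8 together contribute over 10 of the 13.76, which is the heavy-left/light-right pattern described above.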
Let's check how well the default binning guess worked; one should have at least 5 observations in each bin(*):
>> [min(stats.O), max(stats.O);min(stats.E), max(stats.E)]
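For readers without MATLAB at hand, a rough Python equivalent of that one-liner, applied to the O and E values posted above, also flags any bin that falls below the rule-of-thumb minimum of 5 (the constant `EMIN` is just a name chosen for this sketch):

```python
# Observed and expected bin counts copied from the output above
O = [50, 211, 833, 1856, 2133, 1461, 554, 89, 13]
E = [35.88, 225.21, 855.97, 1818.58, 2162.62, 1439.93, 536.40, 111.61, 13.80]

EMIN = 5  # rule-of-thumb minimum count per bin

# Same layout as the MATLAB expression: [min max] for O, then for E
print([min(O), max(O)], [min(E), max(E)])

# 1-based indices of bins whose observed or expected count is below EMIN
small = [i for i, (o, e) in enumerate(zip(O, E), start=1) if o < EMIN or e < EMIN]
print("bins below minimum:", small or "none")
```

The sparsest bin is the last one (13 observed, 13.80 expected), so every bin clears the minimum.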
As the NIST handbook notes, one weakness of the chi-square test is that there is no optimal binning algorithm, so the results can be sensitive to the binning chosen.
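A quick way to see that sensitivity with the numbers already posted: merge the two outermost bins at each end into one (an arbitrary, hypothetical regrouping chosen only for illustration) and recompute. Because the expected counts come from the same fitted normal, summing E over merged bins gives the merged bins' expected counts. The helpers `merge` and `gof` are hypothetical names for this sketch, and `chi2_sf_even` is again the closed-form tail probability for even dof.

```python
import math

# Observed and expected bin counts copied from the output above
O = [50, 211, 833, 1856, 2133, 1461, 554, 89, 13]
E = [35.88, 225.21, 855.97, 1818.58, 2162.62, 1439.93, 536.40, 111.61, 13.80]

def chi2_sf_even(x, dof):
    """P(X > x) for a chi-square variate; closed form valid only for even dof."""
    if dof % 2:
        raise ValueError("closed form requires an even number of dof")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))

def merge(counts, groups):
    """Sum counts over each group of (0-based) bin indices."""
    return [sum(counts[i] for i in g) for g in groups]

def gof(obs, exp, n_params=2):
    """Pearson statistic, dof (assuming n_params estimated), and p-value."""
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    dof = len(obs) - 1 - n_params
    return stat, dof, chi2_sf_even(stat, dof)

# Hypothetical 7-bin scheme: merge bins 1+2 and bins 8+9
groups = [(0, 1), (2,), (3,), (4,), (5,), (6,), (7, 8)]
for label, obs, exp in [("9 bins", O, E),
                        ("7 bins", merge(O, groups), merge(E, groups))]:
    stat, dof, p = gof(obs, exp)
    print(f"{label}: chi2 = {stat:.2f}, dof = {dof}, p = {p:.3f}")
```

With 7 bins the statistic drops to about 7.05 on 4 dof (p ≈ 0.13), so the same data would not be rejected at 5%. Merging bins 1 and 2 lets the excess in the first bin cancel the deficit in the second, hiding exactly the heavy LH tail that drove the original rejection.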
John d' makes a valid point that with lots of data the test rejects more easily. Depending on how the data are used, these deviations from normality probably won't matter much unless, of course, you're doing something like estimating from the tails, where a normal approximation will likely underestimate the observed frequency somewhat in the left tail and overestimate it in the right tail.
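The sample-size effect is easy to quantify. If every O and E is scaled by the same factor (a hypothetical: the same relative departure from normality, just more or fewer observations), the Pearson statistic scales by exactly that factor while the critical value stays fixed. This sketch assumes 6 dof as above; 12.592 is the standard 5% critical value for that case.

```python
# Observed and expected bin counts copied from the output above
O = [50, 211, 833, 1856, 2133, 1461, 554, 89, 13]
E = [35.88, 225.21, 855.97, 1818.58, 2162.62, 1439.93, 536.40, 111.61, 13.80]

CRIT_95_DOF6 = 12.592  # 5% critical value of chi-square with 6 dof

def chi2_stat(obs, exp):
    """Pearson chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

base = chi2_stat(O, E)
for factor in (0.5, 1, 2, 10):
    # Hypothetical: same proportions in every bin, sample size scaled by `factor`
    s = chi2_stat([factor * o for o in O], [factor * e for e in E])
    print(f"n x {factor:>4}: chi2 = {s:7.2f}, reject at 5%: {s > CRIT_95_DOF6}")
```

With half the data the identical shape gives about 6.9 and passes; with ten times the data it gives about 138 and fails decisively, even though the distribution's shape is unchanged.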
I tend to rely on the Shapiro-Wilk test, which I don't believe TMW has implemented; I have a homebrew version I coded 40 years ago...