Hi Jungmoon. The [48,48] and [128,128] indicate that the 96 input channels and the 256 filters are each split into two groups: two groups of 48 channels and two groups of 128 filters. This grouping is part of the original AlexNet implementation. Each set of 128 filters is applied to only one group of 48 channels. As of now, there is no way to replicate this behavior using the "convolution2dLayer" function in the Neural Network Toolbox.
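To make the effect of the grouping concrete, here is a rough weight count for the grouped and ungrouped versions of the layer. This is only a sketch: the filter size k is an assumed value for illustration (it is not taken from your network), and biases are ignored.

```matlab
% Sketch: compare weight counts for a grouped vs. an ungrouped layer.
% k is an ASSUMED filter height/width, used only for illustration.
k = 5;

% Grouped: two groups, each with 128 filters that see 48 channels.
groupedWeights = 2 * (k * k * 48 * 128);

% Ungrouped: all 256 filters see all 96 channels.
ungroupedWeights = k * k * 96 * 256;

% The ratio does not depend on k: the ungrouped layer has twice the weights.
ratio = ungroupedWeights / groupedWeights;
fprintf('grouped: %d, ungrouped: %d, ratio: %g\n', ...
    groupedWeights, ungroupedWeights, ratio);
```

So an ungrouped layer is a superset of the grouped one: it has the same number of filters and the same output size, but each filter has extra connections to the other half of the channels.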
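You should be fine specifying 96 for NumChannels and 256 for NumFilters. I believe the original intention of the grouping was just to make it easier to split the computation across two GPUs. An ungrouped layer is not identical, since every filter then sees all 96 channels instead of 48, but the output has the same size, so functionally I believe your current setup should behave similarly.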
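As a minimal sketch, the ungrouped layer could be defined along these lines. The filter size, stride, padding, and layer name below are assumptions for illustration only; substitute the values from your own network.

```matlab
% Sketch: ungrouped stand-in for the grouped AlexNet layer.
% filterSize, 'Stride', 'Padding' and 'Name' are ASSUMED values.
filterSize  = 5;     % assumption, use your layer's filter size
numFilters  = 256;   % 128 + 128 from the two filter groups
numChannels = 96;    % 48 + 48 from the two channel groups

convLayer = convolution2dLayer(filterSize, numFilters, ...
    'NumChannels', numChannels, ...
    'Stride', 1, ...
    'Padding', 2, ...
    'Name', 'conv2');
```

Note that weights trained with the grouped layout will not fit this layer directly, because each filter here spans 96 channels rather than 48.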