Calculation of spectral coefficients is almost always done over a fixed number of samples (representing a pre-determined amount of time), and a window function is usually applied to reduce edge effects. The coefficients are then calculated and stored for that "window" of data. If enough datapoints remain, the window is moved forward, another section is extracted, and so on. It is common for the window to move forward by half of the window size, so that consecutive windows overlap by 50%.
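As a rough illustration of that framing step, here is a minimal sketch in Python with NumPy. The function name `spectral_frames`, the Hann taper, the default window size of 1024, and the use of FFT magnitudes as the "coefficients" are all assumptions made for the example; other coefficient types (MFCCs, for instance) would slot into the same loop.

```python
import numpy as np

def spectral_frames(samples, window_size=1024, hop=None):
    """Return one row of magnitude-spectrum coefficients per window.

    Hypothetical helper: window_size and the Hann taper are illustrative
    choices, not something prescribed by the text above.
    """
    samples = np.asarray(samples, dtype=float)
    if hop is None:
        hop = window_size // 2           # move forward by half a window
    taper = np.hanning(window_size)      # window function to reduce edge effects
    frames = []
    start = 0
    # Keep extracting sections while a full window of datapoints remains.
    while start + window_size <= len(samples):
        section = samples[start:start + window_size] * taper
        frames.append(np.abs(np.fft.rfft(section)))
        start += hop
    if not frames:
        return np.empty((0, window_size // 2 + 1))
    return np.array(frames)
```

Each row of the result is the set of coefficients for one window, so a sound becomes a sequence of rows that can be compared window by window. Trailing samples that do not fill a whole window are simply dropped in this sketch.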
Then, when you want to compare two sounds, you compare their windows of coefficients. To compare longer stretches, look for runs of consecutive windows that match on both sides. For example, suppose you want to match a "doo-wap!" sound, and in your reference sound the first window happens to cover the "doo" and the second window the "wap!". Matching only the "wap!" in the trial sound would not be good enough, as you might be matching the sound of adults talking in a Charlie Brown animated story ("wah-wah-wap!"). You can still optimize to some degree: match the last window first, then search backwards from each located position to confirm that the preceding windows also match, instead of matching every window in one sound against every window in the other.
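The "match the last window first, then confirm backwards" idea could be sketched like this. The names `windows_match` and `find_matches`, the relative-distance test, and the `tolerance` value are hypothetical; a real comparison would need a similarity measure suited to the coefficients in use, and this sketch assumes both sounds were framed with the same window size and step.

```python
import numpy as np

def windows_match(a, b, tolerance=0.1):
    """Hypothetical similarity test: relative Euclidean distance."""
    scale = np.linalg.norm(a) + np.linalg.norm(b)
    if scale == 0.0:
        return True                      # two silent windows count as equal
    return np.linalg.norm(a - b) / scale <= tolerance

def find_matches(reference, trial, tolerance=0.1):
    """Return trial window indices where the whole reference run starts."""
    n_ref = len(reference)
    hits = []
    if n_ref == 0:
        return hits
    last = reference[-1]
    for end in range(n_ref - 1, len(trial)):
        # Cheap first test: does this trial window match the last one?
        if not windows_match(last, trial[end], tolerance):
            continue
        # Only then walk backwards to confirm the preceding windows.
        if all(windows_match(reference[-1 - k], trial[end - k], tolerance)
               for k in range(1, n_ref)):
            hits.append(end - n_ref + 1)
    return hits
```

With toy three-element "coefficients" standing in for doo, wap, and wah, a reference of doo, wap searched in a trial of wah, wah, wap, doo, wap is rejected at the first "wap" (the window before it is "wah") and accepted at the second. Most trial positions are dismissed after a single comparison, which is where the saving comes from.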