FFT slowdown even after workspace reset

I'm seeing behavior with the fft() function that forces me to restart MATLAB between executions of a long script. The script is both processing and memory intensive and requires, among other things, millions of FFTs on the CPU and GPU. If I run bench() before running the script, my computer (i9-13950HX with 64 GB of RAM, running Windows 11, MATLAB R2024a) clocks in very fast. After I run my script, all performance metrics are basically identical except for fft(), which clocks >10x slower than before.
No matter what I do to the workspace (clear all, clear classes, clear functions, close all hidden force, clc, reset(gpuDevice), etc.) or to the FFT planner, I cannot bring the performance of fft() back to what it was before the script ran.
Am I overlooking anything that could reset the performance of fft() short of restarting MATLAB itself? I would like to let the computer loop over a bunch of datasets, but right now the slowdown in the FFT makes this very inefficient. I am currently considering calling the MATLAB engine from Python so that I can restart it between script calls. I am running MATLAB R2024a and may be able to update to R2024b, but cannot upgrade past R2024b.
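One way to get a fresh process per dataset without leaving MATLAB might be to launch a child MATLAB with the -batch startup option. This is only a sketch under assumptions: the dataset names and processDataset are hypothetical placeholders, the matlab executable is assumed to be on the system path, and the system call is assumed to block until the child finishes. runNow is left false so that the commands are assembled but not executed.

```matlab
% Sketch: one fresh MATLAB process per dataset (names are hypothetical).
datasets = {'set01', 'set02'};      % placeholder dataset identifiers
runNow   = false;                   % set to true to actually launch the child processes
cmds     = cell(size(datasets));

for k = 1:numel(datasets)
    % processDataset is a hypothetical function that handles one dataset.
    cmds{k} = sprintf('matlab -batch "processDataset(''%s'')"', datasets{k});
    if runNow
        status = system(cmds{k});   % assumed to return once the child exits
        if status ~= 0
            warning('Dataset %s exited with status %d.', datasets{k}, status);
        end
    end
end
```

Each child starts with a clean process state, so any slowdown that survives clear all would not carry over from one dataset to the next.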

21 Comments

dpb on 2 Jun 2026 at 17:13
I believe probably only MathWorks can address this. Submit an official support request at <Product Support Page>
Paul on 2 Jun 2026 at 18:43
If the culprit is identified, would you mind posting back here as others may be interested in the findings.
Walter Roberson on 2 Jun 2026 at 19:31
I note that the i9-13950HX has 8 performance cores and 16 efficiency cores. I wonder if later in the run, most of the computation is being shunted to the efficiency cores?
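A rough way to probe this from inside MATLAB might be to time fft under different computational thread limits. This is a sketch under assumptions: MATLAB does not offer a way to pin work to performance or efficiency cores, so the thread count is only a proxy, and the sketch assumes fft honors the limit set by maxNumCompThreads. The value 8 simply mirrors the number of performance cores mentioned above, and the test vector is smaller than a realistic workload to keep the run short.

```matlab
% Sketch: fft timing versus the computational thread limit.
tst      = randn(1, 2^22);                       % illustrative test vector
nOrig    = maxNumCompThreads;                    % remember the current limit
counts   = unique([1, min(8, nOrig), nOrig]);    % thread limits to try
tThreads = zeros(size(counts));

for k = 1:numel(counts)
    maxNumCompThreads(counts(k));                % apply the limit
    tThreads(k) = timeit(@() fft(tst));          % median fft time under this limit
end

maxNumCompThreads(nOrig);                        % restore the original limit

for k = 1:numel(counts)
    fprintf('%2d threads: %.4f s\n', counts(k), tThreads(k));
end
```

Running this before and after the slowdown appears would show whether the gap changes with the thread limit, which would hint at scheduling rather than at fft itself.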
Timothy on 2 Jun 2026 at 22:19
@dpb, thanks I might give that a shot.
@Paul, I found the culprit function, which calls another function which splits a bunch of large complex double precision N x M arrays into three dimensional 32 x M x (N/32) arrays, computes the FFT in the column dimension, multiplies with the conjugate of another 32 x M x (N/32) array, inverse Fourier transforms & normalizes to create a bunch of cross-correlations. However, it seems that if I isolate this sub function and run it a bunch of times by itself it doesn't affect the performance of the fft() function. Still, if I delete the sub function from the larger function I also have no reduction in performance, so I know that it is tied to it somehow. If I can get more specific or create a simple toy function that creates the fft performance loss I'm seeing I will post it here.
@Walter Roberson: Maybe this is happening, I don't know how to monitor what cores are used. Matlab is the main process on my computer however, and this slowdown can be created in under 10 minutes of operations by iterating the function described above in a loop. If I call the bench in each iteration of the loop and store the FFT time I can watch it slow down each iteration starting with the 4th (I get about 25 iterations in 10 minutes, by which time the FFT score has gone from ~0.15 seconds to ~1 second, and keeps slowing down the more times I call the function).
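For readers trying to picture the operation described above, the blocked cross-correlation might look roughly like the following. This is a minimal sketch, not the actual code: the sizes, the variable names (A, B, A3, B3, R), the reshape/permute layout, the choice of dimension 1 for the 32-point transforms, and the division by the block length are all assumptions.

```matlab
% Sketch of a blocked circular cross-correlation via FFT.
% All sizes and names are illustrative and much smaller than the real data.
L = 32;                                   % block length (the "32" in 32 x M x (N/32))
N = 4 * L;                                % number of rows; must be a multiple of L
M = 5;                                    % number of columns
A = complex(randn(N, M), randn(N, M));
B = complex(randn(N, M), randn(N, M));

% Split N x M into L x M x (N/L): consecutive runs of L rows become pages.
A3 = permute(reshape(A, L, N/L, M), [1 3 2]);
B3 = permute(reshape(B, L, N/L, M), [1 3 2]);

% 32-point FFTs along dimension 1, conjugate product, inverse FFT.
R = ifft(fft(A3, [], 1) .* conj(fft(B3, [], 1)), [], 1);
R = R / L;                                % assumed normalization; the original is not specified

% Spot check against the direct definition for one column, one page, one lag.
k = 3;                                    % lag to check
a = A3(:, 2, 1);
b = B3(:, 2, 1);
direct = sum(circshift(a, -k) .* conj(b)) / L;
err = abs(R(k + 1, 2, 1) - direct);       % should be at round-off level
```

The spot check compares one FFT-based lag with the direct sum, which is a quick way to confirm that the reshape and the conjugate are applied to the intended operand.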
Paul about 3 hours ago
Edited: Paul about 2 hours ago
fft used to have a memory leak, but that was fixed. Maybe a similar, yet different, issue has reared up since then. Matlab 2020a/b fft function memory leak - MATLAB Answers - MATLAB Central
Also, this thread How can I solve memory leak in fft? - MATLAB Answers - MATLAB Central, which isn't really about a memory leak, discusses memory management with fft and seems like it might be on point based on the problem description. Maybe the
fftw(wisdom,[])
command is worth a try. Though it sounds like all of the FFTs are 32-point, so maybe this isn't the issue.
@Paul I think you mean
fftw('wisdom',[])
dpb about 1 hour ago
Edited: dpb about 1 hour ago
The doc for <fftw> may be a little confusing for that case -- it does show the form @Paul used, but uses wisdom as a placeholder for either 'swisdom' or 'dwisdom'. 'wisdom' alone isn't documented but doesn't error on my local system... but it does need to be a character string (or a variable that contains the string).
Paul 28 minutes ago
Yes, as dpb suggested I was using wisdom as a variable that would have the value ‘dwisdom’ or ‘swisdom’ as appropriate.
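For anyone following along, the planner reset being discussed can be sketched as below. The calls follow the documented fftw syntax with 'dwisdom' and 'swisdom' spelled out; whether clearing the wisdom has any effect on the slowdown in this thread is a separate question that this sketch does not answer.

```matlab
% Sketch: clear the accumulated FFTW wisdom for both precisions.
fftw('dwisdom', []);          % double-precision wisdom
fftw('swisdom', []);          % single-precision wisdom

% Query the current planner method and set it explicitly.
method = fftw('planner');     % current method, e.g. 'estimate'
fftw('planner', method);      % re-apply it; swap in another method to experiment
disp(method);
```

Using a variable such as wisdom = 'dwisdom'; fftw(wisdom, []) is equivalent to the first call and matches the placeholder form shown in the documentation.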
dpb about 2 hours ago
Was going to comment that was good spelunking @Paul to find the thread and refer to fftw; something like that was what I had in mind that would get reset on restart but wasn't affected by normal memory clearing, etc. Back in days of yore before MATLAB, when I had to use the libraries directly in FORTRAN (before Fortran days, too), I knew about fftw, but it had completely slipped my mind in the ensuing 40 years. Of course, that also predated having multi-cores, GPUs, and parallel computing TBs, so one had far more direct knowledge of what was going on inside.
Timothy about 1 hour ago
@Paul, @dpb, @Walter Roberson I'm sorry I wasn't specific enough in my original post. I have definitely tried resetting the wisdom in fftw for both single and double precision and it did not affect anything.
You indicate you found the "culprit function". After running that, is it FFT calls in isolation that slow down, or is it subsequent runs of the culprit function as a whole?
You also stated "large complex double precision N x M arrays into three dimensional 32 x M x (N/32) arrays" -- how large is "large" in this context? What are typical values for N and M for the data on which you're operating?
I think without seeing that culprit function it's likely going to be difficult to determine what's going on. Please send it to Technical Support so they can work with the developers to understand the problem and try to determine the root cause of the slowdown.
dpb about 5 hours ago
Edited: dpb about 4 hours ago
I figured from the get-go this would take the developers being able to poke at the innards.
Beyond isolating the given function, it is curious that performance recovers only after a restart, so something else must be getting reset then, perhaps in the state of the GPU...
The first question will be whether it is reproducible on a MathWorks machine or is something unique to @Timothy's particular system. Not too likely, probably, but ya' never know.
Timothy about 4 hours ago
@Steven Lord, the arrays are ~6000 x 16000 complex double precision matrices. Happy to contact tech support but I posted here to check if I'm missing something obvious which is frequently the case. I'm trying to drill down a bit further to see if I can reproduce the problem with a simpler script before I contact tech support.
dpb about 4 hours ago
It might be interesting/useful to see whether the symptom goes away for some smaller size.
If you are able to create such a sample case, I'd suggest going ahead and posting it here -- those who have the TB and could run it (I don't) could then see whether it is reproducible on other systems.
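A size sweep of that kind might be sketched as follows. Everything here is an assumption made for illustration: the cell-array side lengths, the 21 x 21 payload, and the test vector length are placeholders chosen to keep the run short, so they would need scaling up to approach the sizes at which the slowdown is actually seen.

```matlab
% Sketch: does fft time depend on how many small arrays are alive?
sizes = [50 100 200];                    % placeholder side lengths of the cell array
tst   = randn(1, 2^20);                  % illustrative test vector
tFFT  = zeros(size(sizes));

for k = 1:numel(sizes)
    C = cell(sizes(k), sizes(k));
    for idx = 1:numel(C)
        C{idx} = randn(21, 21);          % many small, separately allocated arrays
    end
    tFFT(k) = timeit(@() fft(tst));      % time fft while C is still in memory
    clear C
end

for k = 1:numel(sizes)
    fprintf('%4d x %-4d cells: %.4f s\n', sizes(k), sizes(k), tFFT(k));
end
```

If the timing stays flat at small sizes and climbs past some threshold, that threshold would be a useful detail to include in a support request.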
Steven Lord about 3 hours ago
After running your culprit function is it FFTs on the CPU that are slow, FFTs on the GPU, or both?
Timothy about 6 hours ago
Edited: Timothy about 6 hours ago
@Steven Lord CPU at least; the GPU hasn't been touched yet at the point where I can generate the problem. Here is a script that reproduces part of the problem. The crazy thing is, I was wrong about the FFT calls being part of the problem. I can delete all of those cross-correlations and still get a slowdown for fft. The script below is an example:
out = F;

function [out] = F()
    % Build 10 cell arrays, each holding 500 x 500 small random matrices.
    for n = 1:10
        NN = 500;
        MM = 500;
        C = cell(MM, NN);
        for nn = 1:NN
            for mm = 1:MM
                C{mm, nn} = randn(21, 21);
            end
        end
        out{n} = C;
        disp(n);
    end
end
If I run this mini-script and then call bench() or just tst = randn(1, 2^25); tic; fft(tst); toc (note that I actually execute tic; fft(tst); toc multiple times to get an average and let the planner optimize), I get a slowdown of about 2x. On one machine, the fft time goes from ~0.15 seconds to ~0.3 seconds. If I clear the workspace in this case, the fft time goes back to normal, i.e. ~0.15 seconds. However, if I re-run the mini-script above and then re-run tst = randn(1, 2^25); tic; fft(tst); toc (without clearing the workspace), the fft now takes ~0.65 seconds instead of ~0.3 seconds. If I clear the workspace, I'm back to ~0.15 seconds. If I run it a third time, the fft takes ~0.78 seconds (for the last five executions, as I'm writing this, toc registered 0.775059, 0.775901, 0.779967, 0.772605). So something odd seems to be happening with the fft time. I tested on R2024a and R2024b on different computers, with slightly different results: the R2024b computer has a slowdown of ~0.32, ~0.48, ~0.52, ~0.63 as I clear the workspace and execute the mini-script above between speed tests.
The behavior I am having trouble reproducing from my other script, which is doing a lot more, is the persistence of the slowdown. In my other script, the slowdown of the fft persists even after clearing the workspace. I will reach out to tech support.
Timothy about 5 hours ago
The slowdown can be observed more easily using the following code:
for n = 1:5
    clear out
    tst = randn(1, 2^25);
    FF = @()fft(tst);
    T1 = timeit(FF);    % fft time with the cell arrays cleared
    out = F;
    tst = randn(1, 2^25);
    FF = @()fft(tst);
    T2 = timeit(FF);    % fft time while out is still in the workspace
    disp(['Cleared workspace time: ', num2str(T1)]);
    disp(['Uncleared workspace time: ', num2str(T2)]);
    drawnow;
end

function [out] = F()
    out = cell(1, 10);
    for n = 1:15
        NN = 500;
        MM = 500;
        C = cell(MM, NN);
        for nn = 1:NN
            for mm = 1:MM
                C{mm, nn} = randn(21, 21);
            end
        end
        out{n} = C;
    end
end
My output was:
Cleared workspace time: 0.17438
Uncleared workspace time: 0.40605
Cleared workspace time: 0.17214
Uncleared workspace time: 0.9722
Cleared workspace time: 0.17484
Uncleared workspace time: 1.8431
Cleared workspace time: 0.17464
Uncleared workspace time: 1.8422
Cleared workspace time: 0.17555
Uncleared workspace time: 3.3422
on the machine I'm currently at. Note this doesn't reproduce the persistence (despite clearing the workspace) that I'm observing elsewhere, but I don't know if that persistence is necessary to cause the performance drop I'm seeing in my original code.
Walter Roberson about 4 hours ago
Data point: the problem does NOT occur on my Mac Tahoe 26.5 Intel I9-10910 (10 cores @3.6 GHz, no efficiency cores) when running MATLAB R2024a, or R2025b.
R2024a result:
Cleared workspace time: 0.23919
Uncleared workspace time: 0.23026
Cleared workspace time: 0.22924
Uncleared workspace time: 0.22899
Cleared workspace time: 0.22922
Uncleared workspace time: 0.22285
Cleared workspace time: 0.23152
Uncleared workspace time: 0.22992
Cleared workspace time: 0.22808
Uncleared workspace time: 0.23232
Timothy about 4 hours ago
@Walter Roberson Good to know, thanks!
Paul about 3 hours ago
How long does it take that code to run in wall clock time?
Does it matter if out is preallocated as
out = cell(1,15)
to be consistent with the loop over n = 1:15?
Timothy about 3 hours ago
@Paul that was an artifact of me testing different loop lengths. Thanks for catching that, but it doesn't affect things much. Total run time is ~3 minutes 30 seconds, and with a consistent cell size I got the following output:
Cleared workspace time: 0.1707
Uncleared workspace time: 0.43045
Cleared workspace time: 0.17297
Uncleared workspace time: 1.0535
Cleared workspace time: 0.17199
Uncleared workspace time: 1.687
Cleared workspace time: 0.16579
Uncleared workspace time: 1.908
Cleared workspace time: 0.17425
Uncleared workspace time: 2.4588


Answers (0)

Release: R2024a

Asked: 2 Jun 2026 at 17:02

Commented: about 3 hours ago
