When loading .mat files in a parfor, the first time is way slower than the second time.

Hi all,
I've encountered a weird behavior that I haven't been able to understand or find an explanation for.
I wrote a function that loads some files (data structures whose size ranges from 40 to 100 MB) from a dataset in a parfor and does some operations on them.
I've noticed that the first time I launch the script, the execution is far slower than the successive executions (38 seconds vs 1.8 seconds).
I've tried removing the parfor and using a simple for, but there is still a difference between the first and the successive runs, even though it is smaller (17 seconds vs 11 seconds).
I've also tried different datasets, and the behavior is the same. When I restart Matlab and launch the same call for the first time, same thing. If I stop and restart parpool, same thing.
I am wondering why this happens and whether I can do something to avoid it.
Matlab 2019a Update 4, Unix (64-bit)
PS: parpool was already started.
PPS: the successive executions are faster even after calling clear all/clearvars.
PPPS: to remove all possible other influences, I've cleaned the code so that now it just loads files. Same behavior.
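For reference, a stripped-down timing of the kind described above could look like the sketch below. It is only an illustration: the file name and the data size are hypothetical stand-ins for the real dataset, and it uses a plain for loop rather than parfor.

```matlab
% Sketch: time two consecutive loads of the same file.
% The file name and data size are hypothetical stand-ins for the real dataset.
fname = fullfile(tempdir, 'load_timing_demo.mat');
payload = rand(1000, 1000);          % about 8 MB of doubles
save(fname, 'payload');
clear payload

nPasses = 2;
loadTime = zeros(1, nPasses);
for p = 1:nPasses
    t = tic;
    s = load(fname);                 % load the file into a struct
    loadTime(p) = toc(t);
    clear s                          % drop the data before the next pass
end

fprintf('Pass %d: %.3f s\n', [1:nPasses; loadTime]);
delete(fname);                       % remove the temporary file
```

Because the file is written immediately before it is read, this sketch may not reproduce the slow first pass. Pointing fname at an existing dataset file (and skipping the save and delete lines) is closer to the setup described.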
  2 comments
Francesco Onorati on 15 Nov 2019
Edited: Francesco Onorati on 15 Nov 2019
There are no broadcast variables in the loop. I just tried clear all and clearvars, but the successive executions are still way faster. I load the same data: first time, slow; second (and successive) time(s), fast. If I change dataset, same thing: first time, slow; after, fast.


Answers (1)

Daniel M on 15 Nov 2019
Edited: Daniel M on 15 Nov 2019
So you're doing something like this?
for k = 1:10
    mydata = load('myfile.mat');
    output = someFunction(mydata);
end
That's pretty inefficient. If every iteration uses the same file, you should load the data once, outside the loop. Reading the data from memory will be faster than loading it from disk each time (because memory is typically much faster than file I/O).
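A minimal sketch of the load-once pattern is shown below. The file name, the variable values, and someFunction are hypothetical placeholders for the real file and processing; the temporary file is only created so the snippet runs on its own.

```matlab
% Sketch: load once, then reuse the in-memory copy in every iteration.
% someFunction is a hypothetical placeholder for the real processing.
someFunction = @(s) sum(s.values(:));

% Create a small example file so the sketch is self-contained.
fname = fullfile(tempdir, 'load_once_demo.mat');
values = magic(4);
save(fname, 'values');
clear values

mydata = load(fname);                    % single disk read, before the loop
output = zeros(1, 10);
for k = 1:10
    output(k) = someFunction(mydata);    % no file I/O inside the loop
end
delete(fname);                           % remove the temporary file
```

This only helps when every iteration needs the same file. If each iteration reads a different file, the load has to stay inside the loop.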
As for why the first iteration is slower, I believe that is due to the JIT compiler doing its magic. This is also referred to as 'warm-up time'. Hopefully someone with a deeper understanding can weigh in here.
Try running this script to test for warm up time. Note: run this in a script, not the command window (because the JIT effects may not take place in the command window).
clearvars
close all
clc
% Create some data, but only once
if ~exist('data.mat','file')
    data = rand(1,1e8,'single');
    save('data.mat','data');
    clear data
end
fname = 'data.mat';
fprintf('loading\n')
tic
mydata = load(fname);
data = mydata.data;
loadtime = toc;
% display the loading time
fprintf('It took %f s to load the file.\n',loadtime)
% Run some stuff in a loop and time it.
iters = 20;
t2 = zeros(1,iters);
for k = 1:iters
    t1 = tic;
    % do some random processes on the loaded data
    tmp1 = data.^2;
    tmp2 = sin(tmp1);
    t2(k) = toc(t1);
end
figure
stem(t2)
% stem(t2) plots the iteration index on the x-axis and the elapsed time on the y-axis
xlabel('Iteration')
ylabel('Time (s)')
% first couple iterations take longer
% get the warm up time (from first few iterations)
warmtime = max(t2(1:3))/mean(t2(end-3:end)) - 1;
fprintf('First few iterations were %.0f %% slower than last\n',warmtime*100)
fprintf('done!\n')
And the output:
loading
It took 2.576633 s to load the file.
First few iterations were 58 % slower than last
done!
% see attached figure
  5 comments
Daniel M on 15 Nov 2019
Edited: Daniel M on 15 Nov 2019
Can you write a self-contained test script, please? That does not run, nor is it a test of parfor.
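Something along these lines would do as a starting point. It is only a sketch: the file names, file count, and data sizes are made up, and it assumes Parallel Computing Toolbox is installed. The pool is started before the timing so that pool start-up is not counted in the first run.

```matlab
% Sketch of a self-contained parfor load test.
% File names, file count and sizes are hypothetical.
% Assumes Parallel Computing Toolbox is available.
nFiles = 4;
folder = tempname;                   % unique temporary folder name
mkdir(folder);
for k = 1:nFiles
    data = rand(500, 500);
    save(fullfile(folder, sprintf('file%d.mat', k)), 'data');
end
clear data

if isempty(gcp('nocreate'))
    parpool;                         % start the pool outside the timed region
end

nRuns = 2;
runTime = zeros(1, nRuns);
n = zeros(1, nFiles);
for r = 1:nRuns
    t = tic;
    parfor k = 1:nFiles
        s = load(fullfile(folder, sprintf('file%d.mat', k)));
        n(k) = numel(s.data);        % touch the data so the load is used
    end
    runTime(r) = toc(t);
end
fprintf('Run %d: %.3f s\n', [1:nRuns; runTime]);
rmdir(folder, 's');                  % clean up the temporary files
```

Comparing runTime(1) with runTime(2) gives the first-run versus second-run difference for the loads alone, with no other processing in the loop.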
