parfor results in a core dump

MATLAB R2015b crashes on 64-bit Linux. I managed to trace the failure to this chunk of code:
values = cell(1, nElement);
parfor i = 1:nElement
    values{i} = element(i).stiffnessMatrix;
end
where element is an object array and I call the stiffnessMatrix method on all of the objects. In practice nElement is very big. The segfault comes from libtbbmalloc.so.2 (now I know that parfor wraps TBB).
Following advice I found on forums, I preloaded the problematic libraries:
export LD_PRELOAD=/usr/local/MATLAB/R2015b/bin/glnxa64/libtbb.so.2:/usr/local/MATLAB/R2015b/bin/glnxa64/libtbbmalloc.so.2
In vain: the same error persists. It seems that TBB cannot allocate memory (for smaller problems, parfor works). Why does it work for the simple for loop and not for the parfor loop? Just before the program crashed, I saw that I had 11 GB of free memory (more than 4 times the amount used by the whole operating system).
I tested it on Windows, and it still fails with the same error. Then I tried a plain for loop instead of a parfor loop on both Linux and Windows. On Linux the error persists, but on Windows the for loop version works.
Thanks in advance.

10 Comments

OCDER on 16 Nov 2017
If this works with a regular for loop, but not parfor, it's likely a concurrency issue. If element is an object and multiple workers are trying to split, modify, and reassemble an object, then maybe it crashes.
What is the object array element (what is the code that defines this object, or what does it store?)
What does element(i).stiffnessMatrix do to all these objects?
Do you have a simple full set of code for us to test that for loop?
Zoltán Csáti on 17 Nov 2017
Thanks for the reply. For several days I have been trying to locate where the error comes from. Here is what I experienced:
1) The code fails at random locations, but always in loops (whether for loops or parfor loops).
2) I use third-party mex functions, but they do their job correctly and I never get error messages after invoking them.
3) The segmentation fault only occurs for larger problems (i.e. when larger object arrays are allocated). What counts as a "large problem" is relative: Windows tolerates larger sizes, while on Linux it fails earlier.
Since the code is quite large (it contains several dozen functions), I didn't attach it. Unfortunately, I couldn't produce a minimal working example because, as I wrote, the code fails at different locations.
OCDER on 17 Nov 2017
I would suspect that the 3rd-party MEX functions have a bug that manifests only when processing large data. You may have to debug the MEX source code directly, as segmentation faults indicate a pointer error in the C code. Finding this bug would be VERY hard, especially since the code works sometimes and then fails randomly, in both parfor and for loops...
You could try to pinpoint the inputs that cause the crash by using a regular for loop and saving variables after each successful loop. Once it crashes, restart from the prior loop and see if the next loop crashes again. If you can replicate the crash conditions, then debugging just might be possible.
Zoltán Csáti on 17 Nov 2017
Could you please elaborate on this part?
"You could try to pinpoint the inputs that cause the crash by using a regular for loop and saving variables after each successful loop. Once it crashes, restart from the prior loop and see if the next loop crashes again. If you can replicate the crash conditions, then debugging just might be possible."
Do you mean that in each for loop I should output the loop index, so as to see up to which index it succeeds?
OCDER on 17 Nov 2017
Yes, something like that. For instance:
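values = cell(1, nElement);
for i = 1:nElement
    % Snapshot the workspace before the risky calls, so the state
    % right before a crash can be reloaded afterwards.
    save('temp.mat');
    values{i} = element(i).stiffnessMatrix;
    mexFunction2(...)
    mexFunction3(...)
end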
If it always crashes on the 79th iteration, then you can load 'temp.mat' and run the remaining code in the loop line by line until you find the mex function causing the crash.
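If saving the whole workspace on every iteration is too slow for a very large nElement, a lighter variant of the loop above is to log only the loop index to a text file before each risky call. The following is a minimal sketch, not the original code: computeStiffness is a hypothetical stand-in for element(i).stiffnessMatrix, and the log file name and the small nElement are assumptions made so the example runs on its own.

```matlab
% Hypothetical stand-in for element(i).stiffnessMatrix; replace with the real call.
computeStiffness = @(i) i * eye(2);

nElement = 5;                                        % small value, for illustration only
logFile  = fullfile(tempdir, 'loop_progress.log');   % assumed file name
values   = cell(1, nElement);

% Start from an empty log so entries from earlier runs do not mix in.
fid = fopen(logFile, 'w');
assert(fid ~= -1, 'Could not open log file');
fclose(fid);

for i = 1:nElement
    % Write the index BEFORE the risky call and close the file right away,
    % so the entry is handed to the operating system before a possible crash.
    fid = fopen(logFile, 'a');
    fprintf(fid, 'starting iteration %d\n', i);
    fclose(fid);

    values{i} = computeStiffness(i);
end
```

After a crash, the last line of the log file names the iteration that was running. If that index differs from run to run, the crash is probably not tied to one particular input.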
Zoltán Csáti on 17 Nov 2017
Thank you. But the problem is that when I call mexFunction2 in a loop, it does not fail. The program crashes later, in a for loop where I do not even call a mex function.
OCDER on 17 Nov 2017
Without using any mex functions, does the loop at least crash at the same iteration number?
Zoltán Csáti on 17 Nov 2017
I cannot test it without the mex function, because the later code needs its output to work. And even then, the later loop that does not call the mex function crashes at different iteration numbers, and in different parts of the code on Linux and on Windows.
OCDER on 17 Nov 2017
This does sound like one of the worst-case scenarios for debugging code. You might have to contact the authors of the 3rd-party MEX code to find the memory allocation bug. Without looking at all the code, it'll be hard to pinpoint the issue. This might help with debugging strategies:
Zoltán Csáti on 18 Nov 2017
Thank you. I may use valgrind/callgrind to profile the mex file. If you write an answer, I will accept it. Although my problem persists (it's too vague to be solved immediately), you gave me good advice.


Answers (0)

Asked: 7 Nov 2017

Commented: 18 Nov 2017
