MATLAB's inefficient copy-on-write implementation
MATLAB's copy-on-write memory management seems to have a serious defect, which I think is the reason behind the abysmal performance of subsasgn overloading. (The same problem probably occurs with parenAssign in the new R2021b RedefinesParen class; I haven't yet experimented with it.) Normally, an array assignment like b = a simply does a pointer copy; the array data is not copied until b is modified (e.g. b(1) = 1). Subsequent modifications of b (e.g. b(2) = 1) do not copy the full array again; they modify it in place as long as the reference count is 1. For example,
clear, a = zeros(1e8,1);
memory % 2764 MB used by MATLAB
b = a;
memory % 2764 MB
tic, b(1) = 1; toc, memory % 0.329099 seconds, 3540 MB
tic, b(2) = 1; toc, memory % 0.000123 seconds, 3541 MB
However, the benefit of copy-on-write is lost when the variable is changed in a function, e.g.
% test.m
function x = test(x)
x(1) = 1;
In this case, the x reference count is apparently incremented in test before the assignment is made, so this will always result in a full array copy. For example,
clear, a = zeros(1e8,1);
tic, a = test(a); toc % 0.337475 seconds
tic, a = test(a); toc % 0.310373 seconds
To see what's happening with copy-on-write, test.m is modified as follows:
function x = test(x)
memory
x(1) = 1;
memory
return
The array modification inside the function forces a full array copy, even though the original array is immediately discarded:
clear, a = zeros(1e8,1);
memory % 2748 MB
a = test(a); % 2748 MB, 3503 MB
memory % 2740 MB
I would think this problem could be easily avoided by treating any variable that appears as both an input and output argument in a function (e.g. function x = test(x)) as a reference variable, i.e. its reference count is not incremented on entering the function and is not decremented upon exiting. If the function is called with different input and output arguments, e.g. y = test(x), then the interpreter would implement this as y = x; y = test(y).
Is there any particular reason why MATLAB does not or cannot do this? Many applications, such as subsasgn overloading, could see a big performance boost if this problem were fixed.
1 Comment
James Tursa
on 31 Jan 2022
A slight point about confusing terminology in your description. In the past, MATLAB has passed shared data copies of arguments to functions rather than bumping up reference counts. Do you have evidence, or know of documentation, showing that this behavior has changed and that a bumped-up reference count is now used for arguments? Why do you write that MATLAB uses this method?
Accepted Answer
More Answers (1)
(1) The variable must be allocated within a function.
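This rule appears to concern MATLAB's in-place optimization, which, as far as I know, is applied only when a call of the form a = test(a) is made from inside another function rather than from the command prompt. Here is a minimal sketch for checking this, assuming that reading is right. The name runTest is hypothetical, and test is repeated as a local function so the file is self-contained:

```matlab
% runTest.m (hypothetical driver, not part of the original post)
function t = runTest(n)
    a = zeros(n, 1);       % the array is allocated inside a function workspace
    t = zeros(1, 2);       % elapsed time of each call
    for k = 1:2
        tic
        a = test(a);       % same variable used as input and output
        t(k) = toc;
    end
    assert(a(1) == 1);     % the modification reached the caller's variable
end

% Local copy of test.m from the question: same name for input and output.
function x = test(x)
    x(1) = 1;
end
```

Calling t = runTest(1e8) returns the two elapsed times. If they come out far below the roughly 0.3 seconds measured at the command line in the question, the array is presumably being modified in place rather than copied.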
A workaround to this rule is to wrap the data in a handle object:
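% refwrap: a handle class with a 'data' property (definition not shown in the post)
a = 1:1e8;
tic
obj = refwrap(a); clear a   % wrap the data in a handle object and drop the original variable
testFn(obj);                % modifies obj.data through the handle
a = obj.data;
toc   % Elapsed time is 0.000460 seconds.

function testFn(obj)
obj.data(1) = 1;
end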
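The refwrap class itself is not shown in the post. A minimal sketch of what it might look like, assuming it is simply a handle class with one public property named data (saved as refwrap.m on the path):

```matlab
% refwrap.m (hypothetical definition; the original post does not include it)
classdef refwrap < handle
    properties
        data   % the wrapped array, accessed as obj.data
    end
    methods
        function obj = refwrap(data)
            % Store the array if one is given; allow refwrap() with no arguments.
            if nargin > 0
                obj.data = data;
            end
        end
    end
end
```

Because refwrap derives from handle, testFn receives a reference to the same object, so the assignment obj.data(1) = 1 is visible to the caller without returning anything from the function.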