Fast subarray access when using GPU matrices
I need to optimize my GPU code. The slowest line in it adds multiple subarrays to one large matrix:
for ii = 1:Npos
    % add page ii of smaller_array into its region of interest in large_array
    large_array(ROI{ii,:}) = large_array(ROI{ii,:}) + smaller_array(:,:,ii);
end
Npos is about 500, large_array is about 2000x2000, each page of smaller_array is about 256x256, and each ROI is a contiguous subregion of large_array.
Do you have any idea how to make this faster and remove the for-loop?
The main issue is the large overhead of calling subsref so many times.
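To make the setup concrete, here is a small, self-contained version of the loop above. The sizes are shrunk for illustration, and the ROI layout is an assumption the question does not spell out: ROI{ii,1} and ROI{ii,2} are taken to hold the row and column indices of position ii, with the positions forming a non-overlapping grid enumerated in column-major order. Ordinary CPU arrays are used so the sketch runs without a GPU; wrapping large_array and smaller_array in gpuArray (Parallel Computing Toolbox) should give the same result.

```matlab
% Toy-size sketch of the question's loop (hypothetical values: the real case
% has ~500 positions, a ~2000x2000 large_array and ~256x256 pages).
tile   = 2;                          % each page of smaller_array is tile-by-tile
nTiles = 2;                          % positions form an nTiles-by-nTiles grid (assumption)
Npos   = nTiles^2;
large_array   = zeros(tile*nTiles);  % 4-by-4
smaller_array = reshape(1:tile*tile*Npos, tile, tile, Npos);  % page k holds 4k-3..4k

% Assumed ROI layout: ROI{ii,1} = row indices, ROI{ii,2} = column indices,
% positions enumerated in column-major tile order, no overlaps.
ROI = cell(Npos, 2);
for ii = 1:Npos
    [r, c] = ind2sub([nTiles nTiles], ii);   % tile row and column of position ii
    ROI{ii,1} = (r-1)*tile + (1:tile);
    ROI{ii,2} = (c-1)*tile + (1:tile);
end

% The loop from the question: one indexed read and one indexed write per position.
for ii = 1:Npos
    large_array(ROI{ii,:}) = large_array(ROI{ii,:}) + smaller_array(:,:,ii);
end
disp(large_array)   % [1 3 9 11; 2 4 10 12; 5 7 13 15; 6 8 14 16]
```

Because smaller_array(k) equals k in this toy setup, the printed matrix is exactly the idx matrix worked out in the answer below.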
Answers (1)
Edric Ellis
on 7 Apr 2015
Edited: Edric Ellis
on 7 Apr 2015
I think the best way to proceed is to concoct a single indexing expression that you can use with smaller_array, so that the whole update becomes a single operation:
large_array = large_array + smaller_array(idx);
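Obviously, the trick is calculating idx. This depends on the layout of the "pages" of smaller_array, and it assumes the ROIs tile large_array exactly (no gaps, no overlaps), since each element of large_array receives exactly one element of smaller_array. If the pages are ordered column-major over the tiles (down the first column of tiles, then the next), here's how you could come up with idx for the case where large_array is 4-by-4 and smaller_array is 2-by-2-by-4: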
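idx_0 = reshape(1:4, 2, 2);               % [1, 3; 2, 4]
idx_1 = repmat(idx_0, 2, 2);              % 2-by-2 grid of [1,3;2,4]: index within each page
idx_2 = 2 * 2 * kron(idx_0, ones(2,2));   % 4 * page number of each tile (idx_0 reused as tile numbering)
idx = idx_1 + (idx_2 - (2*2));            % shift each tile by 4*(page - 1)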
which gives
idx =
1 3 9 11
2 4 10 12
5 7 13 15
6 8 14 16
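For other sizes, the same construction generalizes to pages of size m-by-n arranged in a p-by-q grid of tiles. The following is a sketch under stated assumptions rather than code from the answer: m, n, p, q, pageIdx and tileIdx are hypothetical names, the tiles are assumed to cover large_array exactly with no overlaps, and page k of smaller_array is assumed to belong to the k-th tile in column-major order. The only structural change is that an explicit tile numbering, reshape(1:p*q, p, q), replaces the reuse of idx_0; the sketch then checks the single update against the loop on random data.

```matlab
% Generalized version of the idx construction (hypothetical names).
% Assumes pages are m-by-n, arranged in a p-by-q grid of non-overlapping tiles
% that exactly covers large_array, with page k in the k-th tile (column-major).
m = 3; n = 2;                             % page size
p = 2; q = 3;                             % tiles down / across
pageIdx = reshape(1:m*n, m, n);           % linear index within one page
tileIdx = reshape(1:p*q, p, q);           % page number of each tile
idx_1 = repmat(pageIdx, p, q);            % within-page index at every element
idx_2 = m*n * kron(tileIdx, ones(m, n));  % m*n times the page number
idx   = idx_1 + (idx_2 - m*n);            % linear index into smaller_array

% Compare the single update with the loop from the question on random data.
large_array   = rand(m*p, n*q);
smaller_array = rand(m, n, p*q);
ref = large_array;
for ii = 1:p*q
    [r, c] = ind2sub([p q], ii);          % tile row and column of page ii
    rows = (r-1)*m + (1:m);
    cols = (c-1)*n + (1:n);
    ref(rows, cols) = ref(rows, cols) + smaller_array(:,:,ii);
end
large_array = large_array + smaller_array(idx);   % one gather, one addition
disp(isequal(large_array, ref))                   % 1 if both agree
```

Since idx depends only on the sizes and the tile layout, it can be computed once and reused if the update is repeated; the speedup on a GPU has not been measured here. If the ROIs overlap, no single idx can express the update, because some elements of large_array need contributions from several pages; accumulating over precomputed linear indices with accumarray is one common alternative, though it is not what this answer proposes.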