Converting parallel CPU processing into GPU processing
I am trying to convert code that ran in parallel on CPU cores so that it runs in parallel on the GPU.
I would like to process the matrices in a cell array on the GPU, with as many running in parallel as the GPU has cores. However, it runs significantly slower than on 4 CPU cores in parallel: 25 cells were processed in 30 minutes on 4 CPU cores, whereas 5 cells have taken over 45 minutes on the GPU and are still not finished. I'm very new to GPU computing, and nothing obvious stood out as a way to speed this up.
GPU properties:
Data to be processed:
- series is a 568x1 cell array
- each cell is a 60x60 double (each entry is a value between -1 and 1)
Start processing
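tic % test
cell_array = cell(1, 5); % preallocate; only the first 5 cells are used in this test
for i = 1:5
    cell_array{i} = gpuArray(cleanSeries{i}); % copy each 60x60 matrix to the GPU
end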
Determine size of matrix within the first cell, equivalent to number of biological cells recorded
numCells = size(cell_array{1}, 1); % plain scalar; a size does not need to live on the GPU
Preallocate arrays for data
% create the NaN arrays directly on the GPU instead of building them on the host and copying
clust_mean = NaN(length(cell_array{1}), length(cell_array), 'gpuArray');
clust_std = NaN(length(cell_array{1}), length(cell_array), 'gpuArray');
clust_random_mean = NaN(length(cell_array{1}), length(cell_array), 'gpuArray');
clust_random_std = NaN(length(cell_array{1}), length(cell_array), 'gpuArray');
Initiate the processing
parfor cellNumber = 1:length(cell_array)
    threshold_clust = gpuArray(NaN(numCells,100));
    random_clust = gpuArray(NaN(numCells,100));
    % process data over varying proportional thresholds, from the 25%
    % strongest connections to fully connected (100%) in 25% steps,
    % i.e. 25%, 50%, 75%, 100%
    for threshold = 25:25:100
        threshold_matrix = (threshold_proportional(cell_array{cellNumber}, threshold/100)); % proportional threshold matrix - custom function
        % clustering requires that all values be between 0 and 1, so remove
        % any negatives
        threshold_matrix(threshold_matrix < 0) = 0;
        % ensure that randomizing the matrix is possible
        [rowi,coli] = find(tril(threshold_matrix));
        bothi = [rowi coli];
        c = bothi(1,1);
        d = bothi(1,2);
        e=find(c==bothi);
        f=find(d==bothi);
        if length(e)==length(bothi)||length(f)==length(bothi)
            disp(['One cell has all the connections, skipping ', int2str(threshold), '% threshold.'])
            threshold_clust(:,threshold) = NaN(numCells,1);
            random_clust(:,threshold) = NaN(numCells,1);
        elseif length(bothi) <=3
            threshold_clust(:,threshold) = NaN(numCells,1);
            random_clust(:,threshold) = NaN(numCells,1);
        else
            % create random matrix - custom function
            random_matrix = latmio_und(threshold_matrix,1000);
            % clustering coefficient per matrix - custom function
            threshold_clust(:,threshold) = clustering_coef_wu(threshold_matrix);
            random_clust(:,threshold) = clustering_coef_wu(random_matrix);
        end % if logic end
    end % for loop end
    % summarize over thresholds (mean and standard deviation, ignoring NaN)
    clust_mean(:,cellNumber) = mean(threshold_clust,2,'omitnan');
    clust_std(:,cellNumber) = std(threshold_clust,0,2,'omitnan');
    clust_random_mean(:,cellNumber) = mean(random_clust,2,'omitnan');
    clust_random_std(:,cellNumber) = std(random_clust,0,2,'omitnan');
end % parfor loop end
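% gather returns the host copy, so assign the result back
clust_mean = gather(clust_mean);
clust_std = gather(clust_std);
clust_random_std = gather(clust_random_std);
clust_random_mean = gather(clust_random_mean);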
toc
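One pattern that often suits many small matrices better than moving each cell to the GPU separately is to stack them into a single 3-D array and apply element-wise steps to every page at once. The sketch below is an illustration under assumptions, not the method used above: `cleanSeries` is replaced by hypothetical random stand-in data, and only the negative-clipping step is shown, because the custom functions are not available here. It falls back to the CPU when `canUseGPU` reports no usable device.

```matlab
% Hypothetical stand-in for the first five cells of cleanSeries:
% 60x60 matrices with entries between -1 and 1.
rng(0);
numMatrices = 5;
cleanSeries = cell(numMatrices, 1);
for k = 1:numMatrices
    cleanSeries{k} = 2*rand(60) - 1;
end

% Stack the cells into one 60x60xN array so a single transfer moves everything.
stack = cat(3, cleanSeries{:});
if canUseGPU()
    stack = gpuArray(stack);   % one host-to-device copy instead of N
end

% Element-wise steps apply to all pages in one call. This does the same job as
% threshold_matrix(threshold_matrix < 0) = 0, but for every matrix at once.
clipped = max(stack, 0);

% Bring the result back to host memory once, at the end.
clipped = gather(clipped);
```

Steps that loop over individual matrix elements, as iterative rewiring routines typically do, would not benefit from this kind of batching and may be better left on the CPU workers.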
6 comments
Walter Roberson
12 Mar 2022
For operations other than pure copying, NaN has to go through a special "Abort" path in all calculations; calculations with it cannot stream the normal way. There also has to be special checking to see whether the NaN is a "signalling NaN", as signalling NaNs are required to raise exceptions whenever they occur.
inf cannot readily stream either... but I guess a bit more readily than NaN.
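Whether NaN-filled buffers cost anything measurable can be checked directly on the hardware in question. The sketch below is only a way to measure, not a claim about the outcome: it times the same `mean(..., 'omitnan')` call on a mostly-NaN array and on a NaN-free array with `timeit`, and with `gputimeit` when a GPU is available. The array sizes and the share of NaN entries are assumptions chosen to mirror the 60-by-100 buffers in the question, and the data is random.

```matlab
rng(0);

% Mostly-NaN buffer: only 4 of 100 columns hold data, as in the question.
withNaN = NaN(60, 100);
withNaN(:, 25:25:100) = rand(60, 4);

% Same shape, no NaN at all.
noNaN = rand(60, 100);

% CPU timings.
tNaN  = timeit(@() mean(withNaN, 2, 'omitnan'));
tFull = timeit(@() mean(noNaN, 2, 'omitnan'));
fprintf('CPU: with NaN %.3g s, without NaN %.3g s\n', tNaN, tFull);

% GPU timings, only when a supported device is present.
if canUseGPU()
    gNaN  = gpuArray(withNaN);
    gFull = gpuArray(noNaN);
    fprintf('GPU: with NaN %.3g s, without NaN %.3g s\n', ...
        gputimeit(@() mean(gNaN, 2, 'omitnan')), ...
        gputimeit(@() mean(gFull, 2, 'omitnan')));
end
```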