How to vectorise a knnsearch loop to find nearest neighbours

BC on 8 Jul 2021
Answered: Hornett on 18 Sep 2024
I have a table storing coordinates. For each pair of coordinates (each row), I want to find the nearest neighbour distances to all other pairs of coordinates and store the mean for each pair, and an overall mean for the table, which I then save to a text file.
Currently the example code below works, but for my actual data it is nested inside another loop that processes multiple tables, each with multiple pairs of coordinates, so it's pretty slow. Is there a way to vectorise this loop to improve the speed? (Any other criticisms of this code are welcome too.)
% Create example table
images = ["Image_1";"Image_1";"Image_1"];
x = [27;100;80];
y = [40;145;190];
table_example = table(images,x,y);
% Loop to process the table
for k = 1:height(table_example) % for every row (coordinate) in the table, do the following:
    current_coordinate = table_example{k,["x","y"]}; % extract coordinate of current loop iteration
    table_example{k,["x","y"]} = [NaN]; % change current coordinate in original table to NaN, to avoid comparing to itself
    k_neighbours = height(table_example)-1; % get number of neighbours to search for - height of table minus 1 coordinate
    [idx,dist] = knnsearch(table_example{:,["x","y"]}, current_coordinate,"K", k_neighbours,"Distance","euclidean"); % get NN distances for current coordinate
    mean_distances_per_pair(k,:) = mean(dist); % get mean distance for each pair
    table_example{k,["x","y"]} = current_coordinate; % return current coordinate to table
end
overall_mean_distance = mean(mean_distances_per_pair); % get overall mean distances for the whole table once loop is finished
% Write mean distance to text file
% writematrix([mean_distances_per_pair],examplelocation1,"WriteMode","Append")
% writematrix([overall_mean_distance],examplelocation2,"WriteMode","Append")

Answers (1)

Hornett on 18 Sep 2024
Vectorizing the loop in your MATLAB code can significantly improve its performance, especially with larger datasets. Here's a way to achieve that by computing distances between all pairs of coordinates at once and then calculating the mean distances, without the need for explicit looping through each row:
% Create example table
images = ["Image_1"; "Image_1"; "Image_1"];
x = [27; 100; 80];
y = [40; 145; 190];
table_example = table(images, x, y);
% Extract coordinates from the table
coords = table_example{:, ["x", "y"]};
% Calculate the pairwise distance matrix
distMatrix = pdist2(coords, coords, 'euclidean');
% Set the diagonal to NaN so that 'omitnan' skips each point's distance to itself
% (Inf is not skipped by 'omitnan' and would make every mean Inf)
distMatrix(logical(eye(size(distMatrix)))) = NaN;
% Calculate the mean distance for each pair (excluding the distance to itself)
mean_distances_per_pair = mean(distMatrix, 2, 'omitnan');
% Calculate the overall mean distance
overall_mean_distance = mean(mean_distances_per_pair);
% Optionally, write mean distances to a text file
% writematrix(mean_distances_per_pair, 'examplelocation1', 'WriteMode', 'Append');
% writematrix(overall_mean_distance, 'examplelocation2', 'WriteMode', 'Append');
Key Changes:
  • The use of pdist2 computes all pairwise distances in one call, eliminating the need for the loop over rows. Note that it builds a full N-by-N matrix, so memory use grows with the square of the number of points.
  • By setting the diagonal of the distance matrix to NaN and using the 'omitnan' flag, we ignore the distance of each point to itself when averaging. The diagonal must be NaN rather than Inf, because mean does not skip Inf values.
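Since you mention that the real data has several tables (or images) processed in an outer loop, here is a minimal sketch of how the same idea might be applied per image. It assumes all coordinates sit in one table with an `images` column, as in your example; the second image's points and the names `T`, `G`, `imageNames`, `mean_per_point` and `overall_mean` are made up for illustration. The remaining loop runs once per image, not once per point.

```matlab
% Hypothetical data: two images stored in a single table
images = ["Image_1"; "Image_1"; "Image_1"; "Image_2"; "Image_2"];
x = [27; 100; 80; 0; 3];
y = [40; 145; 190; 0; 4];
T = table(images, x, y);

% Assign a group number to each row, one group per image
[G, imageNames] = findgroups(T.images);

mean_per_point = NaN(height(T), 1);        % mean distance from each point to the others in its image
overall_mean = NaN(numel(imageNames), 1);  % one overall mean per image

for g = 1:numel(imageNames)
    rows = (G == g);                       % rows belonging to the current image
    coords = T{rows, ["x", "y"]};          % N-by-2 coordinates for this image
    D = pdist2(coords, coords);            % N-by-N Euclidean distance matrix
    D(1:size(D, 1) + 1:end) = NaN;         % mark self-distances (the diagonal) as NaN
    mean_per_point(rows) = mean(D, 2, 'omitnan');
    overall_mean(g) = mean(mean_per_point(rows));
end
```

An image with a single point ends up with NaN for both means, because there are no other points to measure against.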
Hope it helps!

Release

R2020b
