knnsearch
Find nearest neighbors by edit distance
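Syntax
idx = knnsearch(eds,words)
[idx,d] = knnsearch(eds,words)
[___] = knnsearch(___,Name,Value)
Description
idx = knnsearch(eds,words) returns the indices of the nearest neighbors in the edit distance searcher eds to each element of words.
[idx,d] = knnsearch(eds,words) also returns the edit distances between the elements of words and their nearest neighbors.
[___] = knnsearch(___,Name,Value) specifies additional options using one or more name-value arguments, such as 'K' or 'IncludeTies'.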
Examples
Find Nearest Words
Create an edit distance searcher with a maximum edit distance of 2.
vocabulary = ["Text" "Analytics" "Toolbox"];
eds = editDistanceSearcher(vocabulary,2);
Find the nearest words to "Test" and "Analysis".
words = ["Test" "Analysis"];
idx = knnsearch(eds,words)
idx = 2×1
1
2
Get the words from the vocabulary using the returned indices.
nearestWords = eds.Vocabulary(idx)
nearestWords = 1×2 string
"Text" "Analytics"
Find Edit Distances to Nearest Words
Create an edit distance searcher with a maximum edit distance of 2.
vocabulary = ["MATLAB" "Text" "Analytics" "Toolbox"];
eds = editDistanceSearcher(vocabulary,2);
Find the nearest words and their edit distances to "Test" and "Analysis".
words = ["Test" "Analysis"];
[idx,d] = knnsearch(eds,words)
idx = 2×1
2
3
d = 2×1
1
2
Get the words from the vocabulary using the returned indices.
nearestWords = eds.Vocabulary(idx)
nearestWords = 1×2 string
"Text" "Analytics"
Changing the word "Test" to "Text" requires one edit: a substitution. Changing the word "Analysis" into "Analytics" requires two edits: a substitution and an insertion.
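The edit counts above can be checked directly with the editDistance function listed under See Also. The following is a minimal sketch that assumes the default settings, in which each insertion, deletion, and substitution counts as one edit. The variable names dTest and dAnalysis are illustrative only.

```matlab
% Check the distances reported by knnsearch using editDistance.
% Assumes default costs: each insertion, deletion, or substitution is one edit.
dTest = editDistance("Test","Text");               % one substitution (s -> x)
dAnalysis = editDistance("Analysis","Analytics");  % one substitution (s -> t), one insertion (c)
```

If the values agree with the d output of knnsearch, the searcher and the pairwise function are using the same notion of distance.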
Find Multiple Neighbors
Create an edit distance searcher with a maximum edit distance of 5.
vocabulary = ["MathWorks" "MATLAB" "Analytics"];
eds = editDistanceSearcher(vocabulary,5);
Find the two nearest words to "Math" and "Analysis".
words = ["Math" "Analysis"];
idx = knnsearch(eds,words,'K',2)
idx = 2×2
1 2
3 NaN
View the two closest words to "Math".
idxMath = idx(1,:);
newWords = eds.Vocabulary(idxMath)
newWords = 1×2 string
"MathWorks" "MATLAB"
There is only one word within the maximum edit distance from "Analysis", so the function returns NaN for the remaining index. View the nearest words with valid indices.
idxAnalysis = idx(2,:);
idxAnalysis(isnan(idxAnalysis)) = [];
newWords = eds.Vocabulary(idxAnalysis)
newWords = "Analytics"
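The same NaN filtering can be applied to every input word at once. The sketch below repeats the searcher from this example and collects the valid neighbors of each word in a cell array. The names neighbors and rowIdx are hypothetical and chosen only for illustration.

```matlab
vocabulary = ["MathWorks" "MATLAB" "Analytics"];
eds = editDistanceSearcher(vocabulary,5);
words = ["Math" "Analysis"];
idx = knnsearch(eds,words,'K',2);

% Collect the neighbors of each word, skipping NaN placeholders.
neighbors = cell(numel(words),1);           % one cell per input word
for i = 1:numel(words)
    rowIdx = idx(i,:);
    rowIdx = rowIdx(~isnan(rowIdx));        % keep valid indices only
    neighbors{i} = eds.Vocabulary(rowIdx);  % string array with at most K elements
end
```

A cell array is used because the number of valid neighbors can differ from word to word.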
Input Arguments
eds
— Edit distance searcher
editDistanceSearcher object
Edit distance searcher, specified as an editDistanceSearcher object.
words
— Input words
string vector | character vector | cell array of character vectors
Input words, specified as a string vector, character vector, or cell array of character vectors. If you specify words as a character vector, then the function treats the argument as a single word.
Data Types: string | char | cell
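The three accepted input forms can be compared side by side. This sketch reuses the vocabulary from the first example. The variable names idxChar, idxString, and idxCell are illustrative.

```matlab
vocabulary = ["Text" "Analytics" "Toolbox"];
eds = editDistanceSearcher(vocabulary,2);

% A character vector is treated as a single word, so one index is returned.
idxChar = knnsearch(eds,'Test');

% A string vector and a cell array of character vectors hold one word per element.
idxString = knnsearch(eds,["Test" "Analysis"]);
idxCell = knnsearch(eds,{'Test','Analysis'});
```

To search for several words stored as character vectors, wrap them in a cell array rather than concatenating them, because concatenation would produce one long character vector that is treated as a single word.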
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: knnsearch(eds,words,'K',3) finds the nearest three neighbors in eds to the elements of words.
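The two ways of writing name-value arguments can be compared as follows. This is a sketch that assumes R2021a or later for the Name=Value form. The variable names idxNew, idxOld, and sameResult are illustrative.

```matlab
vocabulary = ["MathWorks" "MATLAB" "Analytics"];
eds = editDistanceSearcher(vocabulary,5);
words = ["Math" "Analysis"];

% Name=Value syntax (R2021a and later).
idxNew = knnsearch(eds,words,K=2);

% Comma-separated syntax with the name in quotes (also valid before R2021a).
idxOld = knnsearch(eds,words,'K',2);

% Both calls request the same search. isequaln treats NaN entries as equal.
sameResult = isequaln(idxNew,idxOld);
```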
K
— Number of nearest neighbors to find
1 (default) | positive integer
Number of nearest neighbors to find for each element in words, specified as a positive integer.
Example: 'K',3
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
IncludeTies
— Option to include neighbors whose distance values are equal
false (default) | true
Option to return neighbors whose distance values are equal, specified as true or false.
If 'IncludeTies' is false, then the function returns the K neighbors with the shortest edit distance, where K is the number of neighbors to find. In this case, the function outputs N-by-K matrices, where N is the number of input words. To specify K, use the 'K' name-value pair argument.
If 'IncludeTies' is true, then the function also returns the neighbors whose distances are equal to the Kth smallest distance in the output. In this case, the function outputs cell arrays of size N-by-1, where N is the number of input words. The elements of the cell arrays are vectors with at least K elements. The function sorts the neighbors in each vector in ascending order of distance.
Example: 'IncludeTies',true
Data Types: logical
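The effect of 'IncludeTies' on the output type can be illustrated with the vocabulary from the Find Multiple Neighbors example, where "MathWorks" and "MATLAB" are both within five edits of "Math". The sketch below requests a single neighbor but keeps ties, so the tied set would be expected to contain both words. The names tiedIdx, tiedWords, and tiedDist are hypothetical.

```matlab
vocabulary = ["MathWorks" "MATLAB" "Analytics"];
eds = editDistanceSearcher(vocabulary,5);

% Request one neighbor, but keep every word tied at the smallest distance.
[idx,d] = knnsearch(eds,"Math",'K',1,'IncludeTies',true);

% With 'IncludeTies' set to true, the outputs are N-by-1 cell arrays (here N = 1).
tiedIdx = idx{1};                     % indices of all tied neighbors
tiedWords = eds.Vocabulary(tiedIdx);  % corresponding vocabulary words
tiedDist = d{1};                      % their edit distances
```

Because the outputs are cell arrays in this mode, index into them with braces before using the values to look up words in the vocabulary.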
Output Arguments
idx
— Indices of nearest neighbors in searcher
matrix | cell array of vectors
Indices of nearest neighbors in the searcher, returned as a matrix or a cell array of vectors. The indices refer to elements of the Vocabulary property of eds.
If 'IncludeTies' is false, then idx is an N-by-K matrix, where N is the number of input words and K is the number of neighbors to find. Each row contains the indices of the K neighbors with the shortest edit distance to the corresponding word. If fewer than K vocabulary words lie within the maximum edit distance of the searcher, then the function returns NaN for the remaining entries of that row. To specify K, use the 'K' name-value pair argument.
If 'IncludeTies' is true, then the function also returns the neighbors whose distances are equal to the Kth smallest distance in the output. In this case, idx is an N-by-1 cell array, where N is the number of input words. The elements of the cell array are vectors with at least K elements. The function sorts the neighbors in each vector in ascending order of distance.
Data Types: double | cell
d
— Edit distances to neighbors
matrix | cell array of vectors
Edit distances to neighbors, returned as a matrix or a cell array of vectors. The elements of d correspond to the elements of idx.
If 'IncludeTies' is false, then d is an N-by-K matrix, where N is the number of input words and K is the number of neighbors to find. Each row contains the edit distances from the corresponding word to its K nearest neighbors. To specify K, use the 'K' name-value pair argument.
If 'IncludeTies' is true, then the function also returns the distances of the neighbors that are tied with the Kth smallest distance in the output. In this case, d is an N-by-1 cell array, where N is the number of input words. The elements of the cell array are vectors with at least K elements. The function sorts the distances in each vector in ascending order.
Data Types: double | cell
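Because idx and d have matching shapes, they can be combined into a single summary. The sketch below uses the searcher from the Find Edit Distances to Nearest Words example with the default K of 1. The table and its variable names (Query, Neighbor, Distance) are hypothetical and chosen only for illustration.

```matlab
vocabulary = ["MATLAB" "Text" "Analytics" "Toolbox"];
eds = editDistanceSearcher(vocabulary,2);
words = ["Test" "Analysis"];
[idx,d] = knnsearch(eds,words);

% Look up each neighbor, leaving an empty string where idx is NaN.
neighbors = strings(size(idx));
found = ~isnan(idx);
neighbors(found) = eds.Vocabulary(idx(found));

% Summary table with one row per input word.
results = table(words(:),neighbors,d, ...
    'VariableNames',{'Query','Neighbor','Distance'});
```

Checking for NaN before indexing avoids an error when a word has no neighbor within the maximum edit distance.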
Version History
Introduced in R2019a
See Also
correctSpelling | editDistance | editDistanceSearcher | rangesearch | splitGraphemes | tokenizedDocument