documentEmbedding

Document embedding model to map documents to vectors

Since R2024a

Description

A document embedding maps documents to real vectors.

The vectors attempt to capture the semantic content of the full document, so similar documents have similar vectors. The document can be a sentence, a paragraph, or a longer text.

Creation

Create a document embedding from a pretrained model using the documentEmbedding function.

Syntax

emb = documentEmbedding

emb = documentEmbedding(Model=modelName)

Description

emb = documentEmbedding returns a document embedding using the all-MiniLM-L6-v2 sentence transformers model.

This function requires Deep Learning Toolbox™.

emb = documentEmbedding(Model=modelName) returns the document embedding model specified by the Model name-value argument.

Input Arguments

`modelName` — Document embedding model
`"all-MiniLM-L6-v2"` (default) | `"all-MiniLM-L12-v2"`

Model name, specified as one of these values:

- "all-MiniLM-L6-v2": Sentence transformer model with six self-attention layers. This model outputs a 1-by-384 embedding vector. This option requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package.
- "all-MiniLM-L12-v2": Sentence transformer model with twelve self-attention layers. This model outputs a 1-by-384 embedding vector. This option requires the Text Analytics Toolbox™ Model for all-MiniLM-L12-v2 Network support package.

If the required support package is not installed, then the function provides a download link.
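
As a minimal sketch of the Model name-value syntax described above, the following loads the twelve-layer model and checks the size of the result. It assumes that Text Analytics Toolbox, Deep Learning Toolbox, and the corresponding support package are installed. The variable names `documents` and `vectors` are illustrative.

```matlab
% Load the twelve-layer model by name (requires the matching support package).
emb = documentEmbedding(Model="all-MiniLM-L12-v2");

% Embed two short documents. The variable names are illustrative.
documents = ["the quick brown fox"; "the lazy dog"];
vectors = embed(emb,documents);

% Each document is expected to map to one row of 384 elements,
% based on the 1-by-384 output size stated for this model.
size(vectors)
```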

Object Functions

embed: Map document to embedding vector

Examples

Map Documents to Vectors

Load the pretrained document embedding all-MiniLM-L6-v2 using the documentEmbedding function. This model requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package. If this support package is not installed, then the function provides a download link.

emb = documentEmbedding;

Create an array of input documents.

documents = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"];

Map the input documents to vectors using the embed function.

embeddedDocuments = embed(emb,documents);

To estimate how similar the documents are, compute the pairwise cosine similarities using cosineSimilarity.

similarities = cosineSimilarity(embeddedDocuments)

similarities = 3×3

    1.0000    0.9840    0.5505
    0.9840    1.0000    0.5524
    0.5505    0.5524    1.0000
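
To make the similarity matrix easier to interpret, here is a sketch of the same computation written out by hand: cosine similarity is the dot product of two vectors after each is scaled to unit length. The matrix `X` below holds made-up two-element row vectors chosen for easy arithmetic, not real model output, and the code uses only base MATLAB rather than the cosineSimilarity function.

```matlab
% Illustrative row vectors (made-up values, not real embeddings).
X = [ 3 4;
      4 3;
     -4 3];

% Scale each row to unit Euclidean length.
rowNorms = sqrt(sum(X.^2, 2));
Xn = X ./ rowNorms;

% Entry (i,j) is the cosine of the angle between rows i and j.
S = Xn * Xn.';

% Rows 1 and 2 point in similar directions (S(1,2) is about 0.96),
% while rows 1 and 3 are orthogonal (S(1,3) is about 0).
disp(S)
```

The diagonal is always 1 because each vector is compared with itself, which matches the pattern in the output above.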

References

[1] Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks." Preprint, submitted August 27, 2019. https://doi.org/10.48550/arXiv.1908.10084.

Version History

Introduced in R2024a
