Increasing vocabulary of pre-trained word embeddings
MathWorks Support Team, 3 May 2019
Edited: MathWorks Support Team, 27 Sep 2021
Can we extend the pre-trained word embeddings and increase the vocabulary?
Accepted Answer
MathWorks Support Team, 2 Sep 2021
Edited: MathWorks Support Team, 27 Sep 2021
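Yes. To add more words to the existing vocabulary of the embedding returned by 'fastTextWordEmbedding', you can write a new embedding file that contains the additional words and read it back in:
1. Obtain the wordEmbedding object from 'fastTextWordEmbedding' (this requires the fastText support package for Text Analytics Toolbox):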
>> emb = fastTextWordEmbedding;
2. Obtain the vocabulary from the wordEmbedding object:
>> vocab = emb.Vocabulary;
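3. Add more words to the string array. Every new word also needs an embedding vector of length emb.Dimension, because the file written in step 4 stores one vector per word. For example: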
>> vocab(end+1) = 'Hi';
>> vocab(end+1) = 'Hello';
4. Write every word together with its vector to a UTF-8 encoded text file in either word2vec or GloVe text format, or to a zip file containing such a text file. The vectors for the existing words can be obtained with word2vec(emb, emb.Vocabulary). You can use fopen, fprintf and fclose for this step.
5. Use 'readWordEmbedding' to read this file back in; the result is a new wordEmbedding object whose vocabulary includes the additional words. The documentation page for 'readWordEmbedding' explains the supported file formats in more detail.
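A minimal sketch of step 4, assuming a release with string arrays: writeWord2vecText below is a hypothetical helper written for this example (not a MathWorks function). It uses fopen, fprintf and fclose to write the word2vec text format, that is, a header line "<numWords> <dimension>" followed by one line per word containing the word and its vector components. Save it as writeWord2vecText.m.

```matlab
function writeWord2vecText(filename, words, M)
%WRITEWORD2VECTEXT Write words and vectors in word2vec text format.
%   writeWord2vecText(FILENAME, WORDS, M) writes a UTF-8 text file whose
%   first line is "<numWords> <dimension>" and whose remaining lines are
%   "<word> <v1> ... <vD>". WORDS has N elements (string array or cell
%   array of character vectors) and M is an N-by-D numeric matrix whose
%   i-th row is the vector for WORDS(i).
%   Hypothetical helper for this example; not a MathWorks function.

words = string(words);
[numWords, dim] = size(M);
if numel(words) ~= numWords
    error('writeWord2vecText:sizeMismatch', ...
        'WORDS must have one element per row of M.');
end

% Open for writing; the 4th argument sets the text encoding to UTF-8.
fid = fopen(filename, 'w', 'n', 'UTF-8');
if fid == -1
    error('writeWord2vecText:openFailed', 'Cannot open file: %s', filename);
end
closer = onCleanup(@() fclose(fid)); % closes the file even if an error occurs

% Header line: vocabulary size and embedding dimension.
fprintf(fid, '%d %d\n', numWords, dim);

% Row format '%s %.6g ... %.6g\n' with one numeric field per dimension.
% 6 significant digits trades a little precision for a smaller file.
rowFormat = ['%s' repmat(' %.6g', 1, dim) '\n'];
for i = 1:numWords
    fprintf(fid, rowFormat, char(words(i)), M(i, :));
end
end
```

A possible end-to-end use for steps 1 to 5 follows. The original answer does not say where the vectors for new words come from, so the small random vectors here are placeholders only (an assumption); replace them with vectors that are meaningful in the same embedding space. Words that are already in the vocabulary are skipped with isVocabularyWord so they are not written twice.
>> emb = fastTextWordEmbedding;
>> newWords = ["Hi" "Hello"];
>> newWords = newWords(~isVocabularyWord(emb, newWords));
>> newVecs = 0.01*randn(numel(newWords), emb.Dimension, 'single');
>> allWords = [emb.Vocabulary(:); newWords(:)];
>> M = [word2vec(emb, emb.Vocabulary); newVecs];
>> writeWord2vecText("extendedEmbedding.txt", allWords, M);
>> newEmb = readWordEmbedding("extendedEmbedding.txt");
Because the pretrained vocabulary is large, expect a large file and a long write time. Words containing spaces cannot be represented in this space-separated format.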