- Use the context around the OOV word. For example, derive its vector from the word embeddings of the previous and next words.
- Use a synonym or a similar word, and take that word's embedding as the vector for the OOV word.
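Both ideas can be sketched in a few lines. The example below is only an illustration in plain Python, not MATLAB or FastText code: the toy `EMBEDDING` table, the hand-made `SYNONYMS` map, and the helper names are all hypothetical, and the order of the fallbacks (synonym first, then context average, then a zero vector) is an assumption rather than a recommendation from the thread.

```python
# Toy 2-dimensional embedding table (hypothetical values; a real table
# would come from a trained word embedding).
EMBEDDING = {
    "the": [0.1, 0.3],
    "patient": [0.5, 0.1],
    "has": [0.2, 0.2],
    "hypertension": [0.9, 0.7],
}

# Hypothetical hand-made map from OOV words to in-vocabulary synonyms.
SYNONYMS = {"htn": "hypertension"}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def context_vector(tokens, index, embedding):
    """Average the vectors of the previous and next tokens, if known."""
    neighbours = []
    if index > 0:
        neighbours.append(tokens[index - 1])
    if index < len(tokens) - 1:
        neighbours.append(tokens[index + 1])
    vectors = [embedding[t] for t in neighbours if t in embedding]
    return mean_vector(vectors) if vectors else None


def synonym_vector(token, embedding, synonyms):
    """Return the vector of a known synonym, or None if there is none."""
    synonym = synonyms.get(token)
    return embedding.get(synonym) if synonym is not None else None


def embed(tokens, embedding, synonyms):
    """Return one vector per token, so that no token is dropped."""
    dim = len(next(iter(embedding.values())))
    result = []
    for i, token in enumerate(tokens):
        vec = embedding.get(token)
        if vec is None:  # OOV: try a synonym first
            vec = synonym_vector(token, embedding, synonyms)
        if vec is None:  # then the neighbouring words
            vec = context_vector(tokens, i, embedding)
        if vec is None:  # last resort: a zero vector
            vec = [0.0] * dim
        result.append(vec)
    return result


vectors = embed(["the", "bp", "has", "htn"], EMBEDDING, SYNONYMS)
```

Here `"bp"` has no synonym entry, so it receives the average of the vectors for `"the"` and `"has"`, while `"htn"` receives the vector of `"hypertension"`. Because every token ends up with a vector, none has to be discarded before computing precision and recall.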
Handling out-of-vocabulary words in word embedding
Ismat Mohd Sulaiman
on 5 Jul 2021
Commented: Ismat Mohd Sulaiman
on 16 Aug 2021
I'm using FastText and my own word embedding on a set of documents. It is being used to detect abbreviations (Y/N) for each word token.
When testing, words that do not have vectors (out-of-vocabulary, or OOV, words) are discarded and not included in the performance measures (precision, recall, etc.), giving a false result. How do you handle this?
Should the words with NaN vectors be included in the performance measures? Can the NaN values be replaced with a vector? If so, how would you decide which vector?
Answers (1)
Prince Kumar
on 16 Aug 2021
From my understanding, you want to handle OOV (out-of-vocabulary) words for your abbreviation detection task. For now, MATLAB's fastTextWordEmbedding does not handle OOV words.
There are many ways to do this; the following are two popular ones: