Language Support

Information on language support in Text Analytics Toolbox™

Text Analytics Toolbox supports English, Japanese, German, and Korean. Most Text Analytics Toolbox functions also work with text in other languages. For more information, see Language Considerations.

Functions

tokenizedDocument - Array of tokenized documents for text analysis
removeStopWords - Remove stop words from documents
normalizeWords - Stem or lemmatize words
stopWords - List of stop words
mecabOptions - Options for MeCab tokenization (Since R2019b)
tokenDetails - Details of tokens in tokenized document array
addSentenceDetails - Add sentence numbers to documents
addPartOfSpeechDetails - Add part-of-speech tags to documents
addEntityDetails - Add entity tags to documents (Since R2019a)
addLemmaDetails - Add lemma forms of tokens to documents
addLanguageDetails - Add language identifiers to documents
corpusLanguage - Detect language of text
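
As a rough illustration of how several of these functions fit together, the following sketch detects the language of a short German string, tokenizes it with the language set explicitly, and then inspects and cleans the tokens. The sample text and variable names are invented for this example, and it assumes Text Analytics Toolbox is installed. Treat it as a minimal sketch, not a recommended preprocessing pipeline.

```matlab
% Hypothetical sample text, made up for illustration only.
str = "Guten Morgen. Wie geht es dir heute?";

% Detect the language of the raw text.
language = corpusLanguage(str);

% Tokenize the text, stating the language explicitly
% instead of relying on automatic detection.
documents = tokenizedDocument(str, 'Language', 'de');

% Inspect the per-token details, returned as a table.
tdetails = tokenDetails(documents);
disp(tdetails)

% Remove stop words, then stem or lemmatize the remaining words.
documents = removeStopWords(documents);
documents = normalizeWords(documents);

% Retrieve the German stop word list for inspection.
words = stopWords('Language', 'de');
```

Setting the 'Language' option explicitly can help with very short texts, where automatic language detection has little to work with.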

Topics

English Language

Japanese Language

German Language

Korean Language

Other Languages