What Is Text Mining?

Text mining is the search for patterns in text data using data analytics techniques: importing, exploring, and visualizing the text, then applying statistics and machine learning algorithms to it.

Manually reading and sorting large sets of text is an insurmountable task for a person; MATLAB® can automate the process, letting you interact with and visualize your data to identify patterns, trends, and complex relationships you could not find otherwise.

Text mining is used to derive quantitative statistics from large sets of unstructured text, to identify themes in documents using topic modeling, to draw qualitative inferences with sentiment analysis, and to extract other valuable information. It is used in finance, manufacturing, information technology, and many other industries. Applications include:

Counting the frequency of words or phrases in documents (see bag-of-words, n-gram, tfidf)
Automating the classification of reviews as positive or negative based on sentiment
Developing predictive equipment maintenance schedules based on sensor and text log data
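
The first application above, counting words and phrases, can be illustrated with a small language-neutral sketch. The Python below uses only the standard library; it is not MATLAB code and does not reproduce the Text Analytics Toolbox functions, whose tokenization and weighting options may differ. The function names, the sample log messages, and the particular tf-idf variant (raw term count multiplied by the natural log of the number of documents divided by the number of documents containing the term) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(tokens):
    """Count how often each word occurs (word order is discarded)."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Count each run of n consecutive words, joined by spaces."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def tf_idf(docs_tokens):
    """Weight each word count by ln(N / document frequency).

    This is one common tf-idf variant, assumed here for illustration.
    Words that occur in every document receive a weight of zero.
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each word once per document
    return [
        {
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in Counter(tokens).items()
        }
        for tokens in docs_tokens
    ]


# Hypothetical maintenance-log messages, invented for this sketch.
documents = [
    "The pump failed after the pressure alarm",
    "Pressure alarm cleared and the pump restarted",
    "Routine inspection found no faults",
]

tokens_per_doc = [tokenize(doc) for doc in documents]
word_counts = bag_of_words(tokens_per_doc[0])
bigram_counts = ngrams(tokens_per_doc[0], 2)
weights = tf_idf(tokens_per_doc)

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
```

In this sketch a word such as "failed", which appears in only one message, receives a higher weight than "pump", which appears in two. That is the property that makes tf-idf useful for surfacing terms that distinguish one document from the rest.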

To learn more about deriving insight from text data using text mining, see Text Analytics Toolbox™ and Statistics and Machine Learning Toolbox™.

Examples and How To

Math with Words - Blog
Extract Text Data from Files - Example
Visualize Text Data Using Word Clouds - Example
Analyze Text Data Using Topic Models - Example
Analyze Text Data Using Multiword Phrases - Example
Machine Learning with Text: Get Started with Text Analytics in MATLAB - Resource Collection

Software Reference

Getting Started with Text Analytics Toolbox - Documentation
Text Data Preparation: Import text data into MATLAB and preprocess it for analysis - Documentation
Modeling and Prediction: Develop predictive models using topic models and word embeddings - Documentation
Display and Presentation: Visualize text data and models using word clouds and text scatter plots - Documentation
