Text Mining

What Is Text Mining?

Text mining is the search for patterns in text data using data analytics techniques: importing, exploring, and visualizing the text, then applying statistics and machine learning algorithms to it.

Manually reading and sorting large sets of text is impractical for a human. MATLAB® can automate the process, letting you interact with and visualize your data to identify patterns, trends, and complex relationships that would otherwise be difficult to find.

Text mining is used to derive quantitative statistics from large sets of unstructured text, identify themes in documents with topic modeling, draw qualitative inferences with sentiment analysis, and extract other valuable information. It is applied in finance, manufacturing, information technology, and many other industries. Applications include:

  • Counting the frequency of words or phrases in documents (see bag-of-words, n-gram, tf-idf)
  • Automating the classification of reviews based on sentiment, whether positive or negative
  • Developing predictive equipment maintenance schedules based on sensor and text log data
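
The first application, counting words and phrases, can be illustrated with a minimal, language-agnostic sketch. It is written in plain Python using only the standard library and is not the Text Analytics Toolbox API. The function names (tokenize, ngrams, tfidf) and the sample log entries are hypothetical, chosen only for illustration. The tf-idf weighting shown here (raw term count multiplied by log(N/df)) is one common convention among several, so other tools may produce different values.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs_tokens):
    """Return one {word: weight} dict per document.

    Assumed convention: weight = count in document * log(N / df),
    where N is the number of documents and df is the number of
    documents that contain the word.
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each word once per document
    weights = []
    for tokens in docs_tokens:
        term_freq = Counter(tokens)
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


# Hypothetical maintenance-log entries, invented for this example.
docs = [
    "Pump vibration high, pump bearing replaced",
    "Coolant leak detected near pump",
    "Bearing temperature high",
]

tokenized = [tokenize(d) for d in docs]
bag = Counter(tokenized[0])                 # bag-of-words for the first entry
bigrams = Counter(ngrams(tokenized[0], 2))  # 2-gram counts for the first entry
weights = tfidf(tokenized)                  # tf-idf weights for every entry

print(bag.most_common(3))
print(bigrams.most_common(3))
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
```

In this sketch, a word that appears in only one entry (such as "vibration") receives a higher tf-idf weight per occurrence than a word shared across entries (such as "pump"), which is the property that makes tf-idf useful for surfacing terms that are distinctive to a document.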

To learn more about deriving insight from text data using text mining, see Text Analytics Toolbox™ and Statistics and Machine Learning Toolbox™.

See also: data science, machine learning, Statistics and Machine Learning Toolbox, natural language processing, sentiment analysis, long short-term memory (LSTM) networks, n-gram