Most frequent word in text

Roger Nadal on 27 Nov 2019
Commented: Image Analyst on 27 Nov 2019
How do I print all the words in a text, together with how many times each one appears, one word per line, ordered from most to least frequent?
  4 comments
Roger Nadal on 27 Nov 2019
No
Walter Roberson on 27 Nov 2019
Edited: Walter Roberson on 27 Nov 2019
"No" ? So "can't" is not a "word", and "John's" is not a word, and "self-expression" is not a word? If the file happened to contain
John's self-expression can't runh 7 tlick.
then what would the desired output be?
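To make the ambiguity concrete, here is a minimal sketch comparing two possible definitions of a "word" on that sample sentence. Both regular expressions are assumptions chosen for illustration, not a definition the original poster agreed to.

```matlab
% Sample sentence from the comment above (apostrophes doubled for MATLAB).
str = 'John''s self-expression can''t runh 7 tlick.';

% Definition A (assumed): a word is a run of letters only.
lettersOnly = regexp(str, '[A-Za-z]+', 'match');

% Definition B (assumed): letters, optionally joined by apostrophes or hyphens.
withPunct = regexp(str, '[A-Za-z]+(?:[''-][A-Za-z]+)*', 'match');

fprintf('Letters only:\n');
fprintf('  %s\n', lettersOnly{:});
fprintf('With apostrophes and hyphens:\n');
fprintf('  %s\n', withPunct{:});
```

Under definition A, "John's" splits into "John" and "s"; under definition B it stays whole. Both drop the digit "7" and the trailing period.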


Answers (1)

Image Analyst on 27 Nov 2019
Try this:
str = '123 zxy abc def abc def abc last word';
% str = fileread(fileName); % Read in text from disk file.
words = strsplit(str); % Split on whitespace into a cell array of words.
uniqueWords = unique(words) % Sorted list of the distinct words.
numUniqueWords = length(uniqueWords)
wordCounts = zeros(numUniqueWords, 1);
for k = 1 : numUniqueWords
    thisWord = uniqueWords(k);
    indexes = ismember(words, thisWord); % True wherever this word occurs.
    wordCounts(k) = sum(indexes);
end
% Show results in command window
wordCounts
Do you have the Text Analytics Toolbox? It probably has functions that produce a histogram of words more easily than this.
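The question also asks for one word per line, ordered from most to least frequent. A minimal sketch of that step, using only base MATLAB, might look like the following. It assumes that words are whitespace-separated tokens, as in the example string above, and the variable names sortedCounts and sortedWords are introduced here for illustration.

```matlab
str = '123 zxy abc def abc def abc last word';
words = strsplit(strtrim(str));            % whitespace-separated tokens
[uniqueWords, ~, idx] = unique(words);     % idx maps each token to its unique word
wordCounts = accumarray(idx(:), 1);        % number of occurrences of each unique word

% Order from most to least frequent.
[sortedCounts, order] = sort(wordCounts, 'descend');
sortedWords = uniqueWords(order);

% Print one word per line, followed by its count.
for k = 1 : numel(sortedWords)
    fprintf('%s %d\n', sortedWords{k}, sortedCounts(k));
end
```

For the example string, the first two lines printed are "abc 3" and "def 2"; the remaining words each appear once.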
  3 comments
Walter Roberson on 27 Nov 2019
If you have a cell array of character vectors holding the words, then you can use
randperm(number_of_words, number_to_choose_randomly)
to get that many random indices, and index the cell array with them to get a cell array of character vectors containing that many randomly chosen words. After that, your task is reduced to displaying them, for example with
fprintf('%s\n', TheCellArray{:});
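Put together, those two steps could be sketched as follows. The example words and the count of 3 are assumptions for illustration, and the names pick and chosen are hypothetical.

```matlab
TheCellArray = {'abc', 'def', 'last', 'word', 'zxy'};   % example words (assumed)
number_of_words = numel(TheCellArray);
number_to_choose_randomly = 3;                          % assumed for illustration

% randperm returns distinct random indices, not the words themselves.
pick = randperm(number_of_words, number_to_choose_randomly);
chosen = TheCellArray(pick);                            % index to get the words

fprintf('%s\n', chosen{:});                             % one word per line
```

Because randperm samples without replacement, no word is chosen twice.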
Image Analyst on 27 Nov 2019
Roger, you might find Talk To Transformer fun. It generates sentences using a neural network, so even though the output is gibberish, it's not just random words. The grammar is right, with nouns, adjectives, etc. in the right places, and the sentence structure is right. For example, when I typed in "I like to use MATLAB Answers.", below is how it completed the paragraph.
"I like to use MATLAB Answers. There's one new way to run a simulation if I have time, and that's to run the Model of a Power Grapher experiment with a mesh that's made of a grid that covers the corresponding coordinates. The reason for this is that the Lattice Proximal layer doesn't cover each coordinate perfectly, meaning that each layer overlaps some areas, which introduces a kind of noise to the output image. My current theory is that the noise causes the software not to converge as well. Unfortunately, I don't have the equipment."
I've seen one professor feed the complete works of Shakespeare into a network. After the first epoch the output was just random letters, then after a few hundred more it was breaking them into words, then sentences. After even more it was getting the grammar right. With more and more epochs the text became more reasonable and less gibberish-sounding. He thinks that if he trained it for weeks, it might produce something that sounded very reasonable.
