Dear community,
I have a PDF with news headlines, and I need to count the number of words in each title, as well as how many times words starting with "co" and the word "price" appear in each title. I do not have much experience with the Text Analytics Toolbox in MATLAB. As far as I can see, "tokenizedDocument" already gives the total number of words (or tokens) per headline, and "context" finds the occurrences of a specific word. However, I do not know how to make MATLAB look for words starting with "co". Also, how can I display this information in a table?
I have attached my PDF, and my code so far is below.
I really appreciate any help you can provide!
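filename = "Factiva_sample_headlines_1.pdf";
% Read the PDF text into a single string
str = extractFileText(filename);
% Split the text into blocks separated by blank lines
textData = split(str,[newline newline]);
% Drop blocks containing "Page"; contains works on string arrays, whereas cellfun needs a cell array
textData = textData(~contains(textData,"Page"));
% Tokenize each remaining block
cleanedDocuments = tokenizedDocument(textData);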
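One possible way to get the per-headline counts and collect them in a table is sketched below. It is only a sketch under stated assumptions: the two sample headlines and the variable names (sampleHeadlines, numWords, numCo, numPrice) are made up for illustration and are not taken from the PDF, matching is done case-insensitively, and punctuation marks count as tokens unless they are removed first (for example with erasePunctuation).

```matlab
% Hypothetical sample headlines standing in for the text extracted from the PDF
sampleHeadlines = [
    "Copper price climbs as construction companies restock"
    "Oil price falls while consumer confidence rises"];

documents = tokenizedDocument(sampleHeadlines);

% Total number of tokens per headline, forced into a column vector
numWords = doclength(documents);
numWords = numWords(:);

numDocs  = numel(documents);
numCo    = zeros(numDocs,1);
numPrice = zeros(numDocs,1);

for i = 1:numDocs
    % string() on a scalar tokenizedDocument returns its tokens as a string array
    words = lower(string(documents(i)));
    % Tokens that begin with "co" (case-insensitive because of lower)
    numCo(i) = sum(startsWith(words,"co"));
    % Exact matches of the word "price"
    numPrice(i) = sum(words == "price");
end

% One row per headline
results = table(sampleHeadlines, numWords, numCo, numPrice, ...
    'VariableNames', {'Headline','NumWords','NumCoWords','NumPrice'});
disp(results)
```

To apply this to the PDF, cleanedDocuments and textData would take the place of documents and sampleHeadlines, assuming they have the same number of elements.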