Count number of words in a PDF document.
I want to count the words in a PDF. The PDF is in Arabic, and I want to know how many times each word occurs, like a histogram. For example, if WORK is in the PDF, I want to know how many times the word WORK occurs. I want this word to process as an image. Please help.
Answers (1)
KSSV
on 15 Feb 2022
You can read your pdf file using:
str = extractFileText("Test.pdf"); % give your pdf name
The above reads the content of the PDF into a string. You can then use functions such as strcmp, strcmpi, or strfind to check whether a given word is present in str, and count its occurrences:
word = "WORK"; % the word to count (replace with your own)
s = strsplit(str); % split the text into words at whitespace
idx = strcmpi(s, word); % case-insensitive comparison against each word
nnz(idx) % number of times the word occurs
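For the histogram of every word that the question asks about, the same idea can be extended to count all words at once. The following is only a sketch, not part of the original answer: it uses a short sample string as a stand-in for the text returned by extractFileText, and the variable names (words, uniqueWords, counts, T) are illustrative. It assumes that words are separated by whitespace and that punctuation has already been removed.

```matlab
% Stand-in for: str = extractFileText("Test.pdf");
str = "work hard and work smart and rest";

% Split at whitespace into a column of words; lower() makes the count
% case-insensitive (it has no effect on scripts without case, such as Arabic).
words = split(lower(str));
words = words(strlength(words) > 0);   % drop any empty entries

% Map every word to the index of its unique value, then tally the indices.
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);

% Collect the result in a table, most frequent words first.
T = table(uniqueWords, counts, 'VariableNames', {'Word', 'Count'});
T = sortrows(T, 'Count', 'descend');
disp(T)
```

If a plot is wanted, bar(T.Count) draws the counts, with T.Word available for the tick labels.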
2 comments
Image Analyst
on 15 Feb 2022
Edited: Image Analyst
on 15 Feb 2022
@KSSV I didn't know about extractFileText(). Is it in the Text Analytics Toolbox?
@sajid khan What do you mean by "I want this word to process as an image"? If you can get the words directly from the data, why render the page as an image and then try to do OCR on it?