Count number of words in a PDF document.

sajid khan on 14 Feb 2022
Edited: Image Analyst on 15 Feb 2022
I want to count the number of words in a PDF. The PDF is in Arabic, and I want to know how many times each word occurs, like a histogram. For example, if the word WORK appears in the PDF, I want to know how many times it occurs. I want to process this word as an image. Please help.

Answers (1)

KSSV on 15 Feb 2022
You can read your PDF file using:
str = extractFileText("Test.pdf"); % give your PDF file name
The above reads the content of the PDF into a string. You can then use functions such as strcmp, strcmpi, or strfind to check whether a given word is present in str, and count the matches.
word = "work"; % the word to look for (example value)
s = strsplit(str); % split the string into an array of words
idx = strcmpi(s, word); % case-insensitive comparison against each word
nnz(idx) % count how many times the word is present
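Since the question asks for a count of every word rather than a single one, the same idea can be extended to a full word histogram. The following is only a sketch: the sample string is a made-up stand-in for the output of extractFileText, and the variable names are illustrative. It uses only base MATLAB functions (strsplit, unique, accumarray, sortrows), so it does not depend on how the text was obtained.

```matlab
% Sample text standing in for the output of extractFileText (assumption:
% replace this with str = extractFileText("Test.pdf") for a real file).
str = "work is work and more work is done";

% Split on whitespace into a string array of words.
words = strsplit(str);

% unique returns the distinct words and, for every input word,
% the index of its entry in the list of distinct words.
[uniqueWords, ~, idx] = unique(words);

% accumarray adds 1 for each occurrence of an index, giving per-word counts.
counts = accumarray(idx(:), 1);

% Collect the result in a table, most frequent word first.
T = table(uniqueWords(:), counts, 'VariableNames', {'Word', 'Count'});
T = sortrows(T, 'Count', 'descend');
disp(T)
```

Note that this comparison is case-sensitive, unlike strcmpi above; applying lower(words) before unique would merge case variants. Punctuation attached to a word is also kept as part of it, so real text may need cleaning first.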
2 comments
sajid khan on 15 Feb 2022
Can this be done with Arabic? My PDF is in Arabic, so how will it work for that? Also, how can I define the path to my PDF? I am using an online compiler, so I need to define the PDF path so that the compiler can read the file and take the input automatically.
Image Analyst on 15 Feb 2022
Edited: Image Analyst on 15 Feb 2022
@KSSV I didn't know about extractFileText(). Is it in the Text Analytics Toolbox?
@sajid khan what do you mean by "I want this word to process as an image"? If you can get the words directly from the data, why render the page as an image and then try to do OCR on it?
