Read text from a PDF document

Read the text from a simple PDF document into MATLAB as a string
3.6K downloads
Updated 3 Jul 2017


Editor's note: This file was selected as a MATLAB Central Pick of the Week.

% PDFREAD reads a PDF file using the iText Java library.
%
% INPUT:
% PDF_LOCATION:
% String specifying the location of the PDF.
%
% OUTPUT:
% PDFTEXT:
% Cell array in which each cell holds the text of one page of the
% parsed PDF file. Only text is extracted, not images.
%
% D. Wood, 7/3/2017
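Because the output is a cell array with one cell per page, a caller who wants the whole document as a single string has to join the pages. The following is only a sketch of one way to do that: the file name 'report.pdf' is hypothetical, and a hard-coded cell array stands in for the pdfRead output so the snippet runs without the .jar or a PDF.

```matlab
% Stand-in for: pdfText = pdfRead('report.pdf');
% ('report.pdf' is a hypothetical file name; the cell contents are made up.)
pdfText = {'Text of page one.', 'Text of page two.'};

% Number of pages that were parsed.
numPages = numel(pdfText);

% Join all pages into one character vector, separated by newlines.
% pdfText(:).' forces a row shape in case the cell array is a column.
allText = strjoin(pdfText(:).', sprintf('\n'));

fprintf('%d pages, %d characters in total\n', numPages, numel(allText));
```

sprintf('\n') is used instead of the newer newline function so that the sketch also runs on older releases such as R2014b.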
NOTES:
This software uses the open-source iText library.
The source .jar is included in the zip file, but more information can be found here:
https://github.com/ymasory/iText-4.2.0
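Before calling the included pdfRead() function, add the .jar to MATLAB's Java class path with this command:
javaaddpath('iText-4.2.0-com.itextpdf.jar')
The command can be run from the Command Window or from a script. Because javaaddpath modifies the dynamic Java class path, it needs to be run only once per MATLAB session.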
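If the setup command lives in a script that may run several times per session, it can be guarded so the .jar is added only when it is not already on the dynamic class path. This is a minimal sketch, assuming the .jar sits in the current folder; the variable names jarFile and jarPath are illustrative and not part of the submission.

```matlab
% Name of the .jar as given in the notes; assumed to be in the current folder.
jarFile = 'iText-4.2.0-com.itextpdf.jar';
jarPath = fullfile(pwd, jarFile);

% javaclasspath('-dynamic') returns the current dynamic class path entries.
alreadyAdded = any(strcmp(javaclasspath('-dynamic'), jarPath));

if alreadyAdded
    % Nothing to do: the .jar was added earlier in this session.
elseif exist(jarPath, 'file') == 2
    javaaddpath(jarPath);
else
    warning('Could not find %s in the current folder.', jarFile);
end
```

Using the full path makes the comparison against the existing entries more reliable than comparing the bare file name.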
This method is relatively robust, but it will not always return all the text in the document if the PDF has unusual or complicated formatting (e.g. multiple non-fixed-width columns or many image captions).
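One cheap sanity check for the limitation above is to flag pages for which no text came back at all. The sketch below assumes each cell of the output holds a character vector (or something char() can convert); the cell array is a made-up stand-in for the pdfRead output, and the variable names are illustrative. A blank page is not proof of a failed extraction, and partially missing text will not be caught this way.

```matlab
% Stand-in for the output of pdfRead; the second entry mimics a page
% for which only whitespace was recovered. The contents are made up.
pdfText = {'Introduction text', '   ', 'Conclusion text'};

% A page counts as blank if nothing is left after trimming whitespace.
isBlank = cellfun(@(p) isempty(strtrim(char(p))), pdfText);
blankPages = find(isBlank);

if ~isempty(blankPages)
    fprintf('No text extracted from page(s): %s\n', mat2str(blankPages));
end
```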

Cite as

Derek Wood (2024). Read text from a PDF document (https://www.mathworks.com/matlabcentral/fileexchange/63615-read-text-from-a-pdf-document), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2014b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version Published Release Notes
1.0.0.0

(Updated description text slightly)
(Updated text again)
(Text again)
(I added a title image)