I figured it out. The read call is what advances the datastore to the next file, and if the read function throws an error, the position does not advance. I solved the problem by setting the read function to fileparts, obtaining the filename from that, and then calling extractFileText on that file inside a try/catch block.
How do I skip a file that gives an error when using fileDatastore to loop through a folder of pdfs?
Allen
on 11 Jan 2019
Commented: Eniola Oluwakoya
on 29 Jul 2020
I am mining text from several thousand PDFs in a folder using the Text Analytics Toolbox, and I am using fileDatastore to loop through them. Some of the PDFs are encrypted, which makes extractFileText throw an error. I added a try/catch block to skip those files, but after it catches the error, the loop goes back to try and reads the same file again, so it never ends. How do I increment the counter so that it moves on past the bad file? Here is part of the code:
fds = fileDatastore('File*.pdf','ReadFcn',@extractFileText);
while hasdata(fds)
    % extract and prepare text
    try % be prepared for error such as locked pdf
        text=read(fds); % this is where error occurs
    catch
        disp('encrypted pdf');
        continue
    end
    text=erasePunctuation(text);
    % etc. (other text-parsing)
    ...
end
Accepted Answer
Allen
on 12 Jan 2019
1 Comment
Eniola Oluwakoya
on 28 Jul 2020
Hi, could you shed more light on how you made fileparts the read function?
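The accepted answer does not include its code, so the following is only a minimal sketch of what that approach might look like, not the author's actual script. It assumes the filename is taken from the second output of read (the info struct and its Filename field) rather than from the value fileparts returns, since fileparts returns the folder as its first output. The skipped list and the variable name txt are illustrative additions that do not appear in the thread.

```matlab
% Use a ReadFcn that cannot fail on an encrypted PDF (fileparts only
% splits the path string), so read(fds) always advances to the next file.
fds = fileDatastore('File*.pdf', 'ReadFcn', @fileparts);

skipped = {};   % hypothetical bookkeeping of files that could not be read
while hasdata(fds)
    % The second output of read describes the file that was just consumed.
    [~, info] = read(fds);
    filename = info.Filename;

    try
        % This is the call that may error, for example on an encrypted PDF.
        txt = extractFileText(filename);
    catch err
        fprintf('Skipping %s: %s\n', filename, err.message);
        skipped{end+1} = filename; %#ok<AGROW>
        continue   % the datastore has already moved past this file
    end

    txt = erasePunctuation(txt);
    % ... other text parsing, as in the question
end
```

Because the failing call now sits outside read, an error in extractFileText no longer prevents the datastore from moving on. Another option, if a datastore is not otherwise needed, would be a plain for loop over the paths in fds.Files with the same try/catch inside.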