Error while using readPDFFormData to extract data from online pdf files
Ana Egatz-Gomez
on 21 Dec 2023
Commented: Ana Egatz-Gomez
on 6 Mar 2024
Hi, I have two related questions.
First, when I use the command readPDFFormData with an online pdf, I get an error message. How should I write the URL so it works?
Second, is it possible to extract data from a whole list of links to PDF files on a website, rather than one by one? (This website: https://www.ncdoi.gov/consumers/medicare-and-seniors-health-insurance-information-program-shiip/medicare-advantage-medicare-health-plans-part-c#MedicareAdvantageLandscapesbyCounty2024-2398)
Any help will be greatly appreciated.
filename = "https://www.ncdoi.com/SHIIPCurrentYear/Documents/MAL%20by%20County/2024%20MAPD%20Pender%20County.pdf";
data = readPDFFormData(filename);
Accepted Answer
Anton Kogios
on 22 Dec 2023
I was not able to get the URL to work directly either (not sure why since I'm pretty sure reading online images such as PNG/JPG works...), but here is a workaround (it just downloads the PDF first):
filenameOnline = "https://www.ncdoi.com/SHIIPCurrentYear/Documents/MAL%20by%20County/2024%20MAPD%20Pender%20County.pdf";
filenameLocal = "test.pdf"; % can set to custom directory
websave(filenameLocal,filenameOnline);
formData = readPDFFormData(filenameLocal)
fileText = extractFileText(filenameLocal); % since readPDFFormData returns an empty struct, this is just to make sure we can read the PDF
As for your second question, I can't seem to access the PDFs on the website you mentioned (I think it is because I'm in a different country), but you should be able to just use a for loop. You can also look into using sprintf if the URLs have a repetitive naming system, which I've also demonstrated. Something like:
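filenamesOnline = ["url1.pdf";
    "url2.pdf";
    "url3.pdf"];
formData = cell(numel(filenamesOnline),1); % one result per PDF
for i = 1:numel(filenamesOnline)
    filenameLocal = sprintf('test%i.pdf',i); % test1.pdf, test2.pdf, ...
    websave(filenameLocal,filenamesOnline(i)); % download the i-th PDF
    formData{i} = readPDFFormData(filenameLocal);
end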
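If the online files do follow a repetitive naming system, the list of URLs can be generated instead of typed out. The following is only a sketch: it assumes that every county PDF follows the same pattern as the Pender County URL from the question, and the county list is a hypothetical selection, so check the generated URLs in a browser first. String concatenation is used here rather than sprintf because the URLs contain literal % characters (from %20), which sprintf would try to read as format specifiers unless each one is escaped as %%.

```matlab
% Assumption: every county PDF follows the naming pattern of the Pender County URL.
baseUrl  = "https://www.ncdoi.com/SHIIPCurrentYear/Documents/MAL%20by%20County/";
counties = ["Pender"; "Wake"; "New Hanover"];   % hypothetical selection of counties

% Spaces must be written as %20 inside a URL
countiesEncoded = strrep(counties, " ", "%20");
filenamesOnline = baseUrl + "2024%20MAPD%20" + countiesEncoded + "%20County.pdf";

% Local names: one file per county, with spaces replaced by underscores
filenamesLocal = "MAPD_2024_" + strrep(counties, " ", "_") + ".pdf";
```

The resulting filenamesOnline array can take the place of the hand-written list in the loop, and filenamesLocal(i) can be passed to websave instead of the numbered test file names.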
I hope this helps and you are able to get it to work for you!
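For the case where the links are only available on a web page, one possible approach is to read the page source and pull out every link that ends in .pdf. This is a rough sketch, not something I could test against the site in question: extractPdfLinks is a hypothetical helper (not a built-in function), and it assumes the PDFs appear as ordinary href="....pdf" attributes in the HTML rather than being filled in by JavaScript.

```matlab
function links = extractPdfLinks(html)
%EXTRACTPDFLINKS Collect the targets of all links ending in .pdf (sketch).
%   Hypothetical helper, not a built-in. Assumes the PDFs appear as plain
%   href="....pdf" attributes in the page source.
%
%   Example use (requires network access; pageUrl is a placeholder):
%       html  = webread(pageUrl, weboptions('ContentType','text'));
%       links = extractPdfLinks(html);

    html = char(html);   % accept string or char input
    % Capture the quoted href value of every link whose target ends in .pdf
    tokens = regexp(html, 'href="([^"]+\.pdf)"', 'tokens', 'ignorecase');
    % Flatten the nested token cells and drop repeats, keeping page order
    links = unique(string([tokens{:}]), 'stable');
end
```

The returned string array can then be looped over with websave and readPDFFormData in the same way as a hand-typed list. If the page uses relative links (for example, ones starting with /), the site address would need to be prepended before downloading.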