Downloading files from a website with conditions on file names

alpedhuez on 9 Feb 2022
Commented: Walter Roberson on 12 Mar 2022
Question: I am working with a website directory, https://www.somecompany.com/xml/, which contains files whose names start with the letter "A" or "B".
The filenames in the directory look like this:
A_20080403.xml
A_20080403_1.xml
A_20080403_2.xml
A_20080404_1.xml
B_20080403_1.xml
That is:
  • Filenames have the form "capital letter" + "_" + "date" + "_" + "number" + ".xml", or "capital letter" + "_" + "date" + ".xml", with the date written as yyyymmdd.
  • Some dates have no corresponding files.
I would like to download all the files whose names start with the letter "A".
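The naming rule above can be written as a regular expression, which is handy for checking or filtering names. A minimal sketch, assuming the date is always eight digits (yyyymmdd) and the optional suffix is one or more digits; the last sample name is a hypothetical non-matching example added for illustration.

```matlab
% "A_" + 8-digit date + optional "_N" + ".xml" (assumes yyyymmdd dates)
pattern = '^A_\d{8}(_\d+)?\.xml$';

names = ["A_20080403.xml", "A_20080403_1.xml", "A_20080403_2.xml", ...
         "A_20080404_1.xml", "B_20080403_1.xml", "A_2008-04-03.xml"];

% With 'once', regexp returns [] for a name that does not match
isWanted = ~cellfun(@isempty, regexp(cellstr(names), pattern, 'once'));
wanted = names(isWanted);
disp(wanted)
```

Only the four "A_" names in the expected format are kept; "B_20080403_1.xml" and the hyphenated name are rejected.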
What I have tried:
(a) I was able to save a single file using the "websave" command, and then looped over dates:
for k = 20080401:20100101
    filename = sprintf('A%d.xml', k);
    url = ['https://www.somecompany.com/xml/' filename];
    outfilename = websave(filename,url);
end
Problems with the above code: it does not work because
  • It assumes filenames of the form "capital letter" + "date" + ".xml", not the forms explained above.
  • It throws an error for any date that has no corresponding file, and the loop stops there.
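A further reason to step through calendar dates instead of integers: the range 20080401:20100101 also contains numbers that are not valid dates, such as 20080432. A quick sketch comparing the two counts, using the limits from the loop above:

```matlab
% Integers visited by the integer loop
numIntegers = numel(20080401:20100101);

% Calendar days between the same limits (datetime colon steps one day at a time)
calendarDays = datetime(2008, 4, 1):datetime(2010, 1, 1);
numDays = numel(calendarDays);

fprintf('%d integers vs %d calendar days\n', numIntegers, numDays);
```

Almost all of the 19701 integers are not real dates, so an integer loop makes far more failed requests than a loop over the 641 calendar days.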
How can I improve my code?

Answers (1)

Walter Roberson on 9 Feb 2022
It would be more robust and faster if the site provided a way to list the available files, instead of relying on trial and error.
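If the server does return an HTML index page for the directory (an assumption; the question does not say so, and many servers disable directory listings), the file names can be read from that page instead of guessed. The sketch below runs on a hypothetical sample page so it works offline; with a real listing, html would come from webread on the directory URL, and the links are assumed to appear as href="..." attributes.

```matlab
% Hypothetical index page; with a real server this might be
%   html = webread("https://www.somecompany.com/xml/");
html = ['<a href="A_20080403.xml">A_20080403.xml</a>' ...
        '<a href="A_20080404_1.xml">A_20080404_1.xml</a>' ...
        '<a href="B_20080403_1.xml">B_20080403_1.xml</a>'];

% Capture the target of every href that names an "A_" XML file
tokens = regexp(html, 'href="(A_\d{8}(?:_\d+)?\.xml)"', 'tokens');
names = unique(string(cellfun(@(t) t{1}, tokens, 'UniformOutput', false)));
disp(names)

% Download each listed file (commented out so the sketch runs offline)
% baseurl = "https://www.somecompany.com/xml/";
% for k = 1:numel(names)
%     websave(names(k), baseurl + names(k));
% end
```

Only names that actually exist on the server are requested, so no try/catch guessing is needed.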
baseurl = "https://www.somecompany.com/xml/";
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd');
subfile_limit = 5; %no more than _5 -- adjust as appropriate
subfile_modifier = ["", "_" + (1:subfile_limit)] + ".xml";
for Day = datelimits(1):datelimits(2)
    daystr = string(Day);
    for Sub = subfile_modifier
        filename = "A_" + daystr + Sub;
        url = baseurl + filename;
        try
            outfilename = websave(filename,url);
            fprintf('fetched %s\n', filename);
        catch
            if Sub == ".xml"
                continue; %no unsuffixed file for this date; _1, _2, ... may still exist
            end
            break; %skip remaining numbered subfiles for this date upon first failure
        end
    end
end
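Because every failed websave call costs a network round trip, it can help to preview the names such a loop will try before running it. A minimal offline sketch for a hypothetical two-day range with subfile_limit = 2, building names the same way as the loop and using the yyyyMMdd display format:

```matlab
% Hypothetical short range so the list stays small
datelimits = datetime({'20080403', '20080404'}, 'InputFormat', 'yyyyMMdd', ...
                      'Format', 'yyyyMMdd');
subfile_limit = 2;
subfile_modifier = ["", "_" + (1:subfile_limit)] + ".xml";

candidates = strings(0, 1);
for Day = datelimits(1):datelimits(2)
    % One column of names per day: A_<date>.xml, A_<date>_1.xml, ...
    candidates = [candidates; ("A_" + string(Day) + subfile_modifier).']; %#ok<AGROW>
end
disp(candidates)
```

The list grows by subfile_limit + 1 names per day, which is a quick way to estimate how many requests a full date range will make.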
2 comments
alpedhuez on 12 Mar 2022
When I ran the code above, I got the values
daystr = "01-Jan-2010"
filename = "A_01-Jan-2010.xml"
But, as noted above, the filenames have the format
A_20100101.xml
So the way daystr is assigned needs to be changed.
Walter Roberson on 12 Mar 2022
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');
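This works because string() converts a datetime using its Format property, so setting 'Format','yyyyMMdd' changes the text that daystr receives without changing the dates themselves. A small check on the date from the comment above (the Format property can also be set after creation, as here):

```matlab
d = datetime('20100101', 'InputFormat', 'yyyyMMdd');
before = "A_" + string(d) + ".xml";   % uses the default display format

d.Format = 'yyyyMMdd';                % changes only the text representation
after = "A_" + string(d) + ".xml";

disp(before)
disp(after)
```

With default preferences the first line shows the "A_01-Jan-2010.xml" style reported above, while the second matches the server's naming.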


Version

R2021b
