How to read multiple grb2 files on a webpage?
Dear all,
I would like to download all the grb2 files from the following website:
The file links can be identified; for instance, the first one is:
Is there a way to automatically update the link to download these files?
Thanks for the help.
2 comments
Rik
on 14 Jun 2021
What have you tried so far? You can probably either use a regex, or use strfind to find all pairs of http and .grb2.
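A minimal sketch of the `strfind` idea, assuming the page text contains absolute links that start with `http` and end in `.grb2`. The sample text and URLs below are made up for illustration; if a page uses relative `href` values instead, this pairing will not find them and a regular expression is likely the better fit.

```matlab
% Hypothetical page text with absolute links (URLs are made up)
txt = 'see http://example.com/a.grb2 and http://example.com/b.grb2 here';

starts = strfind(txt, 'http');        % start index of each candidate URL
stops  = strfind(txt, '.grb2') + 4;   % index of the last character of each '.grb2'

% Pair each '.grb2' with the closest 'http' that precedes it
urls = cell(1, numel(stops));
for k = 1:numel(stops)
    s = starts(find(starts < stops(k), 1, 'last'));
    urls{k} = txt(s:stops(k));
end

disp(urls(:));
```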
Answers (1)
Chetan
on 30 Apr 2024
Hi @shukui liu
It appears you want to detect and download all `.grb2` files listed at a given URL automatically, without typing each file name by hand. You've previously attempted this with an FTP server and now want to do the same against a web server.
To accomplish this, you can use MATLAB's web functions to read the webpage content, extract the `.grb2` links with a regular expression as suggested by Rik, and then download each file. Here is how you can proceed:
- Read Webpage Content: Utilize `webread` to fetch the HTML content of the page listing the `.grb2` files.
- Extract URLs with Regular Expressions: Employ MATLAB's `regexp` function to identify all occurrences of `.grb2` file URLs or paths within the webpage content.
- Download Files: Iterate through the extracted URLs or file paths and use `websave` to download each file.
Below is an example script illustrating this process:
% URL of the page listing the .grb2 files
pageUrl = 'https://polar.ncep.noaa.gov/waves/hindcasts/multi_1/200502/gribs/';
% Directory to save the downloaded files
dataFolder = 'testing';
if ~exist(dataFolder, 'dir')
    mkdir(dataFolder);
end
% Read the webpage content
pageContent = webread(pageUrl);
% Regular expression to match .grb2 file links
% (single-quoted char vectors do not process escapes, so \. is a literal dot)
% Adjust the regex pattern if the webpage structure changes
pattern = 'href="([^"]+\.grb2)"';
% Find all matches
fileLinks = regexp(pageContent, pattern, 'tokens');
% Flatten the cell array if necessary
fileLinks = [fileLinks{:}];
% Base URL for constructing the full file URL if needed
baseUrl = pageUrl;
% Download each file
for i = 1:length(fileLinks)
    fileUrl = [baseUrl, fileLinks{i}];
    [~, name, ext] = fileparts(fileUrl);
    fileName = [name, ext];
    filePath = fullfile(dataFolder, fileName);
    % Check if the file already exists to avoid re-downloading
    if ~exist(filePath, 'file')
        fprintf('Downloading %s\n', fileName);
        websave(filePath, fileUrl);
    else
        fprintf('File %s already exists. Skipping download.\n', fileName);
    end
end
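Before downloading anything, it may help to check the link pattern offline against a small hand-written HTML string. This is only a sketch: the sample markup and file names below are made up, and the real listing page may be structured differently, in which case the pattern needs adjusting.

```matlab
% Hypothetical fragment of a directory listing (file names are made up)
sampleHtml = ['<a href="example_a.grb2">a</a> ', ...
              '<a href="example_b.grb2">b</a> ', ...
              '<a href="readme.txt">readme</a>'];

% Capture the href value of every link that ends in .grb2
pattern = 'href="([^"]+\.grb2)"';
tokens = regexp(sampleHtml, pattern, 'tokens');

% Each match is returned as a 1x1 cell; flatten to a cell array of names
links = [tokens{:}];

disp(links(:));
```

If `links` comes back empty for the real page, inspecting the text returned by `webread` usually shows whether the links are quoted differently or are given as absolute URLs.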
Refer to the following MathWorks documentation for detailed usage of the functions:
- https://www.mathworks.com/help/matlab/ref/webread.html
- https://www.mathworks.com/help/matlab/ref/websave.html
- https://www.mathworks.com/help/matlab/ref/regexp.html
I hope this helps.