How to read multiple grb2 files on a webpage?

shukui liu on 14 Jun 2021
Answered: Chetan on 30 Apr 2024
Dear all,
I would like to download all the grb2 files from the following website:
The file link can be identified, for instance, the first one is:
Is there a way to automatically update the link to download these files?
Thanks for the help.
2 comments
Rik on 14 Jun 2021
What have you tried so far? You can probably either use a regex, or use strfind to find all pairs of http and .grb2.
shukui liu on 15 Jun 2021
I have done this before, when the data was on an FTP server. Now that it has moved to a web server, some of the FTP functions no longer apply:
% Nested function to download all relevant files from FTP
function files = downloadFiles()
    % Initialize some variables
    fieldsRegex = strjoin(fields, '|'); % {'hs','tp'} => 'hs|tp'
    ftpSite = 'polar.ncep.noaa.gov';
    ftpPath = ['/pub/history/waves/multi_1/' monthStr '/gribs'];
    % Connect to the FTP site
    f = ftp(ftpSite);
    f.cd(ftpPath);
    % Get the list of relevant *.grb2 files
    filesMask = ['multi_1.' type '.*.grb2'];
    files = f.dir(filesMask);
    % If no matching files found, issue an error
    if isempty(files)
        f.close(); % Close the FTP connection
        url = ['ftp://' ftpSite, ftpPath];
        error('No %s files were found in <a href="%s">%s/</a>', filesMask, url, url)
    end
    % If fields were specified, then exclude the irrelevant files
    if ~isempty(fields)
        regex = ['\.(' fieldsRegex ')\.'];
        invalidIdx = cellfun('isempty', regexp({files.name},regex));
        files(invalidIdx) = [];
    end
    % Download all files into the Data folder
    if ~exist(dataFolder,'dir')
        fprintf('Downloading %d files into %s ...\n', length(files), dataFolder);
    end
    for idx2 = 1 : length(files)
        fname = files(idx2).name;
        if ~exist(fullfile(dataFolder,fname),'file')
            fprintf(' Downloading %s (%.0f MB) ...\n', fname, files(idx2).bytes/2^20);
            f.mget(fname, dataFolder);
        end
    end
    %fprintf('Done - starting to process ...\n');
    % Close the FTP connection
    f.close();
end


Answers (1)

Chetan on 30 Apr 2024
It appears you want to detect and download `.grb2` files from a given URL automatically, without listing the file names by hand. You did this before against an FTP server and now need the equivalent for a web server.
MATLAB's web functions can read the page content, after which the `.grb2` URLs can be extracted with a regular expression, as Rik suggested, and the files downloaded. The steps are:
  1. Read Webpage Content: Utilize `webread` to fetch the HTML content of the page listing the `.grb2` files.
  2. Extract URLs with Regular Expressions: Employ MATLAB's `regexp` function to identify all occurrences of `.grb2` file URLs or paths within the webpage content.
  3. Download Files: Iterate through the extracted URLs or file paths and use `websave` to download each file.
Below is an example script illustrating this process:
% URL of the page listing the .grb2 files
pageUrl = 'https://polar.ncep.noaa.gov/waves/hindcasts/multi_1/200502/gribs/';
% Directory to save the downloaded files
dataFolder = 'testing';
if ~exist(dataFolder, 'dir')
    mkdir(dataFolder);
end
% Read the webpage content
pageContent = webread(pageUrl);
% Regular expression to match .grb2 file links
% Adjust the regex pattern if the webpage structure changes
% Note: in a single-quoted MATLAB string, '\.' already matches a literal dot
pattern = 'href="([^"]+\.grb2)"';
% Find all matches
fileLinks = regexp(pageContent, pattern, 'tokens');
% Flatten the nested cell array of tokens
fileLinks = [fileLinks{:}];
% Base URL for constructing the full file URL
% (assumes the links on the page are relative to pageUrl)
baseUrl = pageUrl;
% Download each file
for i = 1:length(fileLinks)
    fileUrl = [baseUrl, fileLinks{i}];
    [~, name, ext] = fileparts(fileUrl);
    fileName = [name, ext];
    filePath = fullfile(dataFolder, fileName);
    % Check if the file already exists to avoid re-downloading
    if ~exist(filePath, 'file')
        fprintf('Downloading %s\n', fileName);
        websave(filePath, fileUrl);
    else
        fprintf('File %s already exists. Skipping download.\n', fileName);
    end
end
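The link-extraction step can be checked offline before pointing it at the real server. The following is a minimal sketch, not a tested solution for the NOAA page: the HTML snippet, the `example.com` URLs, and the `fields` value are all made up for illustration. It runs the regular expression on a sample string, prefixes the base URL only for relative links, and reuses the field filter from the original FTP function.

```matlab
% Sample HTML standing in for the directory listing (hypothetical content)
html = ['<a href="multi_1.glo_30m.hs.200502.grb2">hs</a>' newline ...
        '<a href="https://example.com/data/multi_1.glo_30m.tp.200502.grb2">tp</a>' newline ...
        '<a href="readme.txt">readme</a>'];
baseUrl = 'https://example.com/listing/';   % hypothetical base URL

% Capture the href value of every link that ends in .grb2
tokens = regexp(html, 'href="([^"]+\.grb2)"', 'tokens');
links = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);

% Prefix the base URL only where the link is relative
isAbsolute = startsWith(links, {'http://', 'https://'});
fileUrls = links;
fileUrls(~isAbsolute) = cellfun(@(p) [baseUrl p], links(~isAbsolute), ...
    'UniformOutput', false);

% Optional: keep only selected fields, as in the FTP version
fields = {'hs'};                             % hypothetical selection
fieldRegex = ['\.(' strjoin(fields, '|') ')\.'];
keep = ~cellfun('isempty', regexp(links, fieldRegex, 'once'));
selectedUrls = fileUrls(keep);

disp(fileUrls(:));
disp(selectedUrls(:));
```

If the real page turns out to use absolute links, the `startsWith` check keeps the base URL from being prepended twice; each entry of `selectedUrls` can then be passed to `websave` as in the loop above.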
Refer to the following MathWorks documentation for detailed usage of the functions:
I hope this helps.
