How to read particular dates from an HTML script?

Abhinav
Abhinav on 28 Jan 2018
Edited: Abhinav on 29 Jan 2018
I have downloaded the HTML source of a few webpages; one is uploaded for reference. My task is to extract some information from this source. Line 851 gives the latitude and longitude, which I extracted using the following code:
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
fileID=fopen(filename); % fileID
open_file=textscan(fileID,'%s','%f'); % parsing the file
open_file=open_file{1,1};
lat_id=find(ismember(open_file,... % Finding the position of Latitude in text-file
'<dd>Latitude'));
long_id=find(ismember(open_file,... % Finding the position of Longitude in text-file
'Longitude'));
lat(i)=open_file(lat_id+1); % latitude
long(i)=open_file(long_id+1); % longitude
proj=open_file(long_id+3); % projection type, e.g., NAD27, NAD83
However, I am not able to use similar code to read the data on line 865, which contains the time range of some of the data. The problem is that the variable open_file does not seem to contain these values. Any suggestions would be helpful.

Accepted Answer

Walter Roberson
Walter Roberson on 29 Jan 2018
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
S = fileread(filename);
place_info = regexp(S, 'Latitude\s+(?<lat>[^ ,]+),\s*\S+\s*Longitude\s+(?<long>\S+)\s*\S+\s*(?<proj>\w+)', 'names', 'once');
periods_info = regexp(S, '''begin_date''[^\d]*(?<begin_date>\d+-\d+(-\d+)?).*?end_date[^\d]*(?<end_date>\d+-\d+(-\d+)?).*?sites_selection_links\W*(?<stats_type>[^<]+)', 'names');
other_info = regexp(S, 'site_no=\d+">(?<stats_type>.*?)</a>.*?''begin_date''[^\d]*?(?<begin_date>\d+(-\d+(-\d+)?)?).*?end_date[^\d]*?(?<end_date>\d+(-\d+(-\d+)?)?)', 'names');
combined_info = [periods_info, other_info];
Now:
place_info is a struct with fields 'lat', 'long', and 'proj' holding the latitude, longitude, and projection. The lat and long values are kept in the form in which they were stored in the file, so they may contain an HTML character entity in place of the ° symbol.
combined_info is a struct array with fields begin_date, end_date, and stats_type. stats_type describes what the period refers to. In the sample data file the values are
'Daily Statistics', 'Monthly Statistics', 'Annual Statistics', 'Current / Historical Observations', 'Peak streamflow', 'Field measurements', 'Field/Lab water-quality samples', and 'Water-Year Summary'.
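The begin_date and end_date fields come back as text, and the patterns above allow a bare year, a year-month, or a full year-month-day. If actual dates are needed, one possible way to convert them is sketched below. The combined_info values here are made-up placeholders rather than values from the sample file, and filling a missing month or day with 1 is an assumption that may not suit every use.

```matlab
% Placeholder data standing in for the regexp output (values are made up).
combined_info = struct( ...
    'begin_date', {'1975', '1975-10', '1975-10-01'}, ...
    'end_date',   {'2017', '2017-12', '2018-01-28'}, ...
    'stats_type', {'Annual Statistics', 'Monthly Statistics', 'Daily Statistics'});

n = numel(combined_info);
begin_dt = NaT(1, n);   % preallocate datetime arrays
end_dt   = NaT(1, n);

for k = 1:n
    % sscanf stops at the first missing piece, so it returns 1 to 3 numbers.
    b = sscanf(combined_info(k).begin_date, '%d-%d-%d')';
    e = sscanf(combined_info(k).end_date,   '%d-%d-%d')';

    % Assumption: pad a missing month and/or day with 1.
    b(end+1:3) = 1;
    e(end+1:3) = 1;

    begin_dt(k) = datetime(b);   % [year month day] date vector
    end_dt(k)   = datetime(e);
end
```

With this padding, a year-only end_date such as '2017' maps to the first day of that year, so a different fill rule may be preferable when the end of the period is what matters.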
1 Comment
Abhinav
Abhinav on 29 Jan 2018
Edited: Abhinav on 29 Jan 2018
Thanks a lot! I did not know that these routines existed in MATLAB.
