How to output html text using fscanf or textscan

Greetings,
I'm trying to extract specific data from an HTML text file using MATLAB.
Here is a sample of the HTML text (lines 12-18):
<FONT SIZE=+1 COLOR="#800000">TONIGHT</FONT>
EAST WINDS 10 TO 15 KNOTS. SEAS 3 TO 4 FEET. ISOLATED
SHOWERS.
<FONT SIZE=+1 COLOR="#800000">SATURDAY</FONT>
EAST WINDS 12 TO 16 KNOTS. SEAS 3 TO 4 FEET. ISOLATED
SHOWERS.
I want to extract the headings 'Tonight' and 'Saturday' (lines 12 and 18) along with the data 'EAST WINDS 10 TO 15 KNOTS. SEAS 3 TO 4 FEET. ISOLATED SHOWERS.' and 'EAST WINDS 12 TO 16 KNOTS. SEAS 3 TO 4 FEET. ISOLATED SHOWERS.', leaving me with this output:
TONIGHT
EAST WINDS 10 TO 15 KNOTS. SEAS 3 TO 4 FEET. ISOLATED
SHOWERS.
SATURDAY
EAST WINDS 12 TO 16 KNOTS. SEAS 3 TO 4 FEET. ISOLATED
SHOWERS.
I want to use textscan or fscanf to make MATLAB scan the text file and give me just the plain text, without the HTML tags.
Thank you for your time.

Accepted Answer

Walter Roberson on 30 Jun 2012
Well, if it is important to use textscan() or fscanf(), then:
DataCell = textscan( fid, '%s', 'Delimiter', ''); %read the entire file as strings, one per line.
Output = regexprep( DataCell, '<[^>]+>', '' ); %remove the HTML
This will be a cell array of strings.
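To see what the pattern in the regexprep() call does, here is a minimal sketch that applies it to one sample line copied from the question. The variable names htmlLine and stripped are only illustrative.

```matlab
% Sample line copied from the question (variable names are illustrative).
htmlLine = '<FONT SIZE=+1 COLOR="#800000">TONIGHT</FONT>';

% '<[^>]+>' matches a '<', then one or more characters that are not '>',
% then the closing '>'. Replacing each match with '' deletes the tags.
stripped = regexprep(htmlLine, '<[^>]+>', '');

disp(stripped)   % displays TONIGHT
```

Lines that contain no tags, such as 'SHOWERS.', pass through unchanged because the pattern finds nothing to replace.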

3 Comments

Juan Rosado on 30 Jun 2012
Edited: Walter Roberson on 1 Jul 2012
Thanks for the idea, but I keep getting this error from regexprep: 'All cells for regexprep must be strings.'
The program I am developing goes like this:
clc, clear;
% Download the forecast page and save it to a local text file
urlwrite('http://weather.noaa.gov/cgi-bin/fmtbltn.pl?file=forecasts/marine/coastal/am/amz722.txt','Anegada_Passage_Southward.txt');
fid = fopen('Anegada_Passage_Southward.txt');
for x = 1:47
    y = fgetl(fid);
    s = ['x' num2str(x) '=' 'y;'];
    eval(s); % Saves each HTML line in its own variable (x1, x2, ...)
    disp(y); % Displays the line in the Command Window
end
fclose(fid);
If I type x3 in the Command Window, for example, MATLAB displays that line:
x3 =
< B ><FONT SIZE=+1 COLOR="#483D8B">COASTAL WATERS FORECAST
I want to create a loop that saves the text from the site but ignores the HTML tags, using textscan or fscanf.
Once again, thank you for your time.
JJR
Change
regexprep( DataCell ...)
to
regexprep( DataCell{1} ... )
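The reason for the change is that textscan() returns a cell array with one cell per format specifier, so the lines themselves are in DataCell{1}. Below is a minimal end-to-end sketch with that correction applied. It writes three sample lines from the question to a temporary file so that it runs on its own; the temporary file and the name sampleFile are assumptions for illustration. It also uses '\n' as the delimiter, a common idiom for reading one string per line, rather than the '' shown in the answer.

```matlab
% Write sample lines from the question to a temporary file
% (sampleFile is a hypothetical stand-in for the downloaded page).
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', ...
    '<FONT SIZE=+1 COLOR="#800000">TONIGHT</FONT>', ...
    'EAST WINDS 10 TO 15 KNOTS. SEAS 3 TO 4 FEET. ISOLATED', ...
    'SHOWERS.');
fclose(fid);

% Read the file back as one string per line.
fid = fopen(sampleFile, 'r');
DataCell = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);

% DataCell is a 1x1 cell holding a cell array of lines, so index into it
% before calling regexprep; passing DataCell itself triggers the error.
Output = regexprep(DataCell{1}, '<[^>]+>', '');

disp(Output);          % the lines with the HTML tags removed
delete(sampleFile);    % clean up the temporary file
```

With the real data, the fopen() call would point at 'Anegada_Passage_Southward.txt' instead, and no fixed line count such as 1:47 is needed because textscan() reads to the end of the file.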
