Reading a string to get required data

Tom Pearce on 27 Mar 2011
I'm trying to write a program that reads HTML code in order to extract data for some analysis I'm conducting. I've managed to remove the HTML jargon and am now left with a string which contains the data I require. However, I'm trying to convert the data into a readable cell array:
Mar25,2011>4.88>4.88>4.83>4.88>51,000>Mar24,2011>4.72>4.72>4.72>4.72>13,300>Mar22,2011>4.88>4.88>4.88>4.88>0>Mar18,2011>5.00>5.00>5.00>5.00>0>Mar17,2011>4.81>4.89>4.81>4.89>1,001>
I know this may seem rather simple to most of you, but I'm new to MATLAB. Basically, I'm trying to convert this string into a column array, with the date first, followed by the five successive numbers, for the whole data set. Any help on this would be greatly appreciated.

Answers (2)

Walter Roberson on 27 Mar 2011
textscan('%s%f%f%f%f%f', 'Delimiter', '>', 'CollectOutput', 1)
You might need to change the shapes around afterwards. I am not clear on what you are envisioning for a "column array".
2 Comments
Tom Pearce on 29 Mar 2011
Basically I just want the data in a list (6 columns wide) which I can write to a file and use to produce a graph. I've tried textscan, but it keeps returning {0x1} [0x1] [0x1] [0x1] [0x1] [0x1]. Now that I realise I'm along the right lines, I will persevere. Thanks very much for your help.
Walter Roberson on 29 Mar 2011
Ah, you have commas in your fifth numeric field; that prevents it from being parsed as a number. Also, I forgot to show the string argument in the textscan call.
Let T be the string you have the line stored in. Then,
Q = textscan(T,'%s%f%f%f%f%s', 'Delimiter', '>');
Q{6} = str2double(regexprep(Q{6},',',''));
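Putting the two lines above together with the sample string from the question, a minimal end-to-end sketch might look like the following. The variable names `prices`, `volume`, and `data` are introduced here for illustration only, and the 6-column cell array is just one guess at the layout being asked for.

```matlab
% Sample string from the question, split across lines for readability
T = ['Mar25,2011>4.88>4.88>4.83>4.88>51,000>' ...
     'Mar24,2011>4.72>4.72>4.72>4.72>13,300>' ...
     'Mar22,2011>4.88>4.88>4.88>4.88>0>' ...
     'Mar18,2011>5.00>5.00>5.00>5.00>0>' ...
     'Mar17,2011>4.81>4.89>4.81>4.89>1,001>'];

% Read the date and the last field as strings, the four prices as numbers
Q = textscan(T, '%s%f%f%f%f%s', 'Delimiter', '>');

% Strip the thousands separators, then convert the last field to numbers
Q{6} = str2double(regexprep(Q{6}, ',', ''));

prices = [Q{2:5}];   % one row per date, four numeric columns
volume = Q{6};       % column vector with the fifth number

% One row per date, 6 columns wide: date string followed by five numbers
data = [Q{1}, num2cell([prices, volume])];
```

With the numbers held separately in `prices` and `volume`, they can be passed straight to `plot` or written out with `fprintf`, while `data` keeps each date alongside its row.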



Clemens on 29 Mar 2011
Personally I don't remove the "html jargon" in such cases. I use regexps like:
table_lines = regexp(table,'<tr [^>]*>(.*?)</tr>','tokens');
table_line_entry = regexp(table_line,'<td [^>]*>(.*?)</td>','tokens');
This has the advantage that it keeps the structure information of the original table.
Also, you might run into problems if a table cell contains HTML code, or just a ">" sign.
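As a rough sketch of how the two regexp calls above could be chained, the loop below extracts the cells of each row in turn. The HTML fragment and the names `html`, `rows`, `cells`, and `data` are hypothetical and only for illustration; the patterns are the ones from this answer, so they assume every `<tr` and `<td` tag carries at least one attribute (the pattern requires a space after the tag name).

```matlab
% Hypothetical HTML fragment, made up for illustration
html = ['<tr class="a"><td class="d">Mar25,2011</td>' ...
        '<td class="n">4.88</td><td class="n">51,000</td></tr>' ...
        '<tr class="b"><td class="d">Mar24,2011</td>' ...
        '<td class="n">4.72</td><td class="n">13,300</td></tr>'];

% Each element of rows is a 1x1 cell holding the inner text of one <tr>
rows = regexp(html, '<tr [^>]*>(.*?)</tr>', 'tokens');

% Assumes every row has the same number of cells (3 in this fragment)
data = cell(numel(rows), 3);
for r = 1:numel(rows)
    % Each element of cells is a 1x1 cell holding the text of one <td>
    cells = regexp(rows{r}{1}, '<td [^>]*>(.*?)</td>', 'tokens');
    data(r, :) = [cells{:}];   % flatten the nested cells into one row
end
```

The entries of `data` are still strings at this point; numeric columns would need the same comma stripping and `str2double` conversion shown in the other answer.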
