Help with regular expressions for HTML source code
I'm looking to parse some HTML source code to pull information from the Wall Street Journal. I need the prices of the following commodities: the four domestic crude oil spot prices, copper, aluminum, cotton, and cocoa.
I'm having trouble getting regexp to work the way I want it to.
What pattern would you use to pull out the middle (bold) price listed? If the value is n.a., it's fine if it just returns 'n.a.' or its equivalent.
I tried a variety of approaches and couldn't get any of them to work.
Could someone show an example of the pattern they would use to extract the price?
Thanks!
Accepted Answer
Cedric
on 12 Mar 2013
Edited: Cedric on 12 Mar 2013
>> buffer = urlread('http://online.wsj.com/mdc/public/page/2_3023-cashprices.html');
>> item = 'West Texas Intermediate, Cushing' ;
>> pattern = [item, '.*?<b>(?<prefix>.*?)(?<price>[\d\.]*)</b>'] ;
>> tokens = regexp(buffer, pattern, 'names')
tokens =
prefix: ''
price: '92.06'
>> item = 'London fixing, spot price' ;
>> pattern = [item, '.*?<b>(?<prefix>.*?)(?<price>[\d\.]*)</b>'] ;
>> tokens = regexp(buffer, pattern, 'names')
tokens =
prefix: '£'    % actually the HTML entity code for the pound sign; the forum renders it as £
price: '19.4273'
Cheers,
Cedric
Note that for n.a. entries the price token is just '.' (and the prefix token picks up 'n.a').
EDIT 1: corrected the pattern thanks to Walter's comment about pound signs.
EDIT 2: updated with named tokens so that we also get the prefix (e.g., the pound sign).
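For several commodities at once, the same pattern can be run in a loop over item names. The sketch below works offline on a hypothetical HTML fragment: the row layout, the placeholder 1.00 cells, the '&pound;' entity and the 'Cocoa' row are assumptions made up for illustration, not the real WSJ markup. The regexptranslate('escape', ...) call is an added precaution in case an item name contains regex metacharacters, and n.a. entries are left as NaN by only converting tokens that contain a digit.

```matlab
% Hypothetical HTML fragment (NOT the real WSJ markup): each row lists the
% item name, a placeholder cell, and the price wrapped in <b>...</b>.
html = ['<tr><td>West Texas Intermediate, Cushing</td><td>1.00</td>' ...
        '<td><b>92.06</b></td></tr>' ...
        '<tr><td>London fixing, spot price</td><td>1.00</td>' ...
        '<td><b>&pound;19.4273</b></td></tr>' ...
        '<tr><td>Cocoa</td><td>1.00</td><td><b>n.a.</b></td></tr>'];

items  = {'West Texas Intermediate, Cushing', 'London fixing, spot price', 'Cocoa'};
prices = nan(size(items));   % NaN marks n.a. or missing items

for k = 1:numel(items)
    % Escape the item name in case it contains regex metacharacters,
    % then lazily skip ahead to the first <b>...</b> after it.
    pattern = [regexptranslate('escape', items{k}), ...
               '.*?<b>(?<prefix>.*?)(?<price>[\d\.]*)</b>'];
    tok = regexp(html, pattern, 'names', 'once');
    % Only convert tokens that contain a digit; for 'n.a.' the price
    % token is just '.', so that entry stays NaN.
    if ~isempty(tok) && any(isstrprop(tok.price, 'digit'))
        prices(k) = str2double(tok.price);
    end
end

disp(prices)
```

With the live page, html would be replaced by the buffer returned by urlread as above.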
Cedric
on 12 Mar 2013
Ah, thank you Walter, I had not realized that there could be these signs!
More Answers (1)
Walter Roberson
on 11 Mar 2013
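'^<b>.*?\d+(\.\d+)?</b>$'
This should allow for a currency symbol before the number, and for the possibility that the decimal point and following digits are absent. The only real "trick" here is the use of .*? to request the minimal expansion of repeated . (which matches any one character), whereas .* by itself is "greedy" and matches as many characters as possible. Note that by default ^ and $ anchor to the start and end of the whole input string, so as written this pattern only matches when applied to the text of a single table cell, not to the whole page.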
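A small illustration of the greedy/lazy difference and of the ^/$ anchoring, on a made-up string (the two bold cells and their values are hypothetical, not taken from the WSJ page). The anchored pattern here uses </b> as the closing tag.

```matlab
% Hypothetical string with two bold cells on the same line
s = '<td><b>1.25</b></td><td><b>3.50</b></td>';

greedy = regexp(s, '<b>.*</b>',  'match', 'once');  % .*  runs to the LAST </b>
lazy   = regexp(s, '<b>.*?</b>', 'match', 'once');  % .*? stops at the FIRST </b>
disp(greedy)   % <b>1.25</b></td><td><b>3.50</b>
disp(lazy)     % <b>1.25</b>

% ^ and $ refer to the whole string by default, so the anchored pattern
% matches an isolated cell but not a longer piece of the page.
anchored    = '^<b>.*?\d+(\.\d+)?</b>$';
cellText    = '<b>&pound;19.4273</b>';
matchesCell = ~isempty(regexp(cellText, anchored, 'once'));   % true
matchesRow  = ~isempty(regexp(s, anchored, 'once'));          % false
```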