Strfind doesn't find string

Hi everyone,
I'm web scraping using strfind, but I can't find one string that contains spaces. Assume that part of my text is the following:
tempHTML2=' Área <strongclass="search-results-property-list__feature-value"> 65.0'
And I want this:
str14='Área <strongclass="search-results-property-list__feature-value">';
However, strfind(tempHTML2,str14) returns empty. If I remove 'Área ' from the pattern, it returns the correct location of the string. If I search for just ' Área', it also finds it correctly.
The blank spaces could be the issue. However, tempHTML2 is constructed as follows:
tempHTML2=tempHTML;
tempHTML2(tempHTML2==' ')=[];
The odd part is that tempHTML2 still appears to contain blank spaces after deleting them, even though sum(ismember(tempHTML2,' ')) returns zero.
Thanks in advance,

6 comments

Steven Lord on 1 Jun 2016
Can you show the full output of these two commands?
D1 = double(tempHTML2)
D2 = double(str14)
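For a long page, printing every code is impractical, so a smaller version of the same check is to look only at the characters around the word. A minimal sketch, assuming a short stand-in string in place of the real page (the newline and tab are assumptions about what the scraped text might contain, and the class name "x" is hypothetical):

```matlab
% Stand-in for the scraped text; char(10) (newline) and char(9) (tab) are
% assumed examples of whitespace that is not an ordinary blank.
A = char(193);  % 'Á', written by code point to avoid file-encoding issues
tempHTML2 = [A 'rea' char(10) char(9) '<strongclass="x">65.0'];

% Locate a fragment that is known to match, then inspect what follows it.
k = strfind(tempHTML2, [A 'rea']);
window = tempHTML2(k(1) : min(k(1) + 8, numel(tempHTML2)));
codes = double(window);
disp(codes)   % an ordinary blank would show up as 32
```

Any code in the window other than 32 where a "space" appears on screen points to a different whitespace character.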
the cyclist on 1 Jun 2016
For what it's worth, the code
tempHTML2=' Área <strongclass="search-results-property-list__feature-value"> 65.0';
str14='Área <strongclass="search-results-property-list__feature-value">';
strfind(tempHTML2,str14)
returns "2" for me.
Walter Roberson on 1 Jun 2016
You remove blanks from your template but not from str14
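A minimal sketch of that mismatch, assuming shortened, hypothetical markup (the class name "v" and the variables idxRaw, idxClean, and str14NoBlanks are illustrative only):

```matlab
% Text with blanks removed, built the same way as in the question.
tempHTML = 'Area <strong class="v"> 65.0';   % hypothetical, shortened markup
tempHTML2 = tempHTML;
tempHTML2(tempHTML2 == ' ') = [];

str14 = 'Area <strong class="v">';           % pattern still contains blanks
idxRaw = strfind(tempHTML2, str14);          % empty: tempHTML2 has no blanks left

% Apply the same blank removal to the pattern before searching.
str14NoBlanks = str14(str14 ~= ' ');
idxClean = strfind(tempHTML2, str14NoBlanks);
```

Here idxRaw is empty and idxClean is 1, because both strings went through the same cleanup before the second search.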
hpramos4@gmail.com on 1 Jun 2016
Edited: hpramos4@gmail.com on 1 Jun 2016
Actually, tempHTML2 has more content than the string I wrote above. It's a 23000-character string (the HTML code of a webpage). The double function returns a lot of numbers in both cases (I won't paste them here because the output is too big).
If I just copy and paste as the cyclist did, it returns the correct location. However, if I search within the whole HTML code, I get nothing. The strange thing is that I copied this part of the string from the HTML page scraped into Matlab with the webread function. So strfind works on the copied and pasted text, but it doesn't work at all on the original HTML code.
I have already tried removing all the blank spaces, without success. Even though the blanks have been removed by tempHTML2(tempHTML2==' ')=[], there are still blank spaces in tempHTML2. Any thoughts about this?
Thanks again,
Walter Roberson on 1 Jun 2016
Please attach a copy of the tempHTML2 (before blank removal), or post the URL.
hpramos4@gmail.com on 1 Jun 2016

This is the webpage: wp

These are the HTML codes (the part I need) with and without spaces. Both were scraped with urlread.

There's something strange here: when I open the tempHTML2 string and look for the string manually, there are spaces around "Área" (see the attached screenshot). When I write it to a txt file, the spaces are gone.

Still, neither strfind nor regexp works.
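One thing that may be worth trying on the unmodified text is a pattern that tolerates any run of whitespace between the pieces, rather than literal blanks. A minimal sketch, assuming hypothetical, shortened markup (the class name "v", the newline and tab between the word and the tag, and the variable names are all assumptions, not taken from the real page):

```matlab
% Stand-in for the raw page text, with a newline and a tab where a blank
% appears on screen.
html = ['Area' char(10) char(9) '<strong class="v"> 65.0 </strong>'];

% \s matches blanks, tabs and line breaks alike; the parentheses capture
% the number that follows the opening tag.
pattern = 'Area\s*<strong\s+class="v">\s*([\d.]+)';
tok = regexp(html, pattern, 'tokens', 'once');
value = str2double(tok{1});
```

With this stand-in, value is 65, and no whitespace has to be deleted from the text first.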


Answers (1)

hpramos4@gmail.com on 6 Jun 2016

I've solved the problem by using isspace() on tempHTML. Thank you all.
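The exact code was not posted. A minimal sketch of what an isspace-based cleanup could look like, assuming a short stand-in for the page text (the newline, the tab, the class name "v", and the variables onlyBlanks and leftover are illustrative assumptions):

```matlab
% Stand-in for the page text; the newline and tab are assumed examples of
% whitespace that the comparison with ' ' leaves behind.
tempHTML = ['Area' char(10) char(9) '<strong class="v"> 65.0'];

onlyBlanks = tempHTML;
onlyBlanks(onlyBlanks == ' ') = [];      % removes char(32) only
leftover = sum(isspace(onlyBlanks));     % newline and tab are still there

tempHTML2 = tempHTML(~isspace(tempHTML));   % removes every whitespace character
str14 = 'Area<strongclass="v">';            % pattern written without whitespace
idx = strfind(tempHTML2, str14);
```

In this stand-in, leftover is 2 and idx is 1. That would be consistent with sum(ismember(tempHTML2,' ')) returning zero while whitespace is still visible in the string.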


Asked: 1 Jun 2016
Answered: 6 Jun 2016
