Weird spaces undetected by strfind

Question

Ryan Egan on 1 Nov 2012

https://es.mathworks.com/matlabcentral/answers/52508-weird-spaces-undetected-by-strfind

I am trying to locate a phrase called "Faceresponse.acc" in a large array. I used this code to get all of the text out of my .txt file:

fid = fopen('HWFC_Car_P New-1-1.txt', 'r');
mystuff = cell(1, 6600);      % preallocate one cell per line
for i = 1:6600
    mystuff{i} = fgetl(fid);  % read one line, dropping the newline
end
fclose(fid);

The text in the text file looks like "Faceresponse.acc", but for some reason some of the strings in 'mystuff' appear spaced out: 'F a c e r e s p o n s e'. I found this phrase in mystuff{977}, made sure it was of class "char", and did this:

b = mystuff{977};
strfind(b, 'F a c e')

It returned nothing. I tried just typing out 'F a c e r e s p o n s e' and assigning it to 'b', instead of using mystuff{977}, and strfind had no trouble locating the spaced out 'F a c e'. I also tried strfind for other things in the string, and it was fine. But it would NOT index the spaces between the letters.

So my question: what's going on here? What are the spaces between the letters if strfind is not identifying them as spaces?

3 comments (1 older comment not shown)

Ryan Egan on 1 Nov 2012

double(mystuff{977})

Columns 1 through 15
     0     9     0     9     0    70     0    97     0    99     0   101     0    82     0
Columns 16 through 30
   101     0   115     0   112     0   111     0   110     0   115     0   101     0    46
Columns 31 through 43
     0    65     0    67     0    67     0    58     0    32     0    49     0

%It seems that the spaces are zeros.
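
A minimal sketch of why strfind comes up empty here, using the first few values from the dump above. A zero is the NUL character, char(0), which can display like a blank but is not the space character, char(32). The variable names are illustrative only.

```matlab
% First 12 values from the dump: two tabs and 'Face', each preceded by a zero.
raw = char([0 9 0 9 0 70 0 97 0 99 0 101]);

% The typed pattern contains real spaces, char(32), so nothing matches.
idxSpaces = strfind(raw, 'F a c e');

% A pattern built with char(0) between the letters does match.
nulPattern = char([70 0 97 0 99 0 101]);
idxNul = strfind(raw, nulPattern);

disp(isempty(idxSpaces));
disp(idxNul);
```
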

Chris on 1 Nov 2012

Could it be a font issue? I had a similar thing when using fgetl to read a .doc file. At a certain point there was a 'delta' symbol in Wingdings; after that line, each character had a space after it. I got rid of the odd font and that fixed it.

Answer 1

Image Analyst on 1 Nov 2012

https://es.mathworks.com/matlabcentral/answers/52508-weird-spaces-undetected-by-strfind#answer_64010

Edited: Image Analyst on 1 Nov 2012

It's probably written in some double-byte Unicode encoding, which is designed to handle every language in the world, so it needs two bytes per character instead of just one as in Western ASCII-style text. I'm not really sure how to deal with Unicode since I don't encounter it. Maybe you can just do something like this:

for i = 1:6600
    thisLine = fgetl(fid);
    mystuff{i} = thisLine(2:2:end); % Take just every other byte.
end
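
A hedged variant of the same idea: instead of relying on the zeros sitting at odd positions, drop them by value. This is only a sketch, and it assumes the text is plain ASCII, so that every zero is padding and never part of a real character. The sample values are the first 12 from the dump in the question's comments.

```matlab
% Sample line: two tabs and 'Face', each preceded by a zero byte.
thisLine = char([0 9 0 9 0 70 0 97 0 99 0 101]);

byPosition = thisLine(2:2:end);        % every other element, as in the loop above
byValue = thisLine(thisLine ~= 0);     % drop zeros wherever they occur

disp(isequal(byPosition, byValue));
```

Filtering by value keeps working if a line happens to start on the other byte of a pair, but it would corrupt any character whose two-byte code has a nonzero high byte.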

1 comment

Ryan Egan on 1 Nov 2012

This works nicely and avoids the whole problem. Thanks!

Answer 2

José-Luis on 1 Nov 2012

https://es.mathworks.com/matlabcentral/answers/52508-weird-spaces-undetected-by-strfind#answer_64015

Edited: José-Luis on 1 Nov 2012

This sounds like an encoding problem. To find out what encoding your installation of Matlab uses:

feature('DefaultCharacterSet')

To see the encoding of your file, you could do something like this, but the best bet would probably be to ask the author of the program that generated the text file.
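
One rough way to guess a file's encoding is to look for a byte order mark (BOM) in its first bytes. The sketch below is an illustration under stated assumptions, not a reliable detector: many files carry no BOM at all, and the temporary file and its contents are made up for the demonstration.

```matlab
% Write a tiny demo file that starts with the UTF-16LE BOM (FF FE).
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fwrite(fid, [255 254 70 0 97 0], 'uint8');
fclose(fid);

% Read up to the first three bytes as a row vector.
fid = fopen(tmpFile, 'r');
head = fread(fid, 3, 'uint8')';
fclose(fid);
delete(tmpFile);

% Compare against the well-known BOM byte sequences.
if numel(head) >= 3 && isequal(head(1:3), [239 187 191])
    guess = 'UTF-8';
elseif numel(head) >= 2 && isequal(head(1:2), [255 254])
    guess = 'UTF-16LE';
elseif numel(head) >= 2 && isequal(head(1:2), [254 255])
    guess = 'UTF-16BE';
else
    guess = 'unknown (no BOM)';
end
disp(guess);
```
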

If the encodings are different, then you could try changing the encoding MATLAB uses to that of the file, e.g.:

feature('DefaultCharacterSet', 'UTF8')

But as Matt says, it is probably easier to pass additional arguments to fopen().

Finally, here's an interesting read regarding MATLAB and encoding.

2 comments

Walter Roberson on 1 Nov 2012

The article is a fairly reasonable summary. It makes a small mistake near the end, where it says that char() takes Unicode values: char() only takes Unicode values up to 65535 and does not provide any mechanism for code points above that. There are two Unicode-related routines that can be used to deal with code points above that, or to deal with "code pages" and the like.

José-Luis on 1 Nov 2012

Good to know. Thank you.

Answer 3

Matt J on 1 Nov 2012

https://es.mathworks.com/matlabcentral/answers/52508-weird-spaces-undetected-by-strfind#answer_63998

FOPEN lets you specify different encoding schemes for reading from the file. I'm guessing that you might need a different one from what you're using. Was the file created on the same platform as you're now using to read it?
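
A minimal sketch of acting on the encoding idea. It assumes the file is UTF-16LE, which the thread never confirms, and it builds its own small sample file with a made-up first line so that it runs on its own. Rather than depending on which encoding names fopen accepts in a given release, it reads the raw bytes and decodes them with native2unicode. The fopen form appears only as a comment.

```matlab
% Build a small sample file encoded as UTF-16LE (assumed encoding).
sampleText = sprintf('placeholder line\n\t\tFaceResponse.ACC: 1\n');
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fwrite(fid, unicode2native(sampleText, 'UTF-16LE'), 'uint8');
fclose(fid);

% Alternative (assumption: your release supports this encoding name):
%   fid = fopen(tmpFile, 'r', 'n', 'UTF-16LE');

% Read the raw bytes and decode them in one step.
fid = fopen(tmpFile, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(tmpFile);
decoded = native2unicode(bytes, 'UTF-16LE');

% Split into lines; strfind now matches ordinary text.
textLines = regexp(decoded, '\r?\n', 'split');
hits = find(~cellfun(@isempty, strfind(textLines, 'FaceResponse.ACC')));
disp(hits);
```

Note that strfind is case-sensitive, and the dump in the question's comments spells the field "FaceResponse.ACC", so the search pattern has to match that capitalization.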

4 comments (2 older comments not shown)

Ryan Egan on 1 Nov 2012

Well, I could potentially need to run data analyses on hundreds of files like this over the next little while, and my lab members may need it for years to come, so I'm trying to write a script that will last the longest and doesn't require any reformatting.

I already have a script that can run the analyses when I export the file into a different format, but that can take a while depending on the sheer number of files I have to analyze.

Matt J on 1 Nov 2012

Edited: Matt J on 1 Nov 2012

Yes, but does resaving the file as I described work? Let's first identify at least one solution to the problem, and then worry about optimizing it later. Also, did you follow Jose-Luis' suggestion about finding the encoding that your file uses?
