
textscan or import of Unicode-encoded text file

Hyung-Sik Kim on 22 Sep 2011
Question 1: Are textscan and importdata supposed to work with Unicode-encoded text files?
Question 2: After a UTF-8 encoded file is opened with the correct encoding specified in the fopen arguments, the textscan output contains three extra characters (ï»¿) before the very first valid data in the file. Is this expected but undocumented behavior?

Answers (2)

Anne on 5 Dec 2011
I have the same problem with my old MATLAB 7.3.0. textscan does not read Unicode files correctly when given a file identifier, but it can handle Unicode strings.
Thus a simple (but slow) workaround is to read the whole text first with fscanf and then run textscan on that text.
[f,msg]=fopen(nomfic,'r','n','UTF-8');
LIGNES=textscan(f,'%[^\n]','delimiter','\n');
won't work with Unicode-encoded characters, but
[f,msg]=fopen(nomfic,'r','n','UTF-8');
txt=fscanf(f,'%c');
LIGNES=textscan(txt,'%[^\n]','delimiter','\n');
will.

Walter Roberson on 22 Sep 2011
Answer 1: textscan() is supposed to; I do not know about importdata.
Answer 2: When you explicitly specify one of the UTF-* encodings, the MATLAB code will not look for a Byte Order Mark, and will leave any Byte Order Mark in the file stream. If you do not explicitly specify the encoding, then the byte stream will be examined for a Byte Order Mark, and if one is found the encoding will be determined by it.
Using a Byte Order Mark with UTF-8 is not recommended, but some Windows editors insert one anyway. The Byte Order Mark encoded in UTF-8 is the byte sequence 0xEF, 0xBB, 0xBF, which shows up as exactly the characters you noticed.
I have not checked whether it makes a difference if you open the file with 'r' or 'rt'. I use 'rt' for text files, as it can make a difference in some instances.
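One way to sidestep the Byte Order Mark described above is to read the file as raw bytes, drop the three BOM bytes if they are present, decode the rest as UTF-8, and only then hand the text to textscan. The following is a minimal sketch of that idea, not a tested recipe for every MATLAB release; the temporary file name, its contents, and the '%f %f' format are assumptions made up for the demonstration.

```matlab
% Hypothetical demo file: a UTF-8 BOM followed by two lines of numbers.
fname = [tempname '.txt'];
bom   = uint8([239 187 191]);                        % 0xEF 0xBB 0xBF
body  = unicode2native(sprintf('1,2\n3,4\n'), 'UTF-8');
fid   = fopen(fname, 'w');
fwrite(fid, [bom, body], 'uint8');
fclose(fid);

% Read the file as raw bytes, without any character decoding.
fid   = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8')';                  % row vector of uint8
fclose(fid);

% Drop the BOM only if the file actually starts with it.
if numel(bytes) >= 3 && isequal(bytes(1:3), bom)
    bytes = bytes(4:end);
end

% Decode the remaining bytes as UTF-8 and parse the resulting text.
txt = native2unicode(bytes, 'UTF-8');
C   = textscan(txt, '%f %f', 'Delimiter', ',');

delete(fname);                                       % remove the demo file
```

Because the check compares the first three bytes explicitly, files written without a Byte Order Mark pass through unchanged. Like the fscanf workaround, this reads the whole file into memory, so it may be slow for very large files.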
