Encoding problem reading data using fread

Michael Liedlgruber on 24 May 2023
Commented: Michael Liedlgruber on 6 Sep 2023
Hi,
I'm using the following code to read in data from a file which contains text as well as binary data (European Data Format, to be more specific):
fid = fopen('test.edf', 'r', 'l');
fileType = fread(fid, 1, 'uint8');
id = char(fread(fid, [1 7], 'char'));
fclose(fid);
On my machine (Windows 10, MATLAB R2020a Update 6) this code runs fine and the values returned (i.e. fileType and id) are correct.
However, when this code is run on a different machine (one of our customers'; also running Windows 10, but with MATLAB R2020a Update 1) using the same input file, the value of id is read incorrectly: the encoding used appears to be UTF-16BE. In fact, I get the same incorrect results on my machine if I specify UTF-16BE as the file encoding in the fopen call.
More interestingly, if I open the file on my machine without specifying an encoding and query the encoding in use with
[filename, permission, machineformat, encoding] = fopen(fid);
then UTF-16BE is returned.
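When comparing machines, it can help to log the MATLAB version together with the encoding that fopen associates with the file. The sketch below is a minimal, self-contained illustration: it writes a temporary file with hypothetical header bytes (a zero byte followed by 'ABCDEFG') instead of the real test.edf, opens it without an encoding argument, and prints what fopen(fid) reports. The reported encoding depends on the MATLAB release and system, so no particular output is assumed.

```matlab
% Hypothetical stand-in for test.edf: one byte (0) followed by 'ABCDEFG'
fname = [tempname '.edf'];
fid = fopen(fname, 'w', 'l');
fwrite(fid, [uint8(0), uint8('ABCDEFG')], 'uint8');
fclose(fid);

% Open without an encoding argument, as in the original code
fid = fopen(fname, 'r', 'l');
assert(fid >= 0, 'Could not open %s', fname);
% Query the machine format and encoding MATLAB associated with the file
[~, ~, machinefmt, encoding] = fopen(fid);
fclose(fid);
delete(fname);

% Log version information so results from different machines can be compared
fprintf('MATLAB %s\n', version);
fprintf('machine format: %s, encoding: %s\n', machinefmt, encoding);
```

Running this on both machines and comparing the printed lines gives a quick record of which update reports which encoding.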
The default Windows encoding is the same on both machines.
So it seems to me that MATLAB on my machine detects an incorrect encoding, because the file contains a BOM somewhere in the data, but nevertheless returns the correct values. On the customer's machine, however, the detected encoding seems to actually be used, yielding different results.
My question now is: how is it possible that MATLAB apparently detects the wrong encoding but still reads the data correctly on my machine? Why do I get incorrect data if I explicitly specify that (incorrect) encoding detected by MATLAB? And why does the customer get different results, although the same input file is used and MATLAB detects the same (incorrect) encoding?
Is it possible that something has changed between Update 1 and Update 6 of MATLAB R2020a which causes MATLAB to behave differently? Unfortunately, I did not find any hint in the release notes of the updates with respect to the behavior of fopen.
Best,
Michael
2 Comments
Mathieu NOE on 25 May 2023
You may want to contact TMW support about that.
Michael Liedlgruber on 25 May 2023
Thank you. Yes, if nobody in the community has an idea what may cause these inconsistencies, I will contact TMW support.
Fortunately, a fix is quite easy: by specifying UTF-8 encoding explicitly, everything works as expected on all machines.
But I'm still curious what's going on here.
Best,
Michael
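A minimal sketch of the workaround described in this comment: the encoding is passed explicitly as the fourth argument of fopen, so MATLAB does not use whatever encoding it would otherwise associate with the file. To keep the example self-contained, it first writes a temporary file with hypothetical header bytes (a zero byte followed by 'ABCDEFG'); in practice you would open the real test.edf instead.

```matlab
% Hypothetical stand-in for test.edf: one byte (0) followed by 'ABCDEFG'
fname = [tempname '.edf'];
fid = fopen(fname, 'w', 'l');
fwrite(fid, [uint8(0), uint8('ABCDEFG')], 'uint8');
fclose(fid);

% Workaround from the thread: specify UTF-8 explicitly as the 4th argument
fid = fopen(fname, 'r', 'l', 'UTF-8');
assert(fid >= 0, 'Could not open %s', fname);
fileType = fread(fid, 1, 'uint8');       % first byte as a number
id = char(fread(fid, [1 7], 'char'));    % next 7 characters, decoded as UTF-8
fclose(fid);
delete(fname);

fprintf('fileType = %d, id = ''%s''\n', fileType, id);
```

Because the header bytes here are plain ASCII, decoding them as UTF-8 yields one character per byte.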


Answers (1)

Ayush on 4 Sep 2023
1 Comment
Michael Liedlgruber on 6 Sep 2023
Thank you. But this does not really answer my question. And, funnily, the page you linked to says "For more information, see ."
So, I already know that MATLAB defaults to UTF-8. But as you can see in my original post, the behavior is inconsistent between Update 1 and Update 6. And I have no explanation for why, on my machine, fopen(fid) returns the incorrect encoding while the correct encoding is used when reading the data.
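An alternative not mentioned in the thread, sketched under the assumption that the header fields are plain single-byte ASCII (as EDF header fields are): read the bytes with a numeric precision and convert them to characters, so the result does not depend on the encoding fopen associates with the file. As the fread documentation describes it, the 'char' precision reads characters using the file's associated encoding, whereas 'uint8=>char' reads raw bytes and converts each byte value to a character. The example deliberately opens a temporary file (hypothetical header bytes: a zero byte followed by 'ABCDEFG') with UTF-16BE to show that this read is unaffected.

```matlab
% Hypothetical stand-in for test.edf: one byte (0) followed by 'ABCDEFG'
fname = [tempname '.edf'];
fid = fopen(fname, 'w', 'l');
fwrite(fid, [uint8(0), uint8('ABCDEFG')], 'uint8');
fclose(fid);

% Deliberately open with the encoding MATLAB reported (UTF-16BE)
fid = fopen(fname, 'r', 'l', 'UTF-16BE');
assert(fid >= 0, 'Could not open %s', fname);
fileType = fread(fid, 1, 'uint8');
% Numeric precision: each byte is read raw and then converted to char,
% so the encoding associated with the file plays no role here
id = fread(fid, [1 7], 'uint8=>char');
fclose(fid);
delete(fname);

fprintf('fileType = %d, id = ''%s''\n', fileType, id);
```

This sidesteps the question of which encoding a given MATLAB update detects, at the cost of only handling single-byte text.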


Category: Low-Level File I/O

Version: R2020a