Reading a sequence of binary packets consisting of multiple data types from a binary file

I need to access data stored in binary files. In this case, it is an event log file from an old SCADA system. Each file contains a sequence of packets, and each packet contains, for example, 28 bytes of data. At a low level, a packet is laid out like this:
  • 18 chars containing the signal name as an ISO 8859-1 string
  • A 16-bit signed integer (LE) for the signal value
  • A 32-bit signed integer (LE) holding the Unix datetime
  • A 16-bit signed integer holding the additional milliseconds for the Unix datetime
  • 8 so-called "status bits", where each bit is an attribute or "signal flag"
  • An 8-bit signed integer describing the signal type (just another signal property)
There are also other files using the same method of storing data using these sequences of packets.
I'm looking for the correct approach for reading such files and their "packets". I have imported similar files that were already encoded as strings by using textscan with a formatSpec. Is there anything similar for reading binary files?
Your help is appreciated.

Accepted Answer

Guillaume on 17 Dec 2018
Use fread for binary data. Following your description, probably:
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
%reading one record:
signal.name = fread(fid, [1 18], '*char'); %read 18 characters
signal.value = fread(fid, 1, 'int16'); %read 1 signed 16 bit integer. stored as double
signal.date = fread(fid, 1, 'int32');
signal.milliseconds = fread(fid, 1, 'int16');
signal.status = fread(fid, 1, '*uint8'); %read as unsigned 8 bit, keep as uint8
signal.type = fread(fid, 1, 'uint8');
To read multiple records, you can wrap the above in a loop. It won't be very fast, though. Another option is to use the skip argument of fread to read all the names in one go, then rewind the file, read all the values, rewind the file again, and so on:
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
%reading all records:
signal.names = fread(fid, [18 Inf], '18*char=>char', 10)'; %read 18 chars at a time, then skip the other 10 bytes of the record
fseek(fid, 18, 'bof'); %rewind back to the first value (18 bytes after the start)
signal.values = fread(fid, [1 Inf], 'int16', 26)'; %skip the other 26 bytes of the record after each value
fseek(fid, 20, 'bof'); %rewind back to the first date (20 bytes after the start)
signal.date = fread(fid, [1 Inf], 'int32', 24); %skip the other 24 bytes of the record after each date
%etc.
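The skip argument of fread counts the bytes to skip after each value (or block of values) that is read, so for a fixed-length record it is the record length minus the size of the field. As a minimal sketch, the offsets and skips can be derived from the field sizes instead of being typed by hand. The temporary file, the signal names, and the field values below are made up purely so that the sketch runs on its own:

```matlab
% Field sizes in bytes, taken from the packet description in the question.
recordsizes = [18, 2, 4, 2, 1, 1];
recordlength = sum(recordsizes);              % 28 bytes per record
offsets = cumsum([0, recordsizes(1:end-1)]);  % start of each field within a record
skips = recordlength - recordsizes;           % bytes to skip after reading each field

% Write two made-up records to a temporary file (hypothetical test data).
tmpfile = tempname;
fid = fopen(tmpfile, 'w', 'l', 'ISO-8859-1');
names = {'PUMP_01', 'VALVE_02'};              % hypothetical signal names
for k = 1:2
    fwrite(fid, sprintf('%-18s', names{k}), 'char'); % name padded to 18 chars
    fwrite(fid, 10 * k, 'int16');                    % value
    fwrite(fid, 1545000000 + k, 'int32');            % Unix time in seconds
    fwrite(fid, 100 * k, 'int16');                   % milliseconds
    fwrite(fid, 5, 'uint8');                         % status bits
    fwrite(fid, 1, 'uint8');                         % type
end
fclose(fid);

% Read one field across all records: seek to its offset, then skip the rest of each record.
fid = fopen(tmpfile, 'r', 'l', 'ISO-8859-1');
fseek(fid, offsets(2), 'bof');
values = fread(fid, [1 Inf], 'int16', skips(2))';
fseek(fid, offsets(3), 'bof');
dates = fread(fid, [1 Inf], 'int32', skips(3))';
fclose(fid);
delete(tmpfile);
```

For this layout, offsets is [0 18 20 24 26 27] and skips is [10 26 24 26 27 27], so a change to recordsizes carries through to every read.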
The fastest option is probably to read the whole file in one go and perform the conversion afterward:
recordfields = {'name', 'value', 'date', 'milliseconds', 'status', 'type'};
recordtypes = {'char', 'int16', 'int32', 'int16', 'uint8', 'uint8'};
recordsizes = {18, 2, 4, 2, 1, 1}; %size of each type in bytes
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
data = fread(fid, [sum(recordsizes), Inf], '*uint8')'; %read the whole lot as uint8, stored as uint8. transpose so that rows are records
data = mat2cell(data, size(data, 1), recordsizes); %split columns into each field
data = cellfun(@(col, type) typecast(col, type), data, recordtypes, 'UniformOutput', false);
record = cell2struct(data, recordfields, 2);
  3 comments
Guillaume on 19 Dec 2018
Sorry for the bugs. It was obviously untested code. recordsizes was meant to be a matrix not a cell array, so you don't have to bother with [recordsizes{:}]:
recordsizes = [18, 2, 4, 2, 1, 1];
Indeed, typecast only works with vectors, so you need the (:) that I forgot. As for char not being supported, unfortunately you'll need an if statement. cellfun is just a substitute for a for loop, so I'd rewrite the conversion as:
data3 = cell(size(data2)); %preallocation
for idx = 1:numel(data2)
    if strcmp(recordtypes{idx}, 'char')
        data3{idx} = char(data2{idx});
    else
        data3{idx} = typecast(data2{idx}(:), recordtypes{idx});
    end
end
vik on 19 Dec 2018
Unfortunately I can't publish the original file for testing, but I have now fixed the last bug: before typecast is used, each cell needs to be transposed, otherwise the (:) operator puts the bytes together in the wrong order.
This is the full working code, and it works on all the files I need to import:
fid = fopen(filename, 'r', 'l', 'ISO-8859-1'); % little-endian byte order
recordfields = {'name', 'value', 'date', 'milliseconds', 'status', 'type'};
recordtypes = {'char', 'int16', 'int32', 'int16', 'uint8', 'uint8'};
recordsizes = [18, 2, 4, 2, 1, 1]; % size of each field in bytes
data = fread(fid, [sum(recordsizes), Inf], '*uint8')'; % read everything as uint8; transpose so that rows are records
fclose(fid);
data2 = mat2cell(data, size(data, 1), recordsizes); % split the columns into one cell per field
data2t = cellfun(@transpose, data2, 'UniformOutput', false); % transpose so that (:) keeps the bytes of each record together
data3 = cell(size(data2t)); % preallocation
for idx = 1:numel(data2t)
    if strcmp(recordtypes{idx}, 'char')
        data3{idx} = char(data2{idx}); % one row of characters per record
    else
        data3{idx} = typecast(data2t{idx}(:), recordtypes{idx}); % reinterprets the bytes in the machine's native byte order
    end
end
record = cell2struct(data3, recordfields, 2);
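Once the fields are decoded, the Unix time, the milliseconds, and the status byte still need to be turned into something usable. The following is only a sketch of one way to do that. The input values are made up and stand in for record.date, record.milliseconds, and record.status, the UTC time zone is an assumption about how the SCADA system stored its timestamps, and the meaning of the individual status bits is not known from the thread:

```matlab
% Made-up values standing in for record.date, record.milliseconds and record.status.
unixseconds = int32([1545000000; 1545000060]);
ms = int16([250; 750]);
status = uint8([5; 128]);

% Combine seconds and milliseconds into one datetime value per record.
% The UTC time zone is an assumption, not something stated in the thread.
t = datetime(double(unixseconds) + double(ms) / 1000, ...
    'ConvertFrom', 'posixtime', 'TimeZone', 'UTC');
t.Format = 'yyyy-MM-dd HH:mm:ss.SSS';

% Expand each status byte into 8 logical flags, one column per bit.
% Column 1 is the least significant bit.
bitindex = repmat(1:8, numel(status), 1);
flags = logical(bitget(repmat(status, 1, 8), bitindex));
```

With these inputs, the first row of flags has bits 1 and 3 set (status 5) and the second row has only bit 8 set (status 128).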
Performance
The first version, in which simple calls such as fread(fid, 1, 'int32') are made thousands of times, takes 1.41 seconds to run.
The optimized version, with a single call to fread, takes only 0.013 seconds.
Problem solved, thanks a lot.

Release

R2018b
