MATLAB Answers

Reading sequence of binary packets consisting of multiple datatypes from binary file

vik on 17 Dec 2018
Commented: vik on 19 Dec 2018
I need to access data stored in binary files. In this case, it is an event log file from an old SCADA system. Each file contains a sequence of packets, and each packet contains, for example, 28 bytes of data. At the byte level, a packet is laid out like this:
  • 18 chars containing the signal name as an ISO 8859-1 string
  • A 16-bit signed integer (little-endian, LE) for the signal value
  • A 32-bit signed integer (LE) holding the Unix datetime
  • A 16-bit signed integer holding the additional milliseconds for the Unix datetime
  • 8 so-called "status bits"; each bit is an attribute or "signal flag"
  • An 8-bit signed integer describing the signal type (just another signal property)
There are also other files using the same method of storing data using these sequences of packets.
I'm looking for the correct approach to reading such files and their packets. I have previously imported similar files that were already encoded as strings, using textscan with a formatSpec. Is there anything similar for reading binary files?
Your help is appreciated.


Accepted Answer

Guillaume on 17 Dec 2018
Use fread for binary data. Following your description, probably:
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
%reading one record:
signal.name = fread(fid, [1 18], '*char'); %read 18 characters
signal.value = fread(fid, 1, 'int16'); %read 1 signed 16 bit integer. stored as double
signal.date = fread(fid, 1, 'int32');
signal.milliseconds = fread(fid, 1, 'int16');
signal.status = fread(fid, 1, '*uint8'); %read as unsigned 8 bit, keep as uint8
signal.type = fread(fid, 1, 'uint8');
For reading multiple records you can wrap the above in a loop. It won't be very fast though. Another option is to use the skip argument of fread to read all the names in one go, rewind the file, read all the values, rewind the file, etc.:
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
%reading all records:
signal.names = fread(fid, [18 Inf], '18*char=>char', 10)'; %read 18 chars per record, then skip the remaining 28 - 18 = 10 bytes
fseek(fid, 18, 'bof'); %rewind back to the first value (18 bytes after the start)
signal.values = fread(fid, [1 Inf], 'int16', 26)'; %skip is applied after each value read: 28 - 2 = 26 bytes
fseek(fid, 20, 'bof'); %rewind back to the first date (20 bytes after the start)
signal.date = fread(fid, [1 Inf], 'int32', 24); %28 - 4 = 24 bytes
%etc.
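A note on the skip argument: fread applies skip after each value it reads, or after each block of N values when the precision is written as 'N*type', so the skip is the record length minus the bytes just read. The following is only a minimal sketch on a synthetic file, not the original log. The SIGk names, the numeric values and the temporary file name are made up for illustration, and the names are read as uint8 and converted to char so the example does not depend on the file encoding.

```matlab
% Write three synthetic 28-byte records (all values are made up)
fname = [tempname '.bin'];
fid = fopen(fname, 'w', 'l');
for k = 1:3
    fwrite(fid, uint8(sprintf('%-18s', sprintf('SIG%d', k))), 'uint8'); % 18-byte padded name
    fwrite(fid, 100*k, 'int16');          % signal value
    fwrite(fid, 1545004800 + k, 'int32'); % Unix time
    fwrite(fid, 10*k, 'int16');           % milliseconds
    fwrite(fid, k, 'uint8');              % status bits
    fwrite(fid, 2, 'uint8');              % signal type
end
fclose(fid);

% Read one field at a time using block precision and skip
fid = fopen(fname, 'r', 'l');
names = fread(fid, [18 Inf], '18*uint8=>char', 10)'; % 18 bytes, then skip 10
fseek(fid, 18, 'bof');                               % first value starts at byte 18
values = fread(fid, [1 Inf], 'int16', 26)';          % 2 bytes, then skip 26
fclose(fid);
delete(fname);
```

Here names should come back as a 3x18 char array with one name per row, and values as a 3x1 column.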
The fastest option is probably to read the whole file in one go and perform the conversion afterward:
recordfields = {'name', 'value', 'date', 'milliseconds', 'status', 'type'};
recordtypes = {'char', 'int16', 'int32', 'int16', 'uint8', 'uint8'};
recordsizes = {18, 2, 4, 2, 1, 1}; %size of each type in bytes
fid = fopen(filepath, 'r', 'l', 'ISO-8859-1'); %l for Little-endian which I assume is the LE in the description
data = fread(fid, [sum(recordsizes), Inf], '*uint8')'; %read the whole lot as uint8, stored as uint8. transpose so that rows are records
data = mat2cell(data, size(data, 1), recordsizes); %split columns into each field
data = cellfun(@(col, type) typecast(col, type), data, recordtypes, 'UniformOutput', false);
record = cell2struct(data, recordfields, 2);

  3 Comments

vik on 19 Dec 2018
Thank you for your detailed answer. The first example works fine when put in a loop, but it takes about 1.4 seconds to read the whole file.
I tried the third example, but had to modify it in a few places:
recordfields = {'name', 'value', 'date', 'milliseconds', 'status', 'type'};
recordtypes = {'char', 'int16', 'int32', 'int16', 'uint8', 'uint8'};
recordsizes = {18, 2, 4, 2, 1, 1}; %size of each type in bytes
data = fread(fid, [sum([recordsizes{:}]), Inf], '*uint8')'; % Transform cell to vector
data2 = mat2cell(data, size(data, 1), [recordsizes{:}]); % Same here
data3 = cellfun(@(col, type) typecast(col(:), type), data2, recordtypes, 'UniformOutput', false);
record = cell2struct(data3, recordfields, 2);
Reading the file with fread takes only a few milliseconds now, which is a huge difference.
The matrix "data" is of size 50000x28 and class uint8.
The cell array "data2" is of size 1x6, containing 50000x18 uint8, 50000x2 uint8, 50000x4 uint8 and so on.
However, typecast returns an error:
Error using typecast
The first input argument must be a vector.
So I found out that typecasting two uint8 to one uint16 works like this:
x = [255 67; 215 88; 128 45]; % Size: 3x2
y = typecast(uint8(x(:)),'uint16') % Size: 3x1
This returns the expected result. When using x instead of x(:), it returns the error "The first input argument must be a vector.".
After modifying the typecast in your proposed solution, I still get an error:
Error using typecast
Unsupported data type for conversion.
This can be reproduced by trying this example:
x = [255 67; 215 88; 128 45]; % Size: 3x2
y = typecast(uint8(x(:)),'char') % Size: 3x1
which throws exactly that error.
Is there an elegant way to solve this without if-statements?
Guillaume on 19 Dec 2018
Sorry for the bugs. It was obviously untested code. recordsizes was meant to be a matrix not a cell array, so you don't have to bother with [recordsizes{:}]:
recordsizes = [18, 2, 4, 2, 1, 1];
Indeed typecast only works with vectors, so you need the (:) that I forgot. With regard to char not being supported, unfortunately you'll need an if statement. cellfun is just a substitute for a for loop, so I'd rewrite the conversion as:
data3 = cell(size(data2)); %preallocation
for idx = 1:numel(data2)
    if strcmp(recordtypes{idx}, 'char')
        data3{idx} = char(data2{idx});
    else
        data3{idx} = typecast(data2{idx}(:), recordtypes{idx});
    end
end
vik on 19 Dec 2018
Unfortunately I can't publish the original file for testing, but I have fixed the last bug now: before using typecast, the contents of each cell need to be transposed, otherwise the (:) operator puts the vector together the wrong way.
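To illustrate why the transpose matters: (:) walks a matrix column by column, so with one record per row it hands typecast the first bytes of all records before any second byte. A minimal sketch with made-up bytes, assuming a little-endian host (typecast always uses the machine's native byte order):

```matlab
% Three little-endian int16 values (1, 2, 3), one record per row
bytes = uint8([1 0; 2 0; 3 0]);

% Column-major order yields the byte stream 1 2 3 0 0 0, so bytes from
% different records get paired together
wrong = typecast(bytes(:), 'int16');

% Transposing first yields 1 0 2 0 3 0, i.e. the bytes in record order
right = typecast(reshape(bytes', [], 1), 'int16');
```

Both results are 3x1 int16 vectors, but only right contains the values 1, 2 and 3.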
This is the full working code and it works on all the files I need to import:
fid = fopen(filename, 'r', 'l', 'ISO-8859-1'); % Little Endian Order
recordfields = {'name', 'value', 'date', 'milliseconds', 'status', 'type'};
recordtypes = {'char', 'int16', 'int32', 'int16', 'uint8', 'uint8'};
recordsizes = [18, 2, 4, 2, 1, 1]; %size of each type in bytes
data = fread(fid, [sum(recordsizes), Inf], '*uint8')'; %read everything as uint8, one record per row
fclose(fid);
data2 = mat2cell(data, size(data, 1), recordsizes); %split columns into one cell per field
data2t = cellfun(@transpose,data2,'UniformOutput',false); %transpose for typecast
data3 = cell(size(data2t)); %preallocation
for idx = 1:numel(data2t)
    if strcmp(recordtypes{idx}, 'char')
        data3{idx} = char(data2{idx}); %one name per row
    else
        data3{idx} = typecast(data2t{idx}(:), recordtypes{idx});
    end
end
record = cell2struct(data3, recordfields, 2);
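As a possible follow-up step, the date, milliseconds and status fields described in the question can be turned into timestamps and individual flags. This is only a sketch: the sample values below are made up and stand in for the imported record struct, the Unix time is assumed to be in UTC, and bit 1 is assumed to be the least significant status bit.

```matlab
% Hypothetical sample values standing in for the imported record struct
record.date = int32([1545004800; 1545004801]);
record.milliseconds = int16([250; 5]);
record.status = uint8([5; 128]);

% Unix seconds plus milliseconds -> datetime (values interpreted as UTC)
t = datetime(double(record.date), 'ConvertFrom', 'posixtime') ...
    + milliseconds(double(record.milliseconds));
t.Format = 'yyyy-MM-dd HH:mm:ss.SSS';

% Unpack the 8 status bits into a logical matrix, one column per bit
flags = false(numel(record.status), 8);
for b = 1:8
    flags(:, b) = bitget(record.status, b) == 1;
end
```

Each row of flags then holds the eight signal flags of one record.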
Performance
The first version, with a simple fread(fid, 1, 'int32') called thousands of times, takes 1.41 seconds to run.
The optimized version with a single call of fread takes only 0.013 seconds.
Problem solved, thanks a lot.
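On the original question of whether there is something like a formatSpec for binary files: memmapfile accepts a Format cell array that describes one record field by field, which may be worth trying as an alternative. The following is a minimal sketch rather than a tested replacement for the code above. It assumes a little-endian host, since memmapfile uses the native byte order, and it works on a synthetic two-record file with made-up values. char is not a memmapfile type, so the name is mapped as uint8 and converted afterwards.

```matlab
% Write a small synthetic file with two 28-byte records (made-up values)
fname = [tempname '.bin'];
fid = fopen(fname, 'w', 'l');
for k = 1:2
    fwrite(fid, uint8(sprintf('%-18s', sprintf('SIG%d', k))), 'uint8');
    fwrite(fid, 100*k, 'int16');
    fwrite(fid, 1545004800 + k, 'int32');
    fwrite(fid, 10*k, 'int16');
    fwrite(fid, k, 'uint8');
    fwrite(fid, 2, 'uint8');
end
fclose(fid);

% Describe one record; the format repeats until the end of the file
m = memmapfile(fname, 'Format', { ...
    'uint8', [1 18], 'name'; ...
    'int16', [1 1],  'value'; ...
    'int32', [1 1],  'date'; ...
    'int16', [1 1],  'milliseconds'; ...
    'uint8', [1 1],  'status'; ...
    'uint8', [1 1],  'type'});

recs = m.Data;                            % struct array, one element per record
firstName = strtrim(char(recs(1).name));  % convert the name bytes to text
values = [recs.value]';                   % collect one field across all records
dates = [recs.date]';

clear m        % release the mapping before deleting the file
delete(fname);
```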

Release

R2018b