How to extract formatted hex data from a text file?

I have a text file with tens of thousands of lines formatted like so:
...
[11:52:30.739] : Value: [BC-7F-00-00-F7-F6-2A-D7-24-D9-81-EE-6A-DD-08-D4-BD-D5-09-E1-10-F5-22-DB-20-DE-55-EA-21-D9-22-D1-EA-D5-45-E5-95-F8-2A-D7-22-DA-12-EF-95-DD-2A-D3-85-D4-5A-E1-EB-F6-2A-DB-22-DE-54-EA-A2-D8-7D-CF-15-D4-0A-E5-FB-FA-88-D6-A5-D9-D1-EA-80-DC-95-D1-24-D2-2D-E1-D5-F9-AA-DA-DA-DD-AA-E6-BD-D7-56-CE-D5-D1-5A-E5-5B-FD-D5-D5-55-DA-A8-E9-DA-DB-EF-CF-11-D0-D5-E1-15-FC-EA-D9-21-DE-D5-E6-A0-D6-B5-CD-0A-D0-EE-E6-55-FF-20-D5-55-DB-DE-E9-D5-DA-96-CE-2A-CE-5A-E3-DD-FC-EA-D8-F6-DD-55-E7-D5-D5-8A-CE-16-D0-A8-E8-15-FE-D4-D5-6A-DC-D5-EA-AA-DA-2A-CF-22-CE-A5-E4-B5-FC-6A-D9-4A-DE-AA-E8-FA-D5-A2-CF-12-D1-AA-E9-A0-FD-4A-D6-55-DC-EA-EB-95-DA-F5-CF-2A-CF-94-E4-EA-FB-CA-D9-40-DE-D5-E8-5A-D6-44-D0-2A-D2-B5-E9-2A-FD-40-D6-AA-DC-DB-EB-AB-DA-E8-CF-55-CF-6A-E4]
[11:52:30.777] : Value: [CB-7F-00-00-FF-FB-2A-D5-2A-DA-2D-EC-95-DB-B6-D0-2A-D1-B5-E5-DA-FA-15-D9-2A-DF-B6-E8-7F-D7-02-CD-55-D1-45-E9-4A-FE-AA-D4-D5-DA-D5-EB-EA-DB-0A-CF-20-D0-AA-E5-AF-FC-69-D9-DA-DE-AA-E6-55-D7-2A-CC-55-D0-56-E9-95-FF-00-D5-81-DA-55-E8-95-DB-95-CE-22-CF-56-E6-F6-FD-D5-D8-A4-DE-55-E5-5B-D6-B5-CC-D2-CF-16-EB-2A-01-85-D4-29-DB-AA-EA-AA-DA-BF-CE-D5-CE-44-E7-2A-FC-90-D9-4A-DD-2A-E5-A8-D6-AA-CD-55-D1-AD-E9-AA-FE-35-D4-EA-D8-24-E8-CA-DA-6D-D0-6A-D0-B5-E5-DD-FC-B6-D8-6A-DD-56-E5-85-D6-D2-CD-A5-D0-AB-E9-11-00-82-D4-15-DA-55-EA-0A-DB-15-D0-55-CF-6A-E6-50-FA-EA-D9-40-DC-AA-E5-28-D7-AA-CD-6E-D1-A4-E8-55-FB-7A-D5-55-D7-2A-E8-B6-DA-95-D1-7A-D1-5D-E5-57-F8-7D-D9-20-DD-0A-E7-DE-D7-20-CE-1A-D2-D5-E7-94-FC-5A-D5-89-D8-4A-EA-94-DB-AB-D1-D6-D1-AA-E4]
...
I need to extract the hex data as follows: the first 32 bits (four bytes) of each line are an index, and the remaining words are the actual data.
Currently I parse the file line by line and extract the hex data with regexp, which is slow. The end result is a matrix of doubles that I can manipulate quickly.
There must be a faster way to extract this data that I'm not aware of. Can you help?

2 comments

dpb on 4 Jul 2018
What's the data encoding scheme?
Paolo on 4 Jul 2018
What does your regexp look like? Perhaps it could be improved for efficiency.

Accepted Answer

Guillaume on 4 Jul 2018
Edited: Guillaume on 4 Jul 2018
How about:
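filecontent = fileread(yourfile); % read everything at once
% capture the text between "Value: [" and the closing "]" of every line
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');
% parse each line's dash-separated hex bytes, then stack one row per line
% (cell2mat needs every line to hold the same number of bytes)
decvalues = cell2mat(cellfun(@(hex) sscanf(hex, '%x-'), hexstrings, 'UniformOutput', false))';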
This may be faster:
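filecontent = fileread(yourfile); % read everything at once
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');
% join all lines into one string, parse it with a single sscanf call,
% then reshape to one row per line (again assumes equal-length lines)
decvalues = reshape(sscanf(strjoin(hexstrings, '-'), '%x-'), [], numel(hexstrings))';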
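Whether the joined version really is faster depends on the file and the MATLAB release, so it is worth measuring. The following is only a rough sketch of how the two variants could be compared with timeit. The log content is synthetic (random bytes, 2000 lines of 8 bytes each) and the names nLines, nBytes, logLines, perLine and joined are made up for the illustration.

```matlab
% Synthetic stand-in for the log file (hypothetical data, not the real log):
% 2000 lines of 8 random bytes each, formatted like the lines in the question.
nLines = 2000;
nBytes = 8;
bytes = randi([0 255], nLines, nBytes);
logLines = cell(1, nLines);
for k = 1:nLines
    hexline = sprintf('%02X-', bytes(k, :));   % e.g. 'BC-7F-00-...-'
    hexline(end) = [];                          % drop the trailing dash
    logLines{k} = sprintf('[11:52:30.%03d] : Value: [%s]', mod(k, 1000), hexline);
end
filecontent = strjoin(logLines, newline);

% Same pattern as in the answer: text between "Value: [" and "]"
hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');

% Variant 1: one sscanf call per line
perLine = @() cell2mat(cellfun(@(hex) sscanf(hex, '%x-'), hexstrings, ...
    'UniformOutput', false))';
% Variant 2: one sscanf call on the joined string
joined = @() reshape(sscanf(strjoin(hexstrings, '-'), '%x-'), [], ...
    numel(hexstrings))';

% Both variants must reproduce the bytes that were written out
assert(isequal(perLine(), bytes));
assert(isequal(joined(), bytes));

tPerLine = timeit(perLine);
tJoined = timeit(joined);
fprintf('per-line sscanf: %.4f s, joined sscanf: %.4f s\n', tPerLine, tJoined);
```

Only the parsing step is timed here; fileread and regexp are shared by both variants, so they are left out of the comparison.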
Unfortunately, textscan doesn't have a '%x' format specifier.
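Once decvalues holds one byte per column, the index and the data words still have to be assembled from the bytes. The thread never states the encoding, so the sketch below rests on two assumptions: the values are little-endian (least significant byte first) and each data word is 16 bits, unsigned. The two sample lines are the ones from the question, cut down to their first 8 bytes; index and words are names chosen for the illustration.

```matlab
% Two sample lines from the question, shortened to their first 8 bytes
filecontent = sprintf('%s\n%s\n', ...
    '[11:52:30.739] : Value: [BC-7F-00-00-F7-F6-2A-D7]', ...
    '[11:52:30.777] : Value: [CB-7F-00-00-FF-FB-2A-D5]');

hexstrings = regexp(filecontent, '(?<=Value: \[)[^\]]*(?=\])', 'match');
decvalues = reshape(sscanf(strjoin(hexstrings, '-'), '%x-'), [], ...
    numel(hexstrings))';

% ASSUMPTION: little-endian 32-bit index in the first four bytes,
% i.e. index = b1 + 256*b2 + 256^2*b3 + 256^3*b4
index = decvalues(:, 1:4) * (256 .^ (0:3))';

% ASSUMPTION: the remaining bytes are little-endian unsigned 16-bit words,
% so each word is lowByte + 256*highByte
lowBytes = decvalues(:, 5:2:end);
highBytes = decvalues(:, 6:2:end);
words = lowBytes + 256 * highBytes;

disp(index);   % one index per line
disp(words);   % one row of data words per line
```

If the words are actually signed, subtracting 65536 from every value of 32768 or more would convert them; if the byte order is big-endian, the roles of the low and high bytes swap.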

2 comments

dpb on 4 Jul 2018
A major foobar, too... :( I added the enhancement request not long after it was introduced; like many simple but apparently not sexy-enough things, it's never made the cut of what gets attention.
Guillaume, brilliant! So far it looks much faster than what I previously coded. Good to learn. Thanks.


Release

R2018a
