# How to store sequence of bits as a bit stream and use the least possible memory ?

39 views (last 30 days)
Teo Protoulis on 10 Jan 2020
Edited: Matt J on 10 Jan 2020
I am implementing a huffman encoding procedure which for example encodes the decimal number -5 as '100010'. Until now I have implemented some code which correctly produces the encoded sequence. The encoded sequence is stored as a string. However, I want the bit sequence to be stored as a bit stream and use only the neccesary amount of space for storage. For example, I want the encoded sequence for -5, which is '100010', to take 6 bits of memory. Is that possible ?
Guillaume on 10 Jan 2020
Note that in all modern processors and certainly all that run matlab (x86/x64) the minimum unit of memory is 8 bits and everything is in multiple of 8.

Matt J on 10 Jan 2020
Edited: Matt J on 10 Jan 2020
You could chain all your strings together as a logical column vector and use bwpack, see
Example:
>> A='10001010001010001011010100011000100010100010';
>> B=bwpack((A-'0').');
>> whos A B
Name Size Bytes Class Attributes
A 1x44 88 char
B 2x1 8 uint32

Stephane Dauvillier on 10 Jan 2020
You can use the functions:
1. dec2bin to tranform decimal number into binary,
2. string to transform the output of the dec2bin from char to string
3. regexprep to replace the first 0 by empty
4. strjoin to join everything
For instance:
strjoin(regexprep(string(dec2bin([66,8,19,7])),"^0+","","once"),"")

Guillaume on 10 Jan 2020
A bit similar to Matt's suggestion, but may be slightly more memory efficient since bwpack uses multiples of 32-bit, is to:
• split your stream into block of 8-bits
• pad the last block to 8 bits if shorter than 8.
• convert each block to uint8, the smallest unit of storage on x86/x64
eg.
bitstream = char(randi(double('01'), 1, 19)) %random stream of 19 0-1 characters
encoded = bitstream - '0'; %convert to binary vector. Uses 4 times as much memory as bitstream
encoded = [encoded, zeros(1, mod(-numel(bitstream), 8))]; %pad with zeros at the to multiple of 8
encoded = uint8(sum(reshape(encoded, 8, []) .* 2.^(7:-1:0)')) %convert each sequence of 8 bits into 8-bit integer. 1st bit is MSB
You can easily convert back to the (padded) bitstream:
padded_origstream = reshape(dec2bin(encoded, 8)', 1, [])