Fast binary decoding of very large matrix of uint8s
1 visualización (últimos 30 días)
Mostrar comentarios más antiguos
dymitr ruta
el 17 de Oct. de 2022
Editada: dymitr ruta
el 18 de Oct. de 2022
Hi Guys, I will appreciate help on the very fast decoding of uint8s into 4 uint8s corresponding to each 2-bit representation of original 8-bit uint8. So say for the input x=uint8(141), the output should be y=[2 0 3 1] because 141 in binary '10001110' gets decomposed into 4 2-bin values: 10 00 11 10, i.e. 2 0 3 1. The thing is I have input matrices of z typically of the order 1e8 by 38, which should get decomposed into uint8 matrix of size 1e8 by (4x38). Any conversion to double would flood the RAM and break the code. I tried using idivide given this constraints and the best code so far is this:
for i=1:4
y(:,:,i)=idivide(x, uint8(4)^(4-i));
x=x-y(:,:,i)*4^(4-i);
end;
Any faster ideas? A good test could be:
x=uint8(randi(256,1e7,40)-1);
2 comentarios
James Tursa
el 17 de Oct. de 2022
Editada: James Tursa
el 17 de Oct. de 2022
Your original memory footprint (GB):
1e8*38/(1024*1024*1024)
The new memory footprint (GB):
ans*4
So just storing these will cost you about 18GB. Manipulating them downstream in your code will cost more as well, so you will need to be sure you have plenty of RAM available. And working with variables this size is going to be slow, so you will need to accept that. Do you really need everything in RAM all at once? What are you doing with the split-up array downstream in your code? Maybe there is a way to accomplish your objectives without creating and carrying this large array in memory.
Having said all this, if you really need this done as stated and carry everything in memory all at once, probably the fastest way to convert this will be with a multi-threaded mex routine for the bit-level stuff. Do you have a supported C/C++ compiler installed? The code would be fairly trivial to write.
Respuesta aceptada
Rik
el 17 de Oct. de 2022
It might not be the absolute fastest method, but this should be about 3 times faster
x=uint8(randi(256,1e7,40)-1);
tic
y1 = ReferenceSolution(x);
toc
tic
y2 = SuggestedSolution(x);
toc
isequal(y1,y2)
function y=SuggestedSolution(x)
y=x*0;
for k=4:-1:2
y(:,:,k) = mod(x,4);
x = bitshift(x-y(:,:,k),-2);
end
y(:,:,1) = mod(x,4);
if nargin==0,clear,end
end
function y=ReferenceSolution(x)
y=x*0;
for i=1:4
y(:,:,i)=idivide(x, uint8(4)^(4-i));
x=x-y(:,:,i)*4^(4-i);
end
if nargin==0,clear,end
end
2 comentarios
Más respuestas (1)
Walter Roberson
el 17 de Oct. de 2022
I posted timing tests on various conversion options at one point.
build a look table in which each column is 4 uint8.
Take the input, uint16 it, add one, use it as the column indices, getting row results, 4 x 38e8. reshape and permute() to get the 3d array.
The reason for uint16 and add 1 is to map 0 to 255 to valid indices 1 to 256. If you were to just add 1 then input 255+1 would saturate.
1 comentario
Ver también
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!