Fast binary decoding of very large matrix of uint8s

1 visualización (últimos 30 días)
dymitr ruta
dymitr ruta el 17 de Oct. de 2022
Editada: dymitr ruta el 18 de Oct. de 2022
Hi Guys, I will appreciate help on the very fast decoding of uint8s into 4 uint8s corresponding to each 2-bit representation of original 8-bit uint8. So say for the input x=uint8(141), the output should be y=[2 0 3 1] because 141 in binary '10001110' gets decomposed into 4 2-bin values: 10 00 11 10, i.e. 2 0 3 1. The thing is I have input matrices of z typically of the order 1e8 by 38, which should get decomposed into uint8 matrix of size 1e8 by (4x38). Any conversion to double would flood the RAM and break the code. I tried using idivide given this constraints and the best code so far is this:
for i=1:4
y(:,:,i)=idivide(x, uint8(4)^(4-i));
x=x-y(:,:,i)*4^(4-i);
end;
Any faster ideas? A good test could be:
x=uint8(randi(256,1e7,40)-1);
  2 comentarios
James Tursa
James Tursa el 17 de Oct. de 2022
Editada: James Tursa el 17 de Oct. de 2022
Your original memory footprint (GB):
1e8*38/(1024*1024*1024)
ans = 3.5390
The new memory footprint (GB):
ans*4
ans = 14.1561
So just storing these will cost you about 18GB. Manipulating them downstream in your code will cost more as well, so you will need to be sure you have plenty of RAM available. And working with variables this size is going to be slow, so you will need to accept that. Do you really need everything in RAM all at once? What are you doing with the split-up array downstream in your code? Maybe there is a way to accomplish your objectives without creating and carrying this large array in memory.
Having said all this, if you really need this done as stated and carry everything in memory all at once, probably the fastest way to convert this will be with a multi-threaded mex routine for the bit-level stuff. Do you have a supported C/C++ compiler installed? The code would be fairly trivial to write.
dymitr ruta
dymitr ruta el 17 de Oct. de 2022
Thx for it James, yes it is expected expansion from the single char encoding of the 4 DNA nucleotide possible letters of ACGT. If such encoding and decoding is fast enough I can work with 4x bigger seq in RAM and only decode if needed. Also loads faster since u have to pull 4x less uint8s from ssd in the encoded state

Iniciar sesión para comentar.

Respuesta aceptada

Rik
Rik el 17 de Oct. de 2022
It might not be the absolute fastest method, but this should be about 3 times faster
x=uint8(randi(256,1e7,40)-1);
tic
y1 = ReferenceSolution(x);
toc
Elapsed time is 14.318841 seconds.
tic
y2 = SuggestedSolution(x);
toc
Elapsed time is 3.184643 seconds.
isequal(y1,y2)
ans = logical
1
function y=SuggestedSolution(x)
y=x*0;
for k=4:-1:2
y(:,:,k) = mod(x,4);
x = bitshift(x-y(:,:,k),-2);
end
y(:,:,1) = mod(x,4);
if nargin==0,clear,end
end
function y=ReferenceSolution(x)
y=x*0;
for i=1:4
y(:,:,i)=idivide(x, uint8(4)^(4-i));
x=x-y(:,:,i)*4^(4-i);
end
if nargin==0,clear,end
end
  2 comentarios
dymitr ruta
dymitr ruta el 17 de Oct. de 2022
Thx Rik, good improvement, will remember to faster decomposition from the back
dymitr ruta
dymitr ruta el 18 de Oct. de 2022
Editada: dymitr ruta el 18 de Oct. de 2022
Actually your solution is the fastest so I accepted as an answer, here is the current fastest code based on your idea above, many thanks:
function y=dnd(x)
[n,m]=size(x); y=zeros(n,m,4,'uint8');
for i=4:-1:2
y(:,:,i)=mod(x,4);
x = bitshift(x-y(:,:,i),-2);
end
y(:,:,1) = mod(x,4);
y=reshape(permute(y,[1 3 2]),n,[]);

Iniciar sesión para comentar.

Más respuestas (1)

Walter Roberson
Walter Roberson el 17 de Oct. de 2022
I posted timing tests on various conversion options at one point.
build a look table in which each column is 4 uint8.
Take the input, uint16 it, add one, use it as the column indices, getting row results, 4 x 38e8. reshape and permute() to get the 3d array.
The reason for uint16 and add 1 is to map 0 to 255 to valid indices 1 to 256. If you were to just add 1 then input 255+1 would saturate.
  1 comentario
dymitr ruta
dymitr ruta el 18 de Oct. de 2022
Editada: dymitr ruta el 18 de Oct. de 2022
Hi Walter, even though this feels like the best solution unless I am doing sth wrong, it appears to be 2x slower than Rik's:
function y=dnd(x)
%Build lookup table y of all combinations with reps of (0:3)' - fast
[y{4:-1:1}]=ndgrid(uint8([0:3])); y=reshape(cat(4,y{:}),[],4)';
%Remember size of x and convert it to an index - fast
[n,m]=size(x); x=reshape(x',[],1);
%Inc by 1 to start from 1 and end on 256 - fast
x=uint16(x)+1;
%Build raw matrix y elements - slow >90% of time
y=y(:,x); %This line takes over 90% of time
%Reshape to the required size - medium 5-10% of time
y=reshape(y,4*m,[])';
Takes over 7sec on the x=uint8(randi(256,1e7,40)-1) input, Riks solution:
function y=dnd(x)
[n,m]=size(x); y=zeros(n,m,4,'uint8');
for i=4:-1:2
y(:,:,i)=mod(x,4);
x = bitshift(x-y(:,:,i),-2);
end
y(:,:,1) = mod(x,4);
y=reshape(permute(y,[1 3 2]),n,[]);
takes <3sec

Iniciar sesión para comentar.

Categorías

Más información sobre Programming en Help Center y File Exchange.

Productos


Versión

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by