
Reduce the Size of Matrix

Isay on 22 Nov 2014
Commented: Stephen23 on 1 Dec 2014
I need to reduce the size of my matrices (Xbool_last and Xfreq_last), because at the 1000th iteration of the loop (docx = 1000) MATLAB reports "Out of memory". The loop runs from 1 to 5549.
Please look at this part of the code:
%% The loop for exploring ALL the documents to create the tf-idf weight matrix
for docx = 1 : length(DBlast)
    docx   % display the current document index as a progress indicator
    for word = 1 : length(DBlast{docx})
        % Take each word of docx in turn
        word_xi = DBlast{docx}{word,1} ;
        for docy = 1 : length(DBlast)
            % With the source word taken from docx, search for it in
            % every document docy
            % If word_xi is found in document docy, vote 1
            if sum(strcmpi(DBlast{docy},word_xi)) ~= 0
                ind = find(strcmpi(DBlast{docy},word_xi) ~= 0) ;
                Xbool(word,docy) = 1 ;
                Xfreq(word,docy) = Freqlast{docy}(ind) ;
            else
                % Otherwise vote 0
                Xbool(word,docy) = 0 ;
                Xfreq(word,docy) = 0 ;
            end
        end
    end
    % Append this document's block of rows to the accumulated matrices
    Xbool_last = [Xbool_last;uint8(Xbool)];
    Xfreq_last = [Xfreq_last;uint8(Xfreq)];
    Xbool = [] ;
    Xfreq = [] ;
end
===============================================================================
So, my questions:
1. How can I reduce the size of Xbool_last and Xfreq_last? If I need to export the matrices to a .txt file (or something else) in order to use them, how can I save and load them? Can you recommend some code?
2. How can I use the output of the code above in the tf-idf algorithm (if you know)?
The tf-idf code is attached.

Accepted Answer

Guillaume on 22 Nov 2014
Edited: Guillaume on 22 Nov 2014
You're already using uint8 to store your values. There isn't a smaller type unless you start packing booleans into bits, which I assume is not possible for Xfreq_last anyway. Using bits to store booleans is bound to be slow and awkward in MATLAB. There's no built-in function for that.
However, your storage looks incredibly inefficient to me. Say you're processing the first word of the first document, and you find it in documents 2, 10, 150, 2048 and 4125, for example. For a start, instead of storing those five indices (which wouldn't take much memory, ~20 bytes as uint32), you store a boolean array of size 1x5549 (~5549 bytes) containing only a few ones. But more importantly, when you get to document 2, you're going to look for the exact same word, find it in the exact same documents, and store that result again. Why?
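To put numbers on that comparison, here is a minimal sketch that measures both representations of the example above. The variable names (denseRow, hitList) are made up for illustration and do not come from the question's code.

```matlab
% Compare the memory used by one dense 1x5549 uint8 row with the memory
% used by a short list of the document indices where the word occurs.
nDocs = 5549;

denseRow = zeros(1, nDocs, 'uint8');        % one byte per document
denseRow([2 10 150 2048 4125]) = 1;         % the word occurs in 5 documents

hitList = uint32(find(denseRow));           % just the 5 indices, 4 bytes each

infoDense = whos('denseRow');
infoList  = whos('hitList');
fprintf('dense row: %d bytes, index list: %d bytes\n', ...
        infoDense.bytes, infoList.bytes);
% prints: dense row: 5549 bytes, index list: 20 bytes
```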
Why not do the storage per word, instead of per document, and for each word, just store which documents it's found in?
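One possible way to do per-word storage is a sparse word-by-document matrix. The sketch below is an illustration under assumptions, not the asker's actual data: it assumes DBlast{d} is a column cell array of the distinct words in document d and Freqlast{d} holds the matching counts, and it uses tiny toy values. The names vocab, docIdx, wordIdx and tfidf are introduced here, and the tf-idf weighting shown (raw count times log(N/df)) is only one common variant, since the attached tf-idf code is not shown.

```matlab
% Toy stand-ins for the question's DBlast and Freqlast (assumed layout).
DBlast   = { {'apple'; 'pear'}, {'Pear'; 'fig'; 'kiwi'}, {'apple'} };
Freqlast = { [3; 1],            [2; 5; 1],               4         };
nDocs    = numel(DBlast);

% Stack every (word, count) pair and record which document it came from.
allWords = lower(vertcat(DBlast{:}));   % lower() mimics the case-insensitive strcmpi
allFreqs = vertcat(Freqlast{:});
docIdx   = zeros(numel(allWords), 1);
pos = 0;
for d = 1:nDocs
    n = numel(DBlast{d});
    docIdx(pos+1 : pos+n) = d;
    pos = pos + n;
end

% One row per distinct word; only nonzero entries are stored.
% sparse() sums the counts if the same word appears twice in one document.
[vocab, ~, wordIdx] = unique(allWords);
Xfreq = sparse(wordIdx, docIdx, allFreqs, numel(vocab), nDocs);
Xbool = Xfreq > 0;                      % sparse logical "word occurs in doc"

% Which documents contain 'pear'?
pearDocs = find(Xbool(strcmp(vocab, 'pear'), :));

% One common tf-idf variant: raw count times log(nDocs / document frequency).
df    = full(sum(Xbool, 2));            % number of documents containing each word
idf   = log(nDocs ./ df);
tfidf = spdiags(idf, 0, numel(vocab), numel(vocab)) * Xfreq;
```

Each word is looked up once rather than once per document that contains it, and memory grows with the number of actual occurrences instead of with words times documents. The result can be written and read back with save and load, for example save('termdoc.mat', 'vocab', 'Xfreq') followed later by load('termdoc.mat'); the file name is just a placeholder.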
1 Comment
Stephen23 on 1 Dec 2014
+1 for excellent advice on data management.
