tf weighting in docs

John

2 Dec 2017

How do I compute term frequency (how many times each term occurs in a document) for a text file that holds multiple documents? Each document starts with an ID tag <P ID=xxx> and ends with the delimiter </P>. I need separate statistics for each document.
I have been able to load the text, but my usual approach to identifying the document ID won't work here: the IDs are not contiguous, so a running count up to 'n' cannot be used to generate the doc IDs.
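% The text file has been loaded into variable C (fid is still open from that read)
C = C{1};
fclose(fid);
% Count the lines that contain the closing delimiter </P>, one per document
idx = strfind(C,'</P>');
n = nnz(~cellfun(@isempty, idx));
% Write one opening tag per document; open the output file once, not on every pass
fileName = 'DTags.txt';
fid = fopen(fileName,'w');
for kk = 1:n
    % kk only counts 1..n, so it does not match the real, non-contiguous IDs
    str = ['<p id=',num2str(kk),'>'];
    fprintf(fid,'%s\r\n',str);
end
fclose(fid);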
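
One possible way around the non-contiguous IDs is to read each ID straight from its opening tag with regexp instead of generating it from a counter, and to count terms inside the same loop. The following is only a minimal sketch, not a tested solution for the real file. It assumes the tags look exactly like <P ID=xxx> ... </P>, that the whole file is available as one character vector (here a made-up sample named txt), and that a "term" is any run of letters or digits after lower-casing. The names txt, tok and docs are hypothetical.

```matlab
% Hypothetical sample standing in for the contents of the text file
txt = sprintf(['<P ID=3>\nthe cat sat on the mat\n</P>\n' ...
               '<P ID=17>\nthe dog chased the cat\n</P>\n']);

% One match per document: token 1 is the ID, token 2 is the body.
% Assumes every document is wrapped as <P ID=xxx> ... </P>.
tok = regexp(txt, '<P ID=([^>]+)>(.*?)</P>', 'tokens', 'ignorecase');

docs = struct('id', {}, 'terms', {}, 'counts', {});
for k = 1:numel(tok)
    % Split the body into lower-case terms (letters and digits only)
    words = regexp(lower(tok{k}{2}), '[a-z0-9]+', 'match');
    terms = unique(words);
    docs(k).id     = strtrim(tok{k}{1});   % the real ID, not the counter k
    docs(k).terms  = terms;
    % Term frequency: number of occurrences of each unique term
    docs(k).counts = cellfun(@(t) nnz(strcmp(words, t)), terms);
end

% Print the per-document statistics
for k = 1:numel(docs)
    fprintf('Document %s\n', docs(k).id);
    for t = 1:numel(docs(k).terms)
        fprintf('  %-8s %d\n', docs(k).terms{t}, docs(k).counts(t));
    end
end
```

For a real file, txt could be obtained with fileread('yourfile.txt') (file name assumed). If the text is already held as a cell array of lines, it could be joined first, for example with strjoin(C', sprintf('\n')), provided C is a column cell array of character vectors.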