Problem using readtable
Here is the code:
opts = detectImportOptions("C:\Users\Onat\Desktop\392\vocab.txt");
opts.VariableTypes=["string", "double"];
opts.LineEnding = ["\n"];
vocab = readtable('C:\Users\Onat\Desktop\392\vocab.txt',opts);
I'm working on an NLP application in which I need the vocabulary and the frequency of each word in it. Naturally, the corpus contains punctuation tokens such as a lone double quotation mark ("). MATLAB seems to treat this as a special character: in the output below, everything after the quotation mark, including the following words and their frequencies, ends up inside a single string instead of being split into rows. Can anyone help with this issue?
vocab.txt is as follows:
..... (i.e. this is not the beginning)
and 699333
in 603607
" 538122
to 504540
a 476836
was 304423
...... (i.e. it continues)
The output is as follows:
...... (i.e. this is not the beginning)
"and" 6.9933e+05
"in" 6.0361e+05
" 538122↵to 504540↵a 476836↵was 304423↵The 246510↵- 229901↵is 225721↵for 198733↵)
Accepted Answer
More Answers (1)
I would take a different approach. I would use readlines and string manipulation to create the table.
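str = readlines("vocab.txt", "EmptyLineRule", "skip")   % one string per line; skip blank lines such as a trailing newline
T = array2table(split(str),'VariableNames',["Word","Freq"]);   % split each line at whitespace into word and count columns
T.Freq = str2double(T.Freq)   % convert the count column from text to numbers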
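Another option, sketched here rather than taken from the thread, is textscan with a '%s %f' format. The %s specifier reads text up to the next whitespace and, unlike %q, does not strip or pair up double quotation marks, so a line starting with a lone " is read as an ordinary token. The sketch assumes each line is a single whitespace-free token followed by a count; the temporary sample file (fname) is hypothetical and only stands in for vocab.txt.

```matlab
% Hypothetical sample data: a few lines copied from the question, written to
% a temporary file. Replace fname with the path to your own vocab.txt.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'and 699333\nin 603607\n" 538122\nto 504540\n');
fclose(fid);

% '%s %f' reads one whitespace-delimited token and one number per line.
% %s gives double quotes no special meaning (only %q does),
% so the lone " token is kept as-is.
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f');
fclose(fid);
delete(fname);   % remove the temporary sample file

% Assemble a two-column table: words as strings, counts as doubles.
vocab = table(string(C{1}), C{2}, 'VariableNames', {'Word', 'Freq'});
disp(vocab)
```

To use it on real data, drop the sample-writing lines and set fname to the path of vocab.txt.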