I have a text file that I would like to split into an array. Each array cell should be a word, not a sentence or line in the file.
This is what I got so far. But it does not actually solve my problem.
file= fopen('marktwain.txt','r');
string= fread(file, [1, inf], 'char');
fclose(file);
CStr = dataread('file', 'marktwain.txt', '%s', 'delimiter', '\n');
I have little clue where to go from here.
Accepted Answer
Cedric on 17 Mar 2013
Edited: Cedric on 17 Mar 2013
buffer = fileread('marktwain.txt') ;
words = regexp(buffer, '\<\w+', 'match') ;
We can discuss the pattern if you want to refine the regexp. For example, you could have "it's" or "John's" count as a single word (and not two) by adding the apostrophe to the character class (EDITED):
words = regexp(buffer, '\<[\w'']+', 'match') ;
The final answer, after the discussion below, is:
buffer = fileread('marktwain.txt') ;
words = regexp(buffer, '\<[\w''\-,]+', 'match') ;
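To see how the three patterns in this answer differ, here is a minimal sketch run on a made-up sample string rather than on marktwain.txt. The sample text and the variable names w1, w2, w3 are assumptions for illustration only.

```matlab
% Sample text standing in for the file contents (hypothetical example).
buffer = 'It''s a well-known fact, isn''t it?';

% Word characters only: apostrophes and dashes split words apart.
w1 = regexp(buffer, '\<\w+', 'match');

% Apostrophe allowed inside a word: "It's" and "isn't" stay whole.
w2 = regexp(buffer, '\<[\w'']+', 'match');

% Apostrophe, dash and comma allowed: "well-known" stays whole,
% and a trailing comma sticks to its word ("fact,").
w3 = regexp(buffer, '\<[\w''\-,]+', 'match');

disp(w1); disp(w2); disp(w3);
```

Each result is a 1-by-N cell array of character vectors. Note that the last pattern keeps the comma attached to the word, which is only what you want if commas should count as part of words.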
8 Comments
Cedric on 17 Mar 2013
Edited: Cedric on 17 Mar 2013
Do you want the comma to be part of words? If so, you have probably figured out by now that you can match it with
words = regexp(buffer, '\<[\w'',-]+', 'match') ;
Note that inside [] the dash has a special meaning when it sits between two literals (it denotes a range, as in A-Z, which means A to Z), so you have to escape it if it doesn't come last within the []:
words = regexp(buffer, '\<[\w''\-,]+', 'match') ;
This is why I put the comma before the dash in the first expression.
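The effect of the dash position can be checked on a tiny example. This is only a sketch: the test string and the names r1, r2, r3 are made up for illustration.

```matlab
s = 'abc-';

% Unescaped dash between two literals: a range, a through c.
r1 = regexp(s, '[a-c]+', 'match');

% Escaped dash: the class holds the three literals a, - and c.
r2 = regexp(s, '[a\-c]+', 'match');

% Dash placed last: also literal, no escape needed.
r3 = regexp(s, '[ac-]+', 'match');

disp(r1); disp(r2); disp(r3);
```

With the range, 'b' is matched and the dash is not. With the literal dash, 'b' breaks the match and the dash is picked up together with 'c'.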
More Answers (2)
Walter Roberson on 17 Mar 2013
file = fopen('marktwain.txt', 'rt');
CStr = textscan(file, '%s');
fclose(file);
Only problem: you have not defined exactly what a "word" is for your purposes, so the above is going to break things up at whitespace.
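A small sketch of what whitespace splitting produces. To stay self-contained it passes a character vector to textscan instead of a file identifier; the sample text is a hypothetical stand-in for the file contents.

```matlab
% Hypothetical sample text in place of marktwain.txt.
txt = 'so far. But it''s fine';
C = textscan(txt, '%s');

% textscan returns a 1-by-1 cell; the words are the cell array inside it.
words = C{1};

% Splitting at whitespace keeps punctuation attached, e.g. 'far.'.
disp(words);
```

So with this approach CStr{1} holds the words as a column cell array, and any punctuation adjacent to a word remains part of it.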
Image Analyst on 17 Mar 2013
Edited: Image Analyst on 17 Mar 2013
For example:
>> allwords('This is what I got so far. But it does not actually solve my problem.')
ans =
'This' 'is' 'what' 'I' 'got' 'so' 'far' 'But' 'it' 'does' 'not' 'actually' 'solve' 'my' 'problem'
2 Comments
Walter Roberson on 17 Mar 2013
Yes, you would have to download it from the link that was given.