Incomplete reading of MS Word file
At work I have to read some VERY long Word documents (~300 pages) and analyze the text. However, if I use the commands suggested in https://fr.mathworks.com/matlabcentral/answers/348737-how-to-read-ms-word-file-doc-docx :
word = actxserver('Word.Application');
wdoc = word.Documents.Open(filePath);
text = wdoc.Content.text;
wdoc.Close; % close document
word.Quit; % end application
the resulting "text" variable (1x158745 char) only contains ~25% of the document.
How can I read the whole document using this method? I saw that newer releases have dedicated functions/toolboxes for reading Word documents, but I don't have access to them, as my company only provides R2020b and a limited set of toolboxes.
Answers (1)
Oguz Kaan Hancioglu
on 12 Apr 2023
I haven't tried this on such a huge file, but could you try opening the Word document with fopen and reading the whole text using fread(fid, '*char')? Maybe it will work.
1 Comment
Walter Roberson
on 12 Apr 2023
That will not work in the form stated. .docx files are zip files that contain a directory of mostly XML files.
You can unzip the .docx file and go through the directory and try to extract things from the XML files; the XML files would be text files.
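A minimal sketch of the unzip approach described above, using only base MATLAB functions. It assumes the main body is stored in word/document.xml inside the archive, which is the usual layout for .docx files; the file name example.docx and all variable names are placeholders. It only collects the text of <w:t> elements, so headers, footnotes, tabs and manual line breaks are ignored, and it does not apply to the older binary .doc format, which is not a zip file.

```matlab
% Sketch only: pull the body text out of a .docx without Word or extra toolboxes.
% 'example.docx' and every variable name here are placeholders (assumptions).
filePath = 'example.docx';

% A .docx is a zip archive; the main body normally lives in word/document.xml.
tmpDir = tempname;
unzip(filePath, tmpDir);
xmlFile = fullfile(tmpDir, 'word', 'document.xml');

% Read the XML as UTF-8 text.
fid = fopen(xmlFile, 'r', 'n', 'UTF-8');
assert(fid ~= -1, 'Could not open %s', xmlFile);
xml = fread(fid, '*char').';
fclose(fid);
rmdir(tmpDir, 's');            % remove the scratch folder

% Paragraphs end with </w:p>; the visible text sits in <w:t> elements.
% The (?:\s[^>]*)? part keeps <w:tab/>, <w:tbl>, <w:tc> etc. from matching.
paras = strsplit(xml, '</w:p>');
paraText = cell(size(paras));
for k = 1:numel(paras)
    tok = regexp(paras{k}, '<w:t(?:\s[^>]*)?>([^<]*)</w:t>', 'tokens');
    if isempty(tok)
        paraText{k} = '';
    else
        tok = [tok{:}];          % flatten the nested token cells
        paraText{k} = [tok{:}];  % concatenate the runs of one paragraph
    end
end
docText = strjoin(paraText, newline);

% Decode the predefined XML entities; &amp; goes last so "&amp;lt;" becomes "&lt;".
docText = strrep(docText, '&lt;', '<');
docText = strrep(docText, '&gt;', '>');
docText = strrep(docText, '&quot;', '"');
docText = strrep(docText, '&apos;', '''');
docText = strrep(docText, '&amp;', '&');
```

Numeric character references (such as &#160;) are left undecoded here. If the regular expression turns out to be too fragile for a particular document, parsing the same file with xmlread and walking the w:t nodes is a slower but stricter alternative.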