How to extract part of a text file in MATLAB?
I have opened several XML files and want to extract the relevant text stored in them. The relevant text starts after a certain string of characters in each file, so I tried to use an if statement to extract the text from that point until another marker is reached. That would leave out most of the meaningless text and give me only the text I want. I tried the following code:
File1 = fopen('Factual1.xml','r');
File2 = fopen('Factual2.xml','r');
File3 = fopen('Colloquial1.xml','r');
File4 = fopen('Colloquial2.xml','r');
File5 = fopen('Hello.xml','r');
File6 = fopen('Hello2.xml','r');
Filenames = {'File1';'File2';'File3';'File4';'File5';'File6'};
B = {0};
for i=File1:File6
    A = fscanf(i,'%s');
    if ~(strcmp(A,'<w:pw:rsidR="00E3286E"w:rsidRDefault="'))
        while((B = fscanf(i,'%c')) ~='\')
            B
        end
    end
end
However, I keep getting an error saying that the statement B = fscanf(i,'%c') is not valid. Is there another way to scan the contents of each file, character by character, so that I can extract just the text I want?
Answers (2)
Ken Atwell
3 Jun 2013
I'm guessing you're a C programmer. You can't assign B in the while loop's conditional like you are attempting to do. Use two lines:
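B = fscanf(i, '%c', 1);   % read one character; without the 1, fscanf reads the rest of the file
while B ~= '\'            % the loop also ends at end of file, where B is empty
...
B = fscanf(i, '%c', 1);
end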
BTW, I believe your for loop works only "accidentally", because MATLAB tends to assign file identifiers in numeric order, but that is perhaps not guaranteed.
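One detail worth noting when reading character by character: called without a size argument, fscanf(fid,'%c') reads to the end of the file, so the loop needs a third argument of 1. The following is a minimal sketch of the read-then-test pattern described above. The sample file and its contents are hypothetical, created only so the snippet runs on its own.

```matlab
% Create a small sample file so the sketch is self-contained (hypothetical content).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s', 'keep this\drop this');
fclose(fid);

% Read one character at a time until a backslash or the end of the file.
fid = fopen(fname, 'r');
extracted = '';
c = fscanf(fid, '%c', 1);          % the 1 limits the read to a single character
while ~isempty(c) && c ~= '\'      % c is empty once the end of the file is reached
    extracted(end+1) = c;          % growing in a loop is acceptable for a short sketch
    c = fscanf(fid, '%c', 1);
end
fclose(fid);
delete(fname);
disp(extracted)
```

The explicit isempty check makes the end-of-file case visible rather than relying on an empty condition evaluating as false.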
4 comments
Walter Roberson
4 Jun 2013
MATLAB appears to follow what POSIX does, which is to allocate the first available (lowest numbered) file descriptor. But that does not mean that the results will always be consecutive.
fid1 = fopen('file1');
fid2 = fopen('file2');
fid3 = fopen('file3');
fclose(fid1);
fclose(fid2);
nfid1 = fopen('nfile1');
nfid2 = fopen('nfile2');
nfid3 = fopen('nfile3');
If we assume nothing had been opened before, fid1 will be 3, fid2 will be 4, fid3 will be 5, then 3 and 4 are released, so nfid1 will be 3, nfid2 will be 4, but nfid3 would be the next available, 6, rather than the consecutive 5.
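Because the identifiers are not guaranteed to be consecutive, a safer pattern is to loop over the file names and open each file inside the loop, rather than iterating over a numeric range of identifiers. The sketch below illustrates that idea. The folder and the file names a.xml, b.xml and c.xml are hypothetical, created only so the snippet runs on its own.

```matlab
% Hypothetical sample files, created here only so the sketch is self-contained.
folder = tempname;
mkdir(folder);
names = {'a.xml', 'b.xml', 'c.xml'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fprintf(fid, 'contents of file %d', k);
    fclose(fid);
end

% Loop over the file names, not over a range of identifier values.
contents = cell(size(names));
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'r');
    if fid == -1
        error('Could not open %s', names{k});
    end
    contents{k} = fread(fid, [1 Inf], '*char');   % whole file as a row of characters
    fclose(fid);                                  % release the identifier right away
end
rmdir(folder, 's');
```

Opening and closing each file inside the loop also means the code never depends on which identifier value fopen happens to return.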
Paul Metcalf
4 Jun 2013
You are defining B as a cell array, then trying to replace B with a different data type, which is invalid. Try initializing B properly first, e.g. B = cell(m,n); then assign data into each cell of the array with B{1,1} = 'first line of data'; and so on. Your code is really poorly constructed in general. If I have time tonight, I'll look at sending you some more tips.
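The cell array suggestion above could look roughly like the following sketch. The variable sampleText and its strings are hypothetical stand-ins for text read from the files.

```matlab
% Hypothetical sample text standing in for data read from a file.
sampleText = {'first line of data', 'second line of data', 'third line of data'};

B = cell(numel(sampleText), 1);   % preallocate an m-by-1 cell array of empty cells
for k = 1:numel(sampleText)
    B{k, 1} = sampleText{k};      % curly braces assign the contents of one cell
end

firstEntry = B{1, 1};             % curly braces also read the contents back
disp(firstEntry)
```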